brunch

Magazine: 리테일사이언스


by 닥터로 Mar 26. 2024

Explaining Farmgate Milk Prices

I have finally started on something I had been meaning to do for a long time,

and now that I have begun, I love it.

What is it? "Sorting out the past."

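The 1990s

I have started organizing the papers (mostly working papers) and notes I wrote as a student. The computer files seem to have worn away to nothing, and I had not been able to find them. But the moment I found this paper, there was my name, right on it. Reading it and thinking "so I wrote this" is a strange feeling. Imagine getting to reread something you wrote when you were young.

A Korean name appears

What drives the price of farmgate milk (raw milk, before processing)? The traces of working through that question with my fellow PhD students are all over this paper. Jon is now a professor at PSU and Hema works at the ADB, and both have become well known. We lost touch after graduation, of course...

Having found a trace of my younger self, I feel more energized these days.

Paper summary (original, followed by my restatement)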

This paper seeks to explain the factors that are most responsible for movements in milk prices received by U.S. dairy farmers during the period of 1970 through 1994. Presented is an econometric analysis of time series data related to the dairy industry. The data series come from the U.S. Department of Agriculture. The organization of the paper is as follows. Section II provides a motivation for the analysis and discusses the data used. Section III presents a vector autoregression model fit to this data set. In section IV we estimate the cointegrating relationships in the data and fit a vector error correction model. Section V describes conclusions from this analysis.

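This paper explains the factors that most affected movements in the milk prices received by U.S. dairy farmers from 1970 through 1994. It is an econometric analysis of dairy industry time series data obtained from the U.S. Department of Agriculture. Section II gives the motivation for the analysis and discusses the data. Section III presents a vector autoregression model fit to this data set. Section IV estimates the cointegrating relationships in the data and fits a vector error correction model. Section V closes with the conclusions drawn from the analysis.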

Agriculture and farming are a very important part of rural communities in the U.S. Dairy farming is the largest agricultural sector in most of the Northeastern and Northcentral states. As with almost every other industry, dairy farming has changed dramatically during the 20th century. The advancement of agricultural mechanization in the decades following World War II allowed the average size of family-operated farms to increase significantly. Improvements in scientific understanding of agricultural systems led to management changes on farms and genetic improvement of crops and livestock. Hence, agricultural yields increased greatly in the 1950's, 60's and 70's. For the most part, this period was financially very good for dairy farmers.

Unfortunately for dairy farmers, the nominal farmgate price of milk has been on a slight downward trend since 1980. Dairy farmers have virtually no influence over the prices they receive for milk, due to the competitive nature and structure of the industry. Because milk sales account for greater than 95% of the average dairy farm's revenue, no other factor is as important to viability of dairy farms and the structure of the industry.

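Farming matters enormously in rural America, and in the Northeastern and Northcentral states dairy is the biggest part of it. Like other industries, dairy farming changed a great deal over the 20th century. Mechanization after World War II let family farms grow much larger, and better scientific understanding of agriculture led to improvements in farm management, crops, and livestock. As a result, agricultural yields rose sharply in the 1950s, 60s, and 70s. Those were good years financially for dairy farms.

Since 1980, however, the nominal farmgate price of milk has been drifting down, which is bad news for dairy farmers. Because of the structure of the industry and the competition within it, farmers have almost no influence over the price they receive. Milk sales make up more than 95% of a dairy farm's revenue, so nothing matters more to the viability of dairy farms and the structure of the industry.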

~ (portion omitted) ~

Johansen Cointegrating test

The Johansen test determines the number of cointegrating equations in the VAR model, referred to as the cointegrating rank of the model. If there are no cointegrating equations, then the (unrestricted) VAR model in first differences is appropriate. If there are n cointegrating equations in a system of n endogenous variables, then none of the series contain a unit root and the VAR can be estimated in levels of all the series.

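The Johansen cointegration test determines the number of cointegrating equations in the VAR model, known as the model's cointegrating rank. With no cointegrating equations, an (unrestricted) VAR in first differences is appropriate. With n cointegrating equations among n endogenous variables, none of the series has a unit root, and the VAR can be estimated in the levels of all the series.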
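To make the idea of a cointegrating rank concrete, here is a minimal NumPy sketch of the Johansen eigenvalue and trace calculation. It is not the paper's estimation: the data are simulated (two series sharing one random-walk trend), and the sketch assumes no lagged differences and no deterministic terms, whereas the paper's system has four variables and three lagged differences. The function name `johansen_trace` is made up for this illustration.

```python
import numpy as np

# Simulated data (an assumption, not the USDA series): two series that
# share one random-walk trend, so exactly one cointegrating relation exists.
rng = np.random.default_rng(0)
T = 2000
trend = np.cumsum(rng.normal(size=T))
y2 = trend
y1 = trend + rng.normal(size=T)        # y1 - y2 is stationary
Y = np.column_stack([y1, y2])


def johansen_trace(Y):
    """Eigenvalues and trace statistics for dY_t = Pi * Y_{t-1} + e_t.

    Simplified case: no lagged differences, no constant or trend.
    """
    dY = np.diff(Y, axis=0)            # first differences
    Y_lag = Y[:-1]                     # lagged levels
    n_obs = dY.shape[0]
    S00 = dY.T @ dY / n_obs
    S11 = Y_lag.T @ Y_lag / n_obs
    S01 = dY.T @ Y_lag / n_obs
    # Eigenvalues of S11^{-1} S10 S00^{-1} S01 (squared canonical correlations)
    M = np.linalg.solve(S11, S01.T) @ np.linalg.solve(S00, S01)
    eigvals = np.sort(np.linalg.eigvals(M).real)[::-1]
    # Trace statistic for H0: rank <= r, for r = 0, 1, ..., n-1
    trace = np.array([-n_obs * np.log(1.0 - eigvals[r:]).sum()
                      for r in range(len(eigvals))])
    return eigvals, trace


eigvals, trace = johansen_trace(Y)
print("eigenvalues:", eigvals)
print("trace statistics (rank <= 0, rank <= 1):", trace)
```

One large eigenvalue and one near zero is the pattern expected for a rank of one. In practice each trace statistic is compared against tabulated critical values, which depend on the deterministic terms assumed.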

Vector Error Correction Model

A vector error correction model (VEC) imposes the cointegration restrictions on the VAR model. The original VAR model was a four variable system with three lagged differences. Three coefficient estimates on the cointegrating equation are significantly different from zero. This can be interpreted as further proof that the variables are cointegrated. If all the estimates were not significantly different from zero, then there would be no error correction term and the unrestricted VAR is appropriate.

With regard to milk prices, we see that production and sales in the second period and stock of butter in the third period are significant and the signs are consistent with theory. Sales and stocks have the opposite sign with regard to milk production, but the coefficient estimates are not significant (except for sales in the third period). Milk price is significant only in the second period which implies production cannot respond instantaneously to price changes. With the exception of production, the other variables do not affect the sales of milk. This result is contrary to intuition as we expect milk price to be an important variable in this equation. With the exception of its own coefficient, price, production and sales are not significant in the butter stock equation.

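The vector error correction model (VEC) imposes the cointegration restrictions on the VAR model. The original VAR was a four-variable system with three lagged differences. Three coefficient estimates on the cointegrating equation differ significantly from zero, which can be read as further evidence that the variables are cointegrated. If none of the estimates differed significantly from zero, there would be no error correction term and the unrestricted VAR would be appropriate.

In the milk price equation, production and sales at the second lag and butter stocks at the third lag are significant, with signs consistent with theory. In the production equation, sales and stocks have the opposite sign, but the coefficient estimates are not significant (except for sales at the third lag). Milk price is significant only at the second lag, which implies that production cannot respond instantly to price changes. Apart from production, the other variables do not affect milk sales. That runs against intuition, since we would expect milk price to be an important variable in this equation. In the butter stock equation, apart from its own coefficient, price, production, and sales are not significant.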
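The role of the error correction term can be illustrated with a small sketch. This is not the paper's four-variable model: it uses simulated data, assumes the cointegrating vector `beta = (1, -1)` is known rather than estimated, and leaves out the lagged differences. Under those assumptions the adjustment coefficients can be estimated by plain OLS of the differences on the lagged error correction term.

```python
import numpy as np

# Simulated cointegrated pair (an assumption, not the paper's data):
# y2 is a random walk and y1 tracks it with stationary noise.
rng = np.random.default_rng(1)
T = 2000
y2 = np.cumsum(rng.normal(size=T))
y1 = y2 + rng.normal(size=T)
Y = np.column_stack([y1, y2])

beta = np.array([1.0, -1.0])       # cointegrating vector, assumed known here
ect = Y[:-1] @ beta                # lagged error correction term
dY = np.diff(Y, axis=0)            # first differences of both series

# OLS of each differenced series on the lagged error correction term:
# dY_t = alpha * ect_{t-1} + e_t
X = ect[:, None]
coef, *_ = np.linalg.lstsq(X, dY, rcond=None)
alpha = coef.ravel()               # one adjustment coefficient per equation

resid = dY - X @ coef
sigma2 = resid.var(axis=0, ddof=1)
se = np.sqrt(sigma2 / (ect @ ect))
t_stats = alpha / se

print("adjustment coefficients:", alpha)
print("t-statistics:", t_stats)
```

In this setup y1 does all the adjusting (coefficient near -1) while y2 does not respond to the disequilibrium (coefficient near 0). A significant adjustment coefficient is the kind of evidence the passage above describes when it reads the significant estimates on the cointegrating equation as support for cointegration.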

Conclusion

This paper has presented an analysis of the factors affecting farmgate milk prices. Four endogenous variables were used in creating a set of simultaneous equations in a vector autoregressive framework. Cointegration was determined among the endogenous variables and an error correction term was used to create a vector error correction model.

The results of this model suggest that no single variable seems to be the key to explaining movements in farmgate milk prices. Its own lagged values have significant explanatory power for farmgate milk prices. A two period lag of milk production is shown to significantly influence milk prices, inversely; a two period lag of commercial milk disappearance influences milk prices positively; and a three period lag of butter stocks influences milk prices inversely. All of these results are consistent with economic theory and our prior expectations related to this analysis.

Because milk marketing in the U.S. is subject to convoluted federal policies, there may well be other important factors influencing the price of milk that are not adequately represented in this model. An alternative specification for further research might be to include information on dairy policies. However, the dairy industry is moving in the direction of unaltered free market forces and this analysis may help the industry to understand the market forces influencing farmgate milk prices.

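This paper presented an analysis of the factors that affect farmgate milk prices. Four endogenous variables were used to build a set of simultaneous equations in a vector autoregressive framework. Cointegration was found among the endogenous variables, and an error correction term was used to build a vector error correction model.

The results suggest that no single variable is the key to explaining movements in farmgate milk prices. The price's own lagged values have significant explanatory power. Milk production lagged two periods has a significant negative effect on milk prices, commercial milk disappearance lagged two periods has a positive effect, and butter stocks lagged three periods have a negative effect. All of these results agree with economic theory and with our prior expectations.

Because milk marketing in the U.S. is subject to complicated federal policies, there may be other important influences on the price of milk that this model does not capture well. An alternative specification for further research would include information on dairy policy. Still, the dairy industry is moving toward unaltered free market forces, and this analysis may help the industry understand the market forces that influence farmgate milk prices.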

---

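Back when I knew nothing

Still, I do remember the days when, as an econometrics major, I would analyze any data I laid eyes on. I had even forgotten that there was a time when I sweated over time series data, analyzing the market forces and factors behind milk price movements, hunting for causes and trying to produce meaningful results. This is better than looking at a childhood photo, because it shows what I was thinking about and what I was interested in back then. The trouble is that I once knew a lot of econometrics, but looking at it again now...

It is a difficult paper, written together with colleagues.

The code I wrote back then, rewritten in Python

For reference, here is the statistical code I wrote at the time in connection with this paper.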

import numpy as np
import pandas as pd

# Load data. Judging from the reshapes below, column 0 holds the chosen
# alternative and columns 1-6 hold (TIME, COST) for three alternatives.
data_path = 'S:/RO/589/FINAL/589FP1.ASC'
data = pd.read_csv(data_path, header=None, sep=r'\s+')
Y = data.iloc[1:, 0].values
X = data.iloc[1:, 1:].values.astype(float)

n_obs, n_alt = X.shape[0], 3
Z = X.reshape(n_obs * n_alt, 2)               # one row per (observation, alternative)
D = pd.get_dummies(Y).values.astype(float)    # one-hot matrix of observed choices
D_star = D.reshape(-1, 1)

# Initial values
alpha_hat = np.zeros((2, 1))

# Maximizing Logit Likelihood Function using Newton-Raphson
print("MAXIMIZING LOGIT LIKELIHOOD FUNCTION BY NEWTON-RAPHSON")
for iteration in range(1, 100):
    # Choice probabilities for each observation
    exp_Z_alpha = np.exp(Z @ alpha_hat).reshape(n_obs, n_alt)
    P = exp_Z_alpha / exp_Z_alpha.sum(axis=1, keepdims=True)
    P_star = P.reshape(-1, 1)
    LF = (D * np.log(P)).sum()                # log-likelihood at the current estimate

    # Attributes in deviations from their probability-weighted means
    Z_bar = (P_star * Z).reshape(n_obs, n_alt, 2).sum(axis=1)
    ZS = Z - np.repeat(Z_bar, n_alt, axis=0)

    # Gradient, Hessian, and covariance (inverse of the negative Hessian)
    grad = (D_star * ZS).sum(axis=0).reshape(2, 1)
    Hessian = -(P_star * ZS).T @ ZS
    cov = np.linalg.inv(-Hessian)

    if (grad.T @ grad).item() < .00001:
        break
    alpha_hat += cov @ grad                   # Newton-Raphson step

# Compute T-stats
t_stats = alpha_hat / np.sqrt(np.diag(cov))[:, None]

# Format and print results
results = np.hstack((alpha_hat, t_stats))
results_df = pd.DataFrame(results, columns=["ALPHAHAT", "T-STAT"], index=["TIME", "COST"])
print(results_df)
print(f"Log-likelihood: {LF:.4f}")
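The listing above depends on the original data file, so here is a self-contained sketch of the same Newton-Raphson idea on simulated data. Everything in it is an assumption made for illustration: the function name `fit_conditional_logit`, the simulated attributes, and the "true" coefficients. It only shows that the iteration recovers known coefficients in a conditional logit with two attributes and three alternatives.

```python
import numpy as np


def fit_conditional_logit(Z, choice, max_iter=100, tol=1e-5):
    """Newton-Raphson for a conditional logit (illustrative sketch).

    Z      : (n_obs, n_alt, k) attributes of every alternative
    choice : (n_obs,) integer index of the chosen alternative
    """
    n_obs, n_alt, k = Z.shape
    D = np.eye(n_alt)[choice]                    # one-hot observed choices
    alpha = np.zeros(k)
    for _ in range(max_iter):
        V = Z @ alpha                            # utilities, (n_obs, n_alt)
        V = V - V.max(axis=1, keepdims=True)     # stabilize exp()
        P = np.exp(V)
        P /= P.sum(axis=1, keepdims=True)        # choice probabilities
        Z_bar = (P[:, :, None] * Z).sum(axis=1)  # probability-weighted means
        ZS = Z - Z_bar[:, None, :]               # deviations from those means
        grad = (D[:, :, None] * ZS).sum(axis=(0, 1))
        info = np.einsum('ij,ijk,ijl->kl', P, ZS, ZS)  # negative Hessian
        cov = np.linalg.inv(info)
        if grad @ grad < tol:
            break
        alpha = alpha + cov @ grad               # Newton-Raphson step
    return alpha, cov


# Simulated choices: utility = Z @ true_alpha + Gumbel noise, pick the max
rng = np.random.default_rng(42)
true_alpha = np.array([-1.0, -0.5])              # hypothetical TIME and COST effects
Z = rng.normal(size=(5000, 3, 2))
utility = Z @ true_alpha + rng.gumbel(size=(5000, 3))
choice = utility.argmax(axis=1)

alpha_hat, cov = fit_conditional_logit(Z, choice)
t_stats = alpha_hat / np.sqrt(np.diag(cov))
print("estimates:", alpha_hat)
print("t-stats:  ", t_stats)
```

The estimates should land close to `true_alpha`, with the negative signs one would expect if longer times and higher costs make an alternative less attractive.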

It is simply amazing...
