by 라인하트 Sep 27. 2020

Andrew Ng's Machine Learning (2-3): Understanding the Cost Function 1

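Professor Andrew Ng, a co-founder of the online learning platform Coursera, is a leading figure in the AI field. The course he taught to machine learning beginners at Stanford University is available for free on Coursera (Coursera.org). It is an essential course for newcomers to machine learning, and one you will naturally come across when studying AI and machine learning on your own.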

Linear Regression with One Variable

Model and Cost Function

Cost Function Intuition

In the previous video, we gave the mathematical definition of the cost function. In this video, let's look at some examples to get a better intuition about what the cost function is doing, and why we want to use it.

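In the previous lecture we learned the mathematical definition of the cost function. This lecture works through a few examples to explain what the cost function does and why we use it.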

To recap, here's what we had last time. We want to fit a straight line to our data, so we formed this hypothesis with the parameters theta zero and theta one, and with different choices of the parameters we end up with different straight line fits to the data. And there's a cost function, and minimizing it was our optimization objective.

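To summarize the previous lecture: to find a straight line that fits the training set, we defined a hypothesis function with the parameters θ0 and θ1. Changing the parameter values changes the line. We also have a cost function and an optimization objective.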

So in this video, in order to better visualize the cost function J, I'm going to work with a simplified hypothesis function, like that shown on the right. I'm going to use my simplified hypothesis, which is just theta one times x. We can, if you want, think of this as setting the parameter theta zero equal to zero. So I have only one parameter, theta one, and my cost function is similar to before except that now h of x is equal to just theta one times x. And since I have only one parameter, theta one, my optimization objective is to minimize J of theta one. In pictures, what this means is that setting theta zero equal to zero corresponds to choosing only hypothesis functions that pass through the origin, that pass through the point (0, 0). Using this simplified definition of the hypothesis and cost function, let's try to understand the cost function concept better.

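To understand the cost function J better, this lecture uses a simplified hypothesis function hθ. With 'hθ(x) = θ1x', θ0 is fixed at 0 and only θ1 remains. The optimization objective is to minimize J(θ1). Because θ0 is 0, the hypothesis hθ(x) is always a straight line through the origin (0, 0).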

With this simplified hypothesis and cost function, the cost function concept becomes easier to understand.
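The simplified hypothesis and its cost can be sketched in a few lines of Python. The function names `hypothesis` and `cost` are illustrative choices made here, not names from the lecture; the formula is the usual squared-error cost with θ0 fixed at 0.

```python
def hypothesis(theta1, x):
    """Simplified hypothesis h(x) = theta1 * x, i.e. theta0 is fixed at 0."""
    return theta1 * x


def cost(theta1, xs, ys):
    """Squared-error cost J(theta1) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    total = sum((hypothesis(theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Whatever theta1 is, the line passes through the origin.
print(hypothesis(2.0, 0.0))  # 0.0
```

Note that `hypothesis` takes x as its input for a fixed θ1, while `cost` is evaluated as a function of θ1 for a fixed data set.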

It turns out that there are two key functions we want to understand. The first is the hypothesis function, and the second is the cost function. So, notice that the hypothesis, h of x, for a fixed value of theta one, is a function of x. So the hypothesis is a function of x, the size of the house. In contrast, the cost function J is a function of the parameter theta one, which controls the slope of the straight line. Let's plot these functions and try to understand them both better. Let's start with the hypothesis. On the left, let's say here's my training set with three points at (1, 1), (2, 2), and (3, 3).

There are two key functions to understand. The first is the hypothesis function and the second is the cost function. The hypothesis function is hθ(x) = θ1x.

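Here, the hypothesis hθ(x) is a function of the house size x. In contrast, the cost function J(θ1) is a function of the parameter θ1, which determines the slope of the line. To understand both better, we plot them, starting with the hypothesis on the left. The left plot shows the training set, three points at (1, 1), (2, 2), and (3, 3).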

Let's pick a value for theta one. When theta one equals one, my hypothesis is going to look like this straight line over here. And I'm going to point out that when I'm plotting my hypothesis function, my horizontal axis is labeled x, the size of the house. Now, having temporarily set theta one equal to one, what I want to do is figure out what J of theta one is when theta one equals one. So let's go ahead and compute the value of the cost function at theta one equals one. As usual, my cost function is defined as follows: one over 2m times the sum over my training set of this usual squared error term. This is therefore equal to one over 2m times the sum of theta one x i minus y i, squared, and if you simplify, the sum turns out to be zero squared plus zero squared plus zero squared, which is of course just equal to zero. Inside the cost function, each of these terms is equal to zero, because for the specific training set I have, my three training examples are (1, 1), (2, 2), and (3, 3). If theta one is equal to one, then h of x i is exactly equal to y i. So h of x minus y, each of these terms, is equal to zero, which is why I find that J of one is equal to zero.

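Suppose θ1 is 1. The hypothesis is hθ(x) = x, drawn as the black line. The horizontal axis x of the left plot is the house size. With 'θ1 = 1' and hθ(x) = x, we now compute J(θ1). The cost function can be written in two ways:

J(θ1) = 1/(2m) * Σ (hθ(x^(i)) - y^(i))^2 = 1/(2m) * Σ (θ1x^(i) - y^(i))^2, summing over i = 1, ..., m

The two expressions are equal because hθ(x^(i)) = θ1x^(i). To evaluate J(θ1) at θ1 = 1, substitute each training example into the squared error term '(θ1x^(i) - y^(i))^2', where the hypothesis is hθ(x^(i)) = θ1x^(i) = x^(i).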

For the first training example (1, 1), the squared error is (1-1)^2 = 0.

For the second training example (2, 2), the squared error is (2-2)^2 = 0.

For the third training example (3, 3), the squared error is (3-3)^2 = 0.

Summing the three gives '0 + 0 + 0 = 0', so J(1) = 1/(2*3) * 0 = 0. The cost at θ1 = 1 is therefore 0.
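The three terms above can be reproduced directly. This is a minimal sketch using the lecture's training set; the variable names are illustrative.

```python
xs = [1, 2, 3]
ys = [1, 2, 3]
theta1 = 1

# One squared-error term (theta1 * x - y)^2 per training example.
squared_errors = [(theta1 * x - y) ** 2 for x, y in zip(xs, ys)]

m = len(xs)
j = sum(squared_errors) / (2 * m)
print(squared_errors, j)  # [0, 0, 0] 0.0
```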

So, we now know that J of one is equal to zero. Let's plot that. What I'm going to do on the right is plot my cost function J. And notice, because my cost function is a function of my parameter theta one, when I plot my cost function, the horizontal axis is now labeled with theta one. So I have J of one equals zero, so let's go ahead and plot that, and I end up with an X over there.

So when θ1 = 1, J(θ1) = J(1) = 0. We plot J(1) on the right. Because the cost function is a function of the parameter θ1, the horizontal axis is θ1. Since J(1) = 0, we mark an X at θ1 = 1 on the horizontal axis. The point is at (1, 0).

Now let's look at some other examples. Theta one can take on a range of different values, right? So theta one can take on negative values, zero, or positive values. So what if theta one is equal to 0.5? What happens then? Let's go ahead and plot that. I'm now going to set theta one equal to 0.5, and in that case my hypothesis looks like this, a line with slope equal to 0.5. Let's compute J of 0.5. That is going to be one over 2m times my usual sum of squared errors. It turns out that this sum is going to be the square of the height of this line, plus the square of the height of that line, plus the square of the height of that line, right? Because this vertical distance is just the difference between y i and the predicted value h of x i. So the first example is going to be 0.5 minus one, squared, because my hypothesis predicted 0.5 whereas the actual value was one. For my second example, I get one minus two, squared, because my hypothesis predicted one, but the actual housing price was two. And then finally, plus 1.5 minus three, squared. And so that's equal to one over two times three, because m, the training set size, is three training examples, times, simplifying inside the parentheses, 3.5. So that's 3.5 over six, which is about 0.68. So now we know that J of 0.5 is about 0.68. Let's go and plot that. Oh, excuse me, math error, it's actually 0.58. So we plot that, which is maybe about over there. Okay?

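Let's look at another example. θ1 can be negative, zero, or positive. Suppose 'θ1 = 0.5' and plot it on the left. With 'θ1 = 0.5' the hypothesis is hθ(x) = 0.5x, drawn as the black line. The parameter θ1 is the slope of the line, so the slope is 0.5. The cost J(0.5) is computed as follows.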

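For the first training example (1, 1), the squared error is (0.5*1-1)^2 = (-0.5)^2 = 0.25.

For the second training example (2, 2), the squared error is (0.5*2-2)^2 = (1-2)^2 = 1.

For the third training example (3, 3), the squared error is (0.5*3-3)^2 = (1.5-3)^2 = 2.25.

Summing the three gives '0.25 + 1 + 2.25 = 3.5', so J(0.5) = 1/(2*3) * 3.5 ≈ 0.58. The error of each training example is the blue vertical segment, drawn perpendicular to the horizontal axis, between the data point and the line. The squared error is the square of that segment's length, and the sum of squared errors is the sum of the squares of the three blue segments, because each vertical segment is the difference between the prediction hθ(x^(i)) and the actual value y^(i). J(0.5) ≈ 0.58 is half the mean of those squared lengths. On the right plot, the horizontal coordinate is θ1 = 0.5 and the vertical coordinate is 0.58, so the point is at (0.5, 0.58).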
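To see the vertical distances explicitly, the residuals can be computed before squaring them. The sketch below uses `fractions.Fraction` only so that the intermediate values stay exact; that choice, like the variable names, is an assumption of this example and not part of the lecture.

```python
from fractions import Fraction

xs = [1, 2, 3]
ys = [1, 2, 3]
theta1 = Fraction(1, 2)

# Signed vertical gap between the prediction theta1 * x and the actual y.
residuals = [theta1 * x - y for x, y in zip(xs, ys)]
squared = [r ** 2 for r in residuals]

# Half the mean of the squared gaps: 3.5 / (2 * 3).
j_half = sum(squared) / (2 * len(xs))
print(float(sum(squared)), float(j_half))  # 3.5 and roughly 0.583
```

The exact value is 7/12, which rounds to the 0.58 used in the text.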

Now, let's do one more. How about if theta one is equal to zero, what is J of zero equal to? It turns out that if theta one is equal to zero, then h of x is just equal to this flat line, right, that just goes horizontally like this. And so, measuring the errors, we have that J of zero is equal to one over 2m, times one squared plus two squared plus three squared, which is one sixth times fourteen, which is about 2.3. So let's go ahead and plot that as well. So it ends up with a value around 2.3, and of course we can keep on doing this for other values of theta one.

Let's do one more. Suppose 'θ1 = 0'. The hypothesis is hθ(x) = 0, a flat horizontal line that coincides with the x-axis. The parameter θ1 is the slope of the line, so the slope is 0. The cost J(0) is computed as follows.

For the first training example (1, 1), the squared error is (0*1-1)^2 = (-1)^2 = 1.

For the second training example (2, 2), the squared error is (0*2-2)^2 = (-2)^2 = 4.

For the third training example (3, 3), the squared error is (0*3-3)^2 = (-3)^2 = 9.

Summing the three gives '1 + 4 + 9 = 14', so J(0) = 1/(2*3) * 14 ≈ 2.3. We plot it on the right; the point is at (0, 2.3).

It turns out that you can have negative values of theta one as well. So if theta one is negative, then h of x would be equal to, say, minus 0.5 times x. Then theta one is minus 0.5, and so that corresponds to a hypothesis with a slope of negative 0.5. And you can actually keep on computing these errors. For minus 0.5, it turns out to have a really high error. It works out to be something like 5.25.

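When θ1 is negative, say -0.5, the hypothesis is hθ(x) = -0.5*x. The parameter θ1 is the slope of the line, so the slope is -0.5. Computing the sum of squared errors as before gives J(-0.5) = 5.25. This is a very large value; we plot it on the right at (-0.5, 5.25).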
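The values for θ1 = 0 and θ1 = -0.5 can be checked with the same formula. A small sketch, with `cost` as an illustrative helper name:

```python
def cost(theta1, xs, ys):
    """J(theta1) = 1/(2m) * sum((theta1 * x_i - y_i)^2)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


xs, ys = [1, 2, 3], [1, 2, 3]

for theta1 in (0.0, -0.5):
    print(theta1, round(cost(theta1, xs, ys), 2))
# 0.0 2.33
# -0.5 5.25
```

For θ1 = -0.5 the three squared errors are 2.25, 9, and 20.25, which sum to 31.5, and 31.5 / 6 = 5.25.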

And so on. For different values of theta one, you can compute these things, right? And it turns out that if you compute a range of values, you get something like that. By computing a range of values, you can actually slowly trace out what the function J of theta looks like, and that's what J of theta is. To recap, each value of theta one corresponds to a different hypothesis, or to a different straight line fit on the left. And for each value of theta one, we could then derive a different value of J of theta one. For example, theta one equals one corresponded to this straight line straight through the data, whereas theta one equals 0.5, this point shown in magenta, corresponded to maybe that line, and theta one equals zero, which is shown in blue, corresponds to this horizontal line. So for each value of theta one we wound up with a different value of J of theta one, and we could then use this to trace out this plot on the right.

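We can compute J(θ1) over a whole range of θ1 values. Computing more values of J(θ1) lets us draw the curve shown on the right. To recap: 'θ1 = 1' corresponds to the light-blue line hθ(x) = x, which passes exactly through the data, and J(1) is 0. 'θ1 = 0.5' corresponds to the pink line hθ(x) = 0.5x, and J(0.5) is about 0.58. 'θ1 = 0' corresponds to the horizontal line hθ(x) = 0, and J(0) is about 2.3. This is how the graph of the cost function is traced out.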
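Tracing out the curve amounts to evaluating J on a grid of θ1 values. The sketch below prints a small table instead of drawing a plot; the grid from -0.5 to 2.5 in steps of 0.5 is an arbitrary choice made for this example.

```python
def cost(theta1, xs, ys):
    """J(theta1) = 1/(2m) * sum((theta1 * x_i - y_i)^2)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


xs, ys = [1, 2, 3], [1, 2, 3]

# theta1 = -0.5, 0.0, 0.5, ..., 2.5
thetas = [i / 2 for i in range(-1, 6)]
curve = [(t, cost(t, xs, ys)) for t in thetas]

for t, j in curve:
    print(f"{t:5.1f}  {j:6.3f}")
```

The printed values fall from 5.25 at θ1 = -0.5 to 0 at θ1 = 1 and rise again symmetrically, which is the bowl shape seen in the right-hand plot.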

Now you remember, the optimization objective for our learning algorithm is that we want to choose the value of theta one that minimizes J of theta one, right? This was our objective function for linear regression. Well, looking at this curve, the value that minimizes J of theta one is theta one equals one. And lo and behold, that is indeed the best possible straight line fit through our data, by setting theta one equal to one. And for this particular training set, we actually end up fitting it perfectly. And that's why minimizing J of theta one corresponds to finding a straight line that fits the data well.

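The optimization objective of the learning algorithm is to choose the value of θ1 that minimizes J(θ1). This is the objective function for linear regression:

minimize J(θ0, θ1) over θ0, θ1 (here, with θ0 = 0: minimize J(θ1) over θ1)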

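Looking at the curve on the right, J(θ1) reaches its minimum at θ1 = 1. Sure enough, 'θ1 = 1' gives the line that fits the data best, and for this training set it fits perfectly. This is why minimizing J(θ1) corresponds to finding the straight line that fits the data well.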
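The lecture reads the minimum off the plot. As a cross-check that is not part of the lecture, setting the derivative of J(θ1) to zero gives the closed form θ1 = Σ x^(i) y^(i) / Σ (x^(i))^2 for this one-parameter model, which can be evaluated directly:

```python
xs, ys = [1, 2, 3], [1, 2, 3]

# Minimizer of J(theta1) for h(x) = theta1 * x: sum(x*y) / sum(x*x).
theta1_best = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Cost at the minimizer, using the usual 1/(2m) squared-error formula.
m = len(xs)
j_best = sum((theta1_best * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)
print(theta1_best, j_best)  # 1.0 0.0
```

Both sums equal 14 for this training set, so the minimizer is exactly 1, matching the point found on the curve.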

So, to wrap up: in this video, we looked at some plots to understand the cost function. To do so, we simplified the algorithm so that it only had one parameter, theta one, and we set the parameter theta zero to be zero. In the next video, we'll go back to the original problem formulation and look at some visualizations involving both theta zero and theta one, that is, without setting theta zero to zero. And hopefully that will give you an even better sense of what the cost function J is doing in the original linear regression formulation.

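To wrap up, this lecture looked at several plots. To understand the cost function J better, we simplified the algorithm by setting θ0 to 0 and using only the parameter θ1. In the next lecture we return to the original problem formulation and look at visualizations of the cost function that involve both θ0 and θ1, without setting 'θ0 = 0'. That should give an even better sense of what the cost function J does in the original linear regression formulation.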

Andrew Ng's Machine Learning video lecture

Summary

To understand the cost function better, we use a simplified hypothesis function.

(1) Simplified hypothesis function hθ(x)

Set θ0 to 0 and define the hypothesis using only the parameter θ1: hθ(x) = θ1x.

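(2) Cost function J(θ1)

The cost function is J(θ1) = 1/(2m) * Σ (θ1x^(i) - y^(i))^2, summing over i = 1, ..., m. Because hθ(x^(i)) = θ1x^(i), the cost depends only on θ1, and we look for the value of θ1 that minimizes J(θ1).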

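(3) Graph of the cost function J(θ1)

When plotted, the cost function J(θ1) is a bowl-shaped quadratic curve (a parabola). The optimization objective of the learning algorithm is to choose the value of θ1 that minimizes J(θ1). This is the objective function for linear regression. Here J(θ1) attains its minimum at θ1 = 1, which is where the error between the predictions and the training data is smallest.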
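Why the curve is a parabola can be seen by expanding the square. The derivation below is a worked sketch for the lecture's training set (1, 1), (2, 2), (3, 3), where each of the three sums equals 14 and m = 3.

```latex
\begin{align*}
J(\theta_1)
  &= \frac{1}{2m}\sum_{i=1}^{m}\bigl(\theta_1 x^{(i)} - y^{(i)}\bigr)^2 \\
  &= \frac{1}{2m}\Bigl(\theta_1^2 \sum_{i=1}^{m}\bigl(x^{(i)}\bigr)^2
       - 2\theta_1 \sum_{i=1}^{m} x^{(i)} y^{(i)}
       + \sum_{i=1}^{m}\bigl(y^{(i)}\bigr)^2\Bigr) \\
  &= \frac{1}{6}\bigl(14\,\theta_1^2 - 28\,\theta_1 + 14\bigr)
   = \frac{7}{3}\,(\theta_1 - 1)^2
\end{align*}
```

This agrees with the values computed earlier: J(1) = 0, J(0.5) = 7/12 ≈ 0.58, J(0) = 7/3 ≈ 2.3, and J(-0.5) = 5.25.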