by 라인하트 Sep 28. 2020

Andrew Ng's Machine Learning (2-4): Understanding the Cost Function 2

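Professor Andrew Ng, co-founder of the online course platform Coursera, is a leading figure in the artificial intelligence field. The introductory machine learning course he taught at Stanford University is available for free on Coursera (Coursera.org). It is an essential course for machine learning beginners, and one you naturally come across when studying AI and machine learning on your own.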

Linear Regression with One Variable

Single-variable linear regression

Model and Cost Function

Cost Function Intuition II

In this video, let's delve deeper and get an even better intuition about what the cost function is doing. This video assumes that you're familiar with contour plots. If you are not familiar with contour plots or contour figures, some of the illustrations in this video may not make sense to you, but that's okay. If you end up skipping this video, or some of it does not quite make sense because you haven't seen contour plots before, you will still understand the rest of this course without those parts.

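This lecture looks more deeply at what the cost function does. It relies on contour plots, so if you are not familiar with them you can skip it. The content is the same as in the previous lecture; the difference is that contour plots are used to build a deeper understanding of the cost function.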

Here's our problem formulation as usual, with the hypothesis, parameters, cost function, and our optimization objective. Unlike the last video, I'm going to keep both of my parameters, theta zero and theta one, as we generate our visualizations for the cost function.

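Here are the familiar formulas: the hypothesis, the parameters, the cost function, and the optimization goal.

The previous lecture used only the parameter θ1, but this lecture keeps both θ0 and θ1 in order to visualize the cost function.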
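The formulas referred to here were shown as an image that is not part of the text. As a reference sketch, the standard formulation used in this course can be written as follows, where m (the number of training examples) and (x^(i), y^(i)) (the i-th example) are the usual course notation rather than symbols defined in this post:

```latex
\begin{aligned}
\text{Hypothesis:} \quad & h_\theta(x) = \theta_0 + \theta_1 x \\
\text{Parameters:} \quad & \theta_0,\ \theta_1 \\
\text{Cost function:} \quad & J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 \\
\text{Goal:} \quad & \min_{\theta_0,\, \theta_1} J(\theta_0, \theta_1)
\end{aligned}
```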

So, same as last time, we want to understand the hypothesis h and the cost function J. So, here's my training set of housing prices, and let's make some hypothesis. You know, like that one; this is not a particularly good hypothesis. But if I set theta zero = 50 and theta one = 0.06, then I end up with this hypothesis down here, and that corresponds to that straight line. Now, given these values of theta zero and theta one, we want to plot the corresponding cost function on the right. What we did last time, when we only had theta one, was draw plots that look like this as a function of theta one. But now we have two parameters, theta zero and theta one, and so the plot gets a little more complicated. It turns out that when we have only one parameter, the plots we drew had this sort of bow-shaped function.

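As in the last lecture, we look at the hypothesis function hθ(x) and the cost function J. Here is a training data set of house prices against house size. First, we pick a hypothesis. The black straight line here is not a good hypothesis. It has θ0 = 50 and θ1 = 0.06, so the hypothesis is 'hθ(x) = 50 + 0.06x'. Given these values of θ0 and θ1, we plot the cost function on the right. In the last lecture the cost function was plotted against θ1 alone, and the result was a blue curve. Here we use two parameters, θ0 and θ1, so the graph of the cost function is more complicated. With a single parameter, the graph of the cost function is bow-shaped.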
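To make the link between a hypothesis and its cost concrete, here is a minimal sketch that evaluates J for θ0 = 50 and θ1 = 0.06. The training set is made up for illustration (the lecture's actual data is not given in the text), and the function names are hypothetical. The cost uses the squared-error form J = (1/2m) Σ (h(x) - y)², which is the form used in this course.

```python
# Hypothetical training set (NOT the lecture's data): house sizes and prices.
sizes = [500, 1000, 1500, 2000, 2500]
prices = [330, 395, 480, 545, 630]


def hypothesis(theta0, theta1, x):
    """Straight-line hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = (1 / 2m) * sum of squared errors."""
    m = len(xs)
    squared_errors = [(hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / (2 * m)


# The hypothesis from the text: theta0 = 50, theta1 = 0.06.
j_value = cost(50, 0.06, sizes, prices)
print(f"J(50, 0.06) = {j_value:.1f}")
```

On this made-up data the line sits far below every point, so the cost comes out large, which is what "not a good hypothesis" means numerically.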

And, in fact, depending on your training set, you might get a cost function that maybe looks something like this. So, this is a 3-D surface plot, where the axes are labeled theta zero and theta one. So as you vary theta zero and theta one, the two parameters, you get different values of the cost function J(theta zero, theta one), and the height of this surface above a particular point (theta zero, theta one), right, that's the vertical axis. The height of the surface at that point indicates the value of J(theta zero, theta one). And you can see it sort of has this bowl-like shape.

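Here is the graph of the cost function with the two parameters θ0 and θ1. It is a three-dimensional surface plot. The horizontal axes are labeled θ0 and θ1. Different values of θ0 and θ1 give different values of the cost function J(θ0, θ1). J(θ0, θ1) is measured on the vertical axis: it is the height of the surface above the point (θ0, θ1). The surface is shaped like a bowl.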
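The surface described here can be approximated by evaluating J on a grid of (θ0, θ1) pairs. The sketch below does that in plain Python on a made-up training set; the data, the grid ranges, and the step sizes are all assumptions chosen for illustration, not values from the lecture.

```python
# Hypothetical training set (NOT the lecture's data).
sizes = [500, 1000, 1500, 2000, 2500]
prices = [330, 395, 480, 545, 630]


def cost(theta0, theta1):
    """Squared-error cost J(theta0, theta1) on the training set above."""
    m = len(sizes)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(sizes, prices)) / (2 * m)


# Grid over the (theta0, theta1) plane; ranges and steps are arbitrary choices.
theta0_values = [float(t) for t in range(0, 501)]   # 0, 1, ..., 500
theta1_values = [i / 100 for i in range(0, 31)]     # 0.00, 0.01, ..., 0.30

# surface[(theta0, theta1)] is the height of the bowl above that point.
surface = {(t0, t1): cost(t0, t1) for t0 in theta0_values for t1 in theta1_values}

# The lowest grid point approximates the bottom of the bowl.
lowest_point = min(surface, key=surface.get)
print("lowest grid point:", lowest_point, "height:", round(surface[lowest_point], 2))
```

Feeding the same grid of heights to a 3D plotting tool would reproduce the bowl shape. A grid like this is only practical with two parameters, because the number of grid points grows quickly as parameters are added.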

Let me show you the same plot in 3D. So here's the same figure in 3D, with horizontal axes theta zero and theta one and vertical axis J(theta zero, theta one), and if I rotate this plot around, you kind of get a sense, I hope, of this bowl-shaped surface, as that's what the cost function J looks like.

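Let me show the same figure as a 3D plot. The horizontal axes are the parameters θ0 and θ1, and the vertical axis is J(θ0, θ1). Rotating the 3D plot shows the surface from different angles, which should give you a feel for the cost function. The surface is concave like a bowl, and that is the shape of the cost function J(θ0, θ1).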

Now, for the purpose of illustration in the rest of this video, I'm not actually going to use these sorts of 3D surfaces to show you the cost function J; instead, I'm going to use contour plots, or what I also call contour figures. I guess they mean the same thing. So here's an example of a contour figure, shown on the right, where the axes are theta zero and theta one. And what each of these ovals, each of these ellipses, shows is a set of points that takes on the same value for J(theta zero, theta one). So concretely, for example, take that point and that point and that point. All three of these points that I just drew in magenta have the same value for J(theta zero, theta one). Okay. These are the theta zero and theta one axes, but those three points have the same value for J(theta zero, theta one).

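For the rest of the lecture, the cost function J is no longer shown as a 3D surface. Instead, we use contour plots, also called contour figures. The contour plot on the right is often a more convenient view than the 3D surface. Its horizontal axis is the parameter θ0 and its vertical axis is the parameter θ1. Each ellipse is the set of points that share the same value of J(θ0, θ1). For example, the three pink points on the same light green ellipse all have the same value of the cost function J(θ0, θ1).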
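The idea that every point on one ellipse has the same cost can be checked numerically. The sketch below is an illustration under stated assumptions, not the lecture's method: the training set is made up, the bottom of the bowl is located with the closed-form least-squares formula for a straight line (which the lecture does not cover here), and `point_on_contour` is a hypothetical helper. It relies on J being quadratic, so moving a distance t from the minimum in a fixed direction raises the cost by t² times a constant.

```python
import math

# Hypothetical training set (NOT the lecture's data).
sizes = [500, 1000, 1500, 2000, 2500]
prices = [330, 395, 480, 545, 630]


def cost(theta0, theta1):
    """Squared-error cost J(theta0, theta1) on the training set above."""
    m = len(sizes)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(sizes, prices)) / (2 * m)


def least_squares_fit():
    """Closed-form minimizer of J for a straight line (center of the ellipses)."""
    m = len(sizes)
    mean_x = sum(sizes) / m
    mean_y = sum(prices) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    theta1 = sxy / sxx
    return mean_y - theta1 * mean_x, theta1


def point_on_contour(level, direction):
    """Point reached from the minimum along `direction` where J equals `level`."""
    best0, best1 = least_squares_fit()
    d0, d1 = direction
    # Because J is quadratic: J(best + t * d) = J(best) + t**2 * growth.
    growth = sum((d0 + d1 * x) ** 2 for x in sizes) / (2 * len(sizes))
    step = math.sqrt((level - cost(best0, best1)) / growth)
    return best0 + step * d0, best1 + step * d1


level = 1000.0  # arbitrary contour height, must exceed the minimum cost
directions = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (100.0, -0.05)]
contour_points = [point_on_contour(level, d) for d in directions]
for theta0, theta1 in contour_points:
    print(f"theta0 = {theta0:8.2f}, theta1 = {theta1:7.4f}, J = {cost(theta0, theta1):.4f}")
```

The four printed points are different (θ0, θ1) pairs, yet each has the same cost, so they would all lie on one ellipse of the contour plot.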

And if you haven't seen contour plots much before, imagine, if you will, a bowl-shaped function that's coming out of my screen, so that the minimum, the bottom of the bowl, is this point right there, right? The middle of these concentric ellipses. And imagine a bowl shape that sort of grows out of my screen like this, so that each of these ellipses has the same height above my screen. And the minimum of the bowl is right down there. And so the contour figure is maybe a more convenient way to visualize my function J.

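If you have not seen a contour plot before, imagine a bowl-shaped surface coming out of the screen. The bottom of the bowl is the center of the ellipses and the minimum. All points on one ellipse are at the same height, that is, they have the same value of J(θ0, θ1). It helps to think of the contour lines used to draw mountains in a school atlas. On a map, the center of the contour lines is the summit, the highest point, but on the plot of the cost function the center is the minimum, the bottom of the bowl. A contour plot is a convenient way to visualize the cost function J.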

So, let's look at some examples. Over here, I have a particular point, right? And so this is with theta zero equal to maybe about 800, and theta one equal to maybe -0.15. And so this point in red corresponds to one pair of values of theta zero, theta one, and it corresponds, in fact, to that hypothesis: theta zero is about 800, that is, where it intersects the vertical axis is around 800, and this is a slope of about -0.15. Now this line is really not such a good fit to the data, right? This hypothesis h(x), with these values of theta zero, theta one, is really not such a good fit to the data. And so you find that its cost is a value that's out here, pretty far from the minimum. This is a pretty high cost because this is just not that good a fit to the data.

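Let's look at some examples. Look at the red X on the contour plot on the right. At this point θ0 is about 800 and θ1 is about -0.15, so the point represents one pair of parameters θ0 and θ1. In the hypothesis, θ0 is the value at which the line crosses the vertical axis (where x = 0), so the line passes through about 800 on that axis. The parameter θ1, the slope, is about -0.15. The hypothesis is therefore 'hθ(x) = 800 - 0.15x', which is the blue line on the left. However, the blue line does not fit the data set at all. On the contour plot the point lies far from the center of the ellipses, the minimum, which means the error against the data set is large. In other words, the point is far from the minimum because the hypothesis is a poor match for the training data set.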

Let's look at some more examples. Now here's a different hypothesis that's, you know, still not a great fit for the data but maybe slightly better. So here, that's my point, those are my parameters theta zero, theta one, and so my theta zero value, right, that's about 360, and my value for theta one is equal to zero. So let's take theta zero equals 360 and theta one equals zero. And this pair of parameters corresponds to that hypothesis, corresponds to a flat line, that is, h(x) equals 360 plus zero times x. So that's the hypothesis. And this hypothesis again has some cost, and that cost is plotted as the height of the J function at that point.

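Here is another hypothesis. The blue line on the left still does not fit the data set well, but it is slightly better than the previous hypothesis. For this line θ0 is 360 and θ1 is 0, so the hypothesis is 'hθ(x) = 360 + 0x', a flat horizontal line. On the contour plot on the right, the red X marks J(360, 0), and its position corresponds to the height of the cost function J at that point. The point is closer to the center of the ellipses than in the previous example.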

Let's look at just a couple more examples. Here's one more: at this value of theta zero and that value of theta one, we end up with this hypothesis h(x), and again, it is not a great fit to the data, and it is actually further away from the minimum.

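Let's look at a few more examples. On the contour plot on the right, θ0 = 500 and θ1 = -0.05, so the hypothesis is 'hθ(x) = 500 - 0.05x'. The blue line in the figure on the left is not a good hypothesis. Its point lies far from the minimum at the center of the ellipses.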

Last example: this is actually not quite at the minimum, but it's pretty close to the minimum. So this is not such a bad fit to the data, where, for a particular value of theta zero and a particular value of theta one, we get a particular h(x). And this is not quite at the minimum, but it's pretty close. And so the sum of squared errors, the sum of squared distances between my training samples and my hypothesis, really, that's a sum of squared distances, right, of all of these errors, is pretty close to the minimum even though it's not quite the minimum. So with these figures I hope that gives you a better understanding of what values the cost function J takes, how they correspond to different hypotheses, and how better hypotheses may correspond to points that are closer to the minimum of this cost function J.

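The last example is slightly off the minimum, but the line follows the shape of the data set. Here θ0 = 250 and θ1 = 0.15, so the hypothesis is 'h(x) = 250 + 0.15x'. It is not the minimum, but it is quite close. The sum of squared errors is the sum of the squared differences between the training data and the hypothesis. Each error is the vertical distance between the line of the hypothesis and a data point. In short, the closer a hypothesis lies to the minimum of the cost function J, the better the hypothesis.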
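The four example hypotheses can be compared by computing their costs directly. The sketch below uses the parameter pairs quoted in this post, but the training set is made up, so the numbers are illustrative only. In particular, how the three poor fits rank against each other depends on this hypothetical data and need not match the lecture's figure.

```python
# Hypothetical training set (NOT the lecture's data).
sizes = [500, 1000, 1500, 2000, 2500]
prices = [330, 395, 480, 545, 630]


def cost(theta0, theta1):
    """Squared-error cost J(theta0, theta1) = (1 / 2m) * sum of squared errors."""
    m = len(sizes)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(sizes, prices)) / (2 * m)


# Parameter pairs (theta0, theta1) of the four example hypotheses in the text.
hypotheses = {
    "h(x) = 800 - 0.15x": (800, -0.15),
    "h(x) = 360 + 0x": (360, 0.0),
    "h(x) = 500 - 0.05x": (500, -0.05),
    "h(x) = 250 + 0.15x": (250, 0.15),
}

costs = {label: cost(theta0, theta1) for label, (theta0, theta1) in hypotheses.items()}

# Print from best fit (lowest cost) to worst fit (highest cost).
for label, value in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{label:20s} J = {value:10.1f}")
```

On this data the line that follows the points, h(x) = 250 + 0.15x, has by far the lowest cost, which is the numerical counterpart of its point sitting closest to the center of the ellipses.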

Now of course what we really want is an efficient algorithm, right, an efficient piece of software for automatically finding the values of theta zero and theta one that minimize the cost function J, right? And what we don't want to do is to, you know, write software to plot out this figure and then try to manually read off the numbers; that is not a good way to do it. And, in fact, we'll see later, when we look at more complicated examples, that we'll have high-dimensional figures with more parameters, and we'll see later in this course examples where this figure cannot really be plotted, and this becomes much harder to visualize. And so, what we want is to have software to find the values of theta zero, theta one that minimize this function, and in the next video we start to talk about an algorithm for automatically finding the values of theta zero and theta one that minimize the cost function J.

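What we really want is an efficient algorithm: software that automatically finds the values of θ0 and θ1 that minimize the cost function J. Plotting points on a contour plot and reading off the numbers by hand is tedious and not a good method. Later in the course we will work with higher-dimensional hypotheses that have more parameters. The more parameters there are, the harder the cost function is to plot or visualize, so in the end the minimum of J cannot be found by drawing a figure. The next lecture introduces an algorithm that automatically finds the parameters θ0 and θ1 that minimize the cost function J.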

Andrew Ng's machine learning video lecture

Summary

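When the cost function has a single parameter, J(θ1), its graph is a single quadratic curve.

When the cost function has two parameters, J(θ0, θ1), its graph is a three-dimensional surface. The two axes of the horizontal plane are θ0 and θ1, and the height is J(θ0, θ1). Imagine the bow shape traced out when θ0 is fixed at 0 and θ1 varies, and likewise the bow shape traced out when θ0 is fixed at 1 and θ1 varies. Putting such curves side by side gives the bowl, or mesh, shape.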
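The picture of bow-shaped slices can be checked with a short sketch. It fixes θ0 at 0 and then at 1, varies θ1 in equal steps, and looks at the second differences of the cost along each slice. For a quadratic curve these are constant, and a positive constant means the curve opens upward like a bow. The training set, the θ1 range, and the helper names are assumptions made for illustration.

```python
# Hypothetical training set (NOT the lecture's data).
sizes = [500, 1000, 1500, 2000, 2500]
prices = [330, 395, 480, 545, 630]


def cost(theta0, theta1):
    """Squared-error cost J(theta0, theta1) on the training set above."""
    m = len(sizes)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(sizes, prices)) / (2 * m)


# Equally spaced theta1 values: -0.10, -0.09, ..., 0.40 (arbitrary range).
theta1_values = [i / 100 for i in range(-10, 41)]


def slice_of_cost(theta0):
    """Cost along the line theta0 = constant, as theta1 varies."""
    return [cost(theta0, theta1) for theta1 in theta1_values]


def second_differences(values):
    """values[i-1] - 2*values[i] + values[i+1]; constant for a quadratic curve."""
    return [values[i - 1] - 2 * values[i] + values[i + 1] for i in range(1, len(values) - 1)]


for theta0 in (0, 1):
    curve = slice_of_cost(theta0)
    bends = second_differences(curve)
    print(f"theta0 = {theta0}: lowest J on slice = {min(curve):.1f}, "
          f"second differences between {min(bends):.3f} and {max(bends):.3f}")
```

Both slices bend by the same constant amount and differ only in where they sit, which is why stacking them for many values of θ0 builds up the bowl.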