by 라인하트 Oct 18. 2020

Andrew Ng's Machine Learning (6-3): Decision Boundary

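Andrew Ng, co-founder of the online learning platform Coursera, is one of the leading figures in the AI industry. The lectures he gave to machine learning beginners at Stanford University are available for free on Coursera (Coursera.org). This course is essential for anyone starting out in machine learning, and it is the course you naturally come across when studying AI and machine learning on your own.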

Logistic Regression

Classification and Representation

Decision Boundary

In the last video, we talked about the hypothesis representation for logistic regression.

What I'd like to do now is tell you about something called the decision boundary, and this will give us a better sense of what the logistic regression hypothesis function is computing.

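In the previous lecture, we learned about the hypothesis representation for logistic regression. In this lecture, I'll explain what a decision boundary is and what the logistic regression hypothesis function computes.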

To recap, this is what we wrote out last time, where we said that the hypothesis is represented as h of x equals g of theta transpose x, where g is this function called the sigmoid function, which looks like this. It slowly increases from zero to one, asymptoting at one. What I want to do now is try to understand better when this hypothesis will make predictions that y is equal to 1 versus when it might make predictions that y is equal to 0.

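Here is a quick recap of the logistic function, also called the sigmoid function.

The hypothesis is hθ(x) = g(θ^T x), where g(z) = 1 / (1 + e^(-z)) is the sigmoid function. The sigmoid rises smoothly from 0 toward 1: it approaches 0 for large negative z and levels off toward 1 for large positive z. We use the output of the logistic regression hypothesis to predict either y = 1 or y = 0.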
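To make the recap concrete, here is a minimal Python sketch of g(z) and hθ(x) = g(θ^T x). The names `sigmoid` and `hypothesis` are our own illustrative choices, and the sketch assumes the feature vector x already includes the bias term x0 = 1, as in the lecture's notation.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    # Branch on the sign of z so math.exp never gets a large positive
    # argument, which would raise OverflowError for very negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)  # algebraically equal to 1 / (1 + e^(-z))


def hypothesis(theta: list[float], x: list[float]) -> float:
    """h_theta(x) = g(theta^T x); x[0] is assumed to be the bias term 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


# Example with hypothetical theta = [0, 1] and x = [1, 2], so theta^T x = 2.
print(round(hypothesis([0.0, 1.0], [1.0, 2.0]), 4))  # prints 0.8808
```

The sign-based branch is just one common way to keep the computation stable; for the small inputs in this lecture, the direct formula behaves the same.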

And I'd like to understand better what the hypothesis function looks like, particularly when we have more than one feature. Concretely, this hypothesis outputs estimates of the probability that y is equal to one, given x and parameterized by theta. So if we want to predict whether y equals one or y equals zero, here's something we might do. Whenever the hypothesis outputs that the probability of y equals one is greater than or equal to 0.5, meaning that y equals 1 is more likely than y equals 0, let's predict y equals 1. Otherwise, if the estimated probability of y equals 1 is less than 0.5, let's predict y equals 0. I chose greater than or equal to here and less than here. If h of x is exactly 0.5, you could predict either positive or negative, which leaves a small loophole; we default to predicting positive if h of x is 0.5, but that's a detail that really doesn't matter that much.

What I want to do is understand better when exactly h of x will be greater than or equal to 0.5, so that we end up predicting y equals 1. If we look at this plot of the sigmoid function, we'll notice that g of z is greater than or equal to 0.5 whenever z is greater than or equal to zero. So it's in this half of the figure that g takes on values of 0.5 and higher. This notch here is 0.5, and so when z is non-negative, the sigmoid function g of z is greater than or equal to 0.5. Since the hypothesis for logistic regression is h of x equals g of theta transpose x, this is therefore going to be greater than or equal to 0.5 whenever theta transpose x is greater than or equal to 0, because here theta transpose x takes the role of z. So what we've shown is that the hypothesis will predict y equals 1 whenever theta transpose x is greater than or equal to 0.

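Now let's look at the logistic regression hypothesis more closely, including the case with more than one feature. Concretely, the hypothesis hθ(x) outputs an estimate of the probability that y = 1, given x and parameterized by θ. If hθ(x) is greater than or equal to 0.5, then y = 1 is more likely than y = 0, so we predict y = 1. Conversely, if hθ(x) is less than 0.5, then y = 0 is more likely than y = 1, so we predict y = 0.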

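So the deciding point is where hθ(x) = 0.5: depending on which side of 0.5 the output falls, we predict positive or negative. Exactly at hθ(x) = 0.5 we could predict either, which leaves a small loophole; by convention we default to predicting positive (y = 1) there. This detail is minor and does not matter much.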
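The 0.5 rule, including the tie convention, fits in a few lines of Python. This is a minimal sketch; the function name and the `threshold` parameter are our own illustrative choices, with 0.5 as the default used in this lecture.

```python
def predict_from_probability(h: float, threshold: float = 0.5) -> int:
    """Map an estimated P(y = 1 | x; theta) to a class label.

    Ties (h exactly equal to the threshold) go to the positive class,
    matching the default described above.
    """
    return 1 if h >= threshold else 0


for h in (0.2, 0.5, 0.8):
    print(h, "->", predict_from_probability(h))
```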

When exactly is hθ(x) >= 0.5, so that we predict y = 1? Looking at the sigmoid graph, g(z) >= 0.5 whenever z >= 0. Half of the sigmoid graph lies at or above 0.5: whenever z is non-negative, g(z) is greater than or equal to 0.5.

The logistic regression hypothesis hθ(x) = g(z) predicts y = 1 when z >= 0, that is, when θ^T x >= 0, because z = θ^T x. So whenever θ^T x >= 0, hθ(x) is greater than or equal to 0.5 and we predict y = 1.
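The same conclusion can also be reached algebraically instead of by reading the graph. Assuming the standard definition g(z) = 1 / (1 + e^(-z)), a short derivation:

```latex
\begin{aligned}
g(z) \ge \tfrac{1}{2}
  &\iff \frac{1}{1 + e^{-z}} \ge \frac{1}{2} \\
  &\iff 1 + e^{-z} \le 2 \qquad (\text{both sides are positive}) \\
  &\iff e^{-z} \le 1 \\
  &\iff -z \le 0 \qquad (\ln \text{ is increasing}) \\
  &\iff z \ge 0 .
\end{aligned}
```

Substituting z = θ^T x gives hθ(x) >= 0.5 exactly when θ^T x >= 0. Flipping each inequality gives the complementary case: g(z) < 0.5 exactly when z < 0.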

Let's now consider the other case, when the hypothesis will predict y is equal to 0. Well, by a similar argument, h(x) is going to be less than 0.5 whenever g(z) is less than 0.5, and the range of values of z that cause g(z) to take on values less than 0.5 is when z is negative. So when g(z) is less than 0.5, the hypothesis will predict that y is equal to 0. And by a similar argument to what we had earlier, h(x) is equal to g of theta transpose x, and so we'll predict y equals 0 whenever theta transpose x is less than 0.

And when exactly is hθ(x) < 0.5, so that we predict y = 0? Looking at the sigmoid graph, g(z) < 0.5 whenever z < 0. The other half of the sigmoid graph lies below 0.5: whenever z is negative, g(z) is less than 0.5.

As in the y = 1 case, whenever θ^T x < 0, hθ(x) is less than 0.5 and we predict y = 0.

To summarize what we just worked out: if we decide to predict y=1 or y=0 depending on whether the estimated probability is greater than or equal to 0.5 or less than 0.5, that's the same as saying that we predict y=1 whenever theta transpose x is greater than or equal to 0, and we predict y=0 whenever theta transpose x is less than 0.

In summary, the hypothesis predicts y = 1 or y = 0 depending on whether the estimated probability is at least 0.5 or below it. Equivalently, we predict y = 1 when θ^T x >= 0 and y = 0 when θ^T x < 0.
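Since g is increasing and g(0) = 0.5, thresholding the probability and checking the sign of θ^T x are the same rule. The sketch below checks this numerically on a grid of z values; the helper names are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^(-z)), computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_via_probability(z: float) -> int:
    # Rule 1: threshold the estimated probability h = g(z) at 0.5.
    return 1 if sigmoid(z) >= 0.5 else 0


def predict_via_score(z: float) -> int:
    # Rule 2: look only at the sign of z = theta^T x; no sigmoid needed.
    return 1 if z >= 0 else 0


# Compare the two rules on z = -5.00, -4.99, ..., 5.00.
grid = [i / 100 for i in range(-500, 501)]
mismatches = [z for z in grid if predict_via_probability(z) != predict_via_score(z)]
print("mismatches:", mismatches)
```

In double precision, g(z) can round to exactly 0.5 for z extremely close to zero (for example z = -1e-17), where the two rules would disagree. That is one small practical argument for comparing θ^T x with 0 directly.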

Let's use this to better understand how the hypothesis of logistic regression makes those predictions. Now, let's suppose we have a training set like that shown on the slide. And suppose a hypothesis is h of x equals g of theta zero plus theta one x one plus theta two x two.

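Here is an example of how the logistic regression hypothesis makes predictions. In the figure, the training set is shown as blue circles and crosses. The logistic regression hypothesis is hθ(x) = g(θ0 + θ1 x1 + θ2 x2).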

We haven't talked yet about how to fit the parameters of this model; we'll talk about that in the next video. But suppose that, via a procedure to be specified, we end up choosing the following values for the parameters: theta 0 equals minus 3, theta 1 equals 1, theta 2 equals 1. So my parameter vector is theta equals minus 3, 1, 1. Given this choice of hypothesis parameters, let's try to figure out where the hypothesis would end up predicting y equals one and where it would end up predicting y equals zero. Using the formulas we worked out on the previous slide, we know that y equals one is more likely, that is, the probability that y equals one is greater than or equal to 0.5, whenever theta transpose x is greater than or equal to zero. And this formula that I just underlined, -3 + x1 + x2, is, of course, theta transpose x when theta equals the parameter values we just chose. So for any example with features x1 and x2 that satisfy minus 3 plus x1 plus x2 greater than or equal to 0, our hypothesis will predict that y is equal to 1. We can also take -3, bring it to the right, and rewrite this as x1 + x2 greater than or equal to 3; equivalently, this hypothesis predicts y = 1 whenever x1 + x2 is greater than or equal to 3.

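We haven't yet explained how to fit the parameters θ of the logistic regression hypothesis to the data; that is the topic of the next lecture. Here we assume that some suitable procedure has already fit θ to the data, giving θ0 = -3, θ1 = 1, θ2 = 1. The parameter vector is therefore θ = [-3; 1; 1], and substituting it into the hypothesis gives hθ(x) = g(-3 + x1 + x2).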

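With these parameters, let's work out where the hypothesis predicts y = 1 and where it predicts y = 0, using the rule we derived a moment ago for when the estimated probability of y = 1 is at least 0.5.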

The hypothesis estimates P(y = 1) >= 0.5, and so predicts y = 1, whenever -3 + x1 + x2 >= 0, or equivalently whenever x1 + x2 >= 3.
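To see the rule in action for θ = [-3; 1; 1], here is a small Python sketch that evaluates θ^T x, hθ(x), and the predicted label at a few points. The points are illustrative choices of ours, not taken from the lecture's training set.

```python
import math

THETA = [-3.0, 1.0, 1.0]  # theta0, theta1, theta2 from the example


def score(x1: float, x2: float) -> float:
    """theta^T x with x = [1, x1, x2], i.e. -3 + x1 + x2."""
    return THETA[0] + THETA[1] * x1 + THETA[2] * x2


def h(x1: float, x2: float) -> float:
    """Estimated P(y = 1 | x; theta) = g(theta^T x)."""
    return 1.0 / (1.0 + math.exp(-score(x1, x2)))


def predict(x1: float, x2: float) -> int:
    """Predict y = 1 exactly when theta^T x >= 0."""
    return 1 if score(x1, x2) >= 0 else 0


# Illustrative points: below, exactly on, and above the line x1 + x2 = 3.
for point in [(1.0, 1.0), (2.0, 1.0), (3.0, 2.0)]:
    print(point, "theta^T x =", score(*point),
          "h =", round(h(*point), 3), "y_hat =", predict(*point))
```

The middle point lies exactly on x1 + x2 = 3, so θ^T x = 0 and hθ(x) = 0.5, and the tie convention assigns it y = 1.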

Let's see what that means on the figure. If I write down the equation x1 + x2 = 3, this defines a straight line, and if I draw that line, it passes through 3 on the x1 axis and 3 on the x2 axis. So the part of the input space, the part of the x1-x2 plane, that corresponds to x1 + x2 greater than or equal to 3 is everything above and to the upper right of this magenta line that I just drew. And so the region where our hypothesis will predict y = 1 is this region, this huge half-space over to the upper right. Let me write that down: I'm going to call this the y = 1 region. In contrast, the region where x1 + x2 is less than 3 is where we will predict that y is equal to 0, and that corresponds to this region. It's really a half-plane too, but that region on the left is where our hypothesis will predict y = 0.

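Let's see what the equation means on the figure, in the x1-x2 plane. The equation x1 + x2 = 3 is the magenta straight line passing through (3, 0) on the x1 axis and (0, 3) on the x2 axis. In the x1-x2 plane, the region corresponding to x1 + x2 >= 3 lies above and to the right of the magenta line. That whole upper-right half-plane is the region where we predict y = 1: the region where the hypothesis h(x), that is the sigmoid g(z), is greater than or equal to 0.5.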

Conversely, the region corresponding to x1 + x2 < 3 lies below and to the left of the magenta line (the blue region in the figure). This is where we predict y = 0: the region where the hypothesis h(x), that is the sigmoid g(z), is less than 0.5.

I want to give this magenta line that I drew a name. This line is called the decision boundary. Concretely, this straight line, x1 plus x2 equals 3, corresponds to the set of points where h of x is exactly 0.5, and the decision boundary is the line that separates the region where the hypothesis predicts y equals 1 from the region where it predicts y equals zero. And just to be clear, the decision boundary is a property of the hypothesis, including the parameters theta zero, theta one, theta two. In the figure I drew a training set to help the visualization. But even if we take away the data set, this decision boundary and the regions where we predict y = 1 versus y = 0 are a property of the hypothesis and of its parameters, not a property of the data set.

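This magenta line is called the decision boundary. Its equation is x1 + x2 = 3, and it is the set of points where the hypothesis h(x), that is the sigmoid g(z), equals exactly 0.5. The decision boundary separates the region where we predict y = 1 from the region where we predict y = 0. To be clear, even if we removed the training set from the figure, the decision boundary would still separate the y = 1 region from the y = 0 region. This means the decision boundary is a property of the hypothesis and its parameters, not a property of the training set.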

Later on, of course, we'll talk about how to fit the parameters and there we'll end up using the training set, using our data to determine the value of the parameters. But once we have particular values for the parameters theta0, theta1, theta2 then that completely defines the decision boundary and we don't actually need to plot a training set in order to plot the decision boundary.

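Later we'll explain how to fit the parameters θ to a dataset. Even though the decision boundary is a property of the hypothesis and its parameters, we still need the training set to find the values of θ. Once the values of θ0, θ1, θ2 are known, they completely define the decision boundary, and we don't need to plot the training set to draw it.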
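Because θ alone fixes the boundary, we can compute points on it without any data. For a linear two-feature hypothesis with θ2 != 0, solving θ0 + θ1 x1 + θ2 x2 = 0 gives x2 = -(θ0 + θ1 x1) / θ2. A minimal sketch under that assumption; `boundary_x2` is a hypothetical helper name.

```python
import math


def boundary_x2(theta: list[float], x1: float) -> float:
    """Solve theta0 + theta1*x1 + theta2*x2 = 0 for x2 (needs theta2 != 0)."""
    t0, t1, t2 = theta
    if t2 == 0:
        raise ValueError("theta2 is 0, so the boundary is vertical in the "
                         "x1-x2 plane; solve for x1 instead")
    return (-t0 - t1 * x1) / t2


theta = [-3.0, 1.0, 1.0]
# Sample boundary points using only the parameters, no training data.
points = [(x1, boundary_x2(theta, x1)) for x1 in (0.0, 1.0, 2.0, 3.0)]
for x1, x2 in points:
    z = theta[0] + theta[1] * x1 + theta[2] * x2
    print((x1, x2), "h =", 1.0 / (1.0 + math.exp(-z)))
```

With θ = [-3; 1; 1], the sampled points lie on x1 + x2 = 3, including the intercepts (0, 3) and (3, 0), and hθ(x) is exactly 0.5 at each of them.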

Let's now look at a more complex example where, as usual, I have crosses to denote my positive examples and Os to denote my negative examples. Given a training set like this, how can I get logistic regression to fit this sort of data? Earlier, when we were talking about polynomial regression and linear regression, we talked about how we could add extra higher-order polynomial terms to the features. And we can do the same for logistic regression.

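Here is a more complex example. Positive examples are marked with red crosses and negative examples with blue circles. How can we get a logistic regression hypothesis that fits this training set? Earlier, with polynomial regression and linear regression, we saw how to add higher-order polynomial terms to the features. We can do the same for logistic regression.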

Concretely, let's say my hypothesis looks like this, where I've added two extra features, x1 squared and x2 squared, so that I now have five parameters, theta zero through theta four. As before, we'll defer to the next video our discussion of how to automatically choose values for the parameters theta zero through theta four. But let's say that, via a procedure to be specified, I end up choosing theta zero equals minus one, theta one equals zero, theta two equals zero, theta three equals one, and theta four equals one. With this particular choice of parameters, my parameter vector theta is minus one, zero, zero, one, one. Following our earlier discussion, this means that my hypothesis will predict y = 1 whenever -1 + x1 squared + x2 squared is greater than or equal to 0, that is, whenever theta transpose times my features is greater than or equal to zero. If I take minus 1 and bring it to the right, I'm saying that my hypothesis will predict y is equal to 1 whenever x1 squared plus x2 squared is greater than or equal to 1. So what does this decision boundary look like? If you plot the curve x1 squared plus x2 squared equals 1, some of you will recognize that as the equation for a circle of radius one, centered at the origin. So that is my decision boundary. Everything outside the circle I'm going to predict as y = 1, so out here is my y = 1 region, and inside the circle is where I'll predict y is equal to 0. So by adding these more complex polynomial terms to my features, I can get more complex decision boundaries that don't just separate the positive and negative examples with a straight line; in this example, I get a decision boundary that's a circle.

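Starting from the two features x1 and x2, we build a hypothesis that also includes second-order terms, so that it can fit this training set:

hθ(x) = g(θ0 + θ1 x1 + θ2 x2 + θ3 x1^2 + θ4 x2^2)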

There are five parameters, θ0 through θ4. We again defer to the next lecture how to choose the values of θ0 through θ4; here we assume some suitable procedure has fit them to the data, giving θ0 = -1, θ1 = 0, θ2 = 0, θ3 = 1, θ4 = 1. The parameter vector is θ = [-1; 0; 0; 1; 1].
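Written out term by term, assuming the feature vector is ordered as x = [1, x1, x2, x1^2, x2^2] (the order implied by θ3 and θ4 multiplying the squared terms), the prediction rule becomes:

```latex
\theta^{T}x
= \begin{bmatrix} -1 & 0 & 0 & 1 & 1 \end{bmatrix}
  \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_1^2 \\ x_2^2 \end{bmatrix}
= -1 + x_1^2 + x_2^2,
\qquad
\theta^{T}x \ge 0 \iff x_1^2 + x_2^2 \ge 1 .
```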

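So we predict y = 1 when z = θ^T x >= 0, that is, when -1 + x1^2 + x2^2 >= 0. Moving -1 to the right-hand side gives x1^2 + x2^2 >= 1. What does the decision boundary of this hypothesis look like? x1^2 + x2^2 = 1 is the equation of a circle of radius 1 centered at the origin, and this magenta circle is the decision boundary. Outside the circle is the y = 1 region, and inside it is the y = 0 region. So by adding polynomial terms to the features, we can obtain more complex decision boundaries: not only straight lines but also curves, circles, ellipses, and other shapes.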
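A small Python check of the circular boundary, assuming the feature order [1, x1, x2, x1^2, x2^2]. The helper names and test points are our own illustrative choices.

```python
def features(x1: float, x2: float) -> list[float]:
    """Map (x1, x2) to [1, x1, x2, x1^2, x2^2], the feature order assumed here."""
    return [1.0, x1, x2, x1 ** 2, x2 ** 2]


THETA = [-1.0, 0.0, 0.0, 1.0, 1.0]  # theta0..theta4 from the example


def predict(x1: float, x2: float) -> int:
    # theta^T x = -1 + x1^2 + x2^2; predict y = 1 on or outside the unit circle.
    z = sum(t * f for t, f in zip(THETA, features(x1, x2)))
    return 1 if z >= 0 else 0


# The origin is inside the unit circle, (1, 0) is on it, (1, 1) is outside.
print(predict(0.0, 0.0), predict(1.0, 0.0), predict(1.0, 1.0))  # prints 0 1 1
```

Points exactly on the circle, such as (1, 0), give θ^T x = 0 and are assigned y = 1 under the tie convention.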

Once again, the decision boundary is a property not of the training set but of the hypothesis and its parameters. So as long as we're given the parameter vector theta, that defines the decision boundary, which here is the circle. The training set is not what we use to define the decision boundary; the training set may be used to fit the parameters theta, and we'll talk about how to do that later. But once you have the parameters theta, that is what defines the decision boundary.

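Once again, the decision boundary is a property not of the training set but of the hypothesis under its parameters. Once we know the values of θ, we can define the decision boundary, which here is a circle. The training set is not used to define the decision boundary directly; rather, the parameters θ that produce the boundary are fitted to the training set. In the end, knowing θ is enough to define the decision boundary.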

Let me put back the training set just for visualization. And finally, let's look at a more complex example. Can we come up with even more complex decision boundaries than this? If I have even higher-order polynomial terms, things like x1 squared, x1 squared x2, x1 squared x2 squared, and so on, then it's possible to show that you can get even more complex decision boundaries, and logistic regression can be used to find decision boundaries that may, for example, be an ellipse like that.

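Finally, let's look at decision boundaries with more complex shapes. How can we create even more complex decision boundaries? By adding higher-order polynomial terms, we can create much more complex decision boundaries, such as an elongated ellipse.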
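To illustrate the ellipse case, here is a sketch with hand-picked, hypothetical parameters θ = [-4; 0; 0; 1; 4] over the same quadratic features [1, x1, x2, x1^2, x2^2]. These values are not from the lecture. They are chosen so that θ^T x >= 0 becomes x1^2 / 4 + x2^2 >= 1, an ellipse stretched along the x1 axis.

```python
# Hypothetical parameters chosen by hand for illustration (not fitted to data):
# theta^T x = -4 + x1^2 + 4*x2^2, so the boundary is x1^2/4 + x2^2 = 1,
# an ellipse with semi-axis 2 along x1 and semi-axis 1 along x2.
THETA = [-4.0, 0.0, 0.0, 1.0, 4.0]


def features(x1: float, x2: float) -> list[float]:
    """Map (x1, x2) to [1, x1, x2, x1^2, x2^2]."""
    return [1.0, x1, x2, x1 ** 2, x2 ** 2]


def predict(x1: float, x2: float) -> int:
    z = sum(t * f for t, f in zip(THETA, features(x1, x2)))
    return 1 if z >= 0 else 0


for p in [(1.5, 0.0), (0.0, 1.5), (2.0, 0.0), (0.0, 1.0)]:
    print(p, predict(*p))
```

A point 1.5 units from the origin falls inside the boundary along the x1 axis but outside it along the x2 axis, which is exactly the elongation an ellipse adds over a circle.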

Or with a slightly different setting of the parameters, maybe you can instead get a different decision boundary, which may even look like some funny shape like that.

Depending on how we set the values of θ, we can create decision boundaries of many different shapes.

Or for even more complex examples, maybe you can get decision boundaries with more complex shapes like that, where everything in here you predict y = 1 and everything outside you predict y = 0. So with these higher-order polynomial features, you can get very complex decision boundaries. With these visualizations, I hope that gives you a sense of the range of hypothesis functions we can represent using the representation that we have for logistic regression.

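Even odd shapes like this are possible. We can create a shape that fits the training set better, predicting y = 1 inside it and y = 0 outside. In short, the more higher-order polynomial terms we add, the more complex the decision boundaries we can create. These visualizations show what the logistic regression hypothesis is able to represent.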
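How fast does the number of features grow as we add higher-order terms? Here is a sketch of a polynomial feature map for two inputs that lists every monomial x1^a x2^b with a + b <= degree. The name `map_features` and the term order are our own choices.

```python
def map_features(x1: float, x2: float, degree: int) -> list[float]:
    """All monomials x1^(i-j) * x2^j for 0 <= j <= i <= degree, bias term first."""
    return [x1 ** (i - j) * x2 ** j
            for i in range(degree + 1)
            for j in range(i + 1)]


for d in (1, 2, 6):
    print(d, len(map_features(0.5, -0.5, d)))
```

The count is (d + 1)(d + 2) / 2, so degree 6 already gives 28 features, including the bias term 1.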

Now that we know what h(x) can represent, what I'd like to do next in the following video is talk about how to automatically choose the parameters theta so that given a training set we can automatically fit the parameters to our data.

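We have now learned what the logistic regression hypothesis h(x) can represent. In the next lecture, I'll explain how to choose the parameters θ automatically, so that given a training set we can fit θ to the data.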

Andrew Ng's Machine Learning video lecture

Wrapping Up

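The logistic regression hypothesis applies the function g(z) = 1 / (1 + e^(-z)) to θ^T x; this function is called the sigmoid function or the logistic function.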

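The decision boundary is the boundary that separates the y = 0 region from the y = 1 region. Mathematically, it is the set of points where h(x) = 0.5, that is, where θ^T x = 0. The decision boundary is a property of the hypothesis and its parameters, not of the training set. The training set is used to find the values of the parameters θ that define the decision boundary, and once θ is known, the decision boundary is fully defined.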