by 라인하트 Nov 01. 2020

Andrew Ng's Machine Learning (10-2): Evaluating a Hypothesis

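Professor Andrew Ng, co-founder of the online learning platform Coursera, is a leading figure in the AI field. The lectures he gave to machine learning beginners at Stanford University can be taken for free as an online course on Coursera (Coursera.org). The course is essential for newcomers to machine learning, and anyone studying AI and machine learning on their own will run into it sooner or later.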

Advice for Applying Machine Learning

Evaluating a Learning Algorithm

Evaluating your hypothesis

In this video, I would like to talk about how to evaluate a hypothesis that has been learned by your algorithm. In later videos, we will build on this to talk about how to prevent the problems of overfitting and underfitting as well.

This lecture covers how to evaluate the hypothesis an algorithm has learned. Later lectures build on it to address overfitting and underfitting.

When we fit the parameters of our learning algorithm we think about choosing the parameters to minimize the training error.

When fitting a learning algorithm to the training data, we look for the parameters θ that minimize the training error.

One might think that getting a really low value of training error is a good thing, but we have already seen that just because a hypothesis has low training error, that doesn't mean it is necessarily a good hypothesis. We have already seen examples of how a hypothesis can overfit and therefore fail to generalize to new examples not in the training set. So how do you tell if the hypothesis might be overfitting? In this simple example we could plot the hypothesis h_θ(x) and just see what is going on.

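A very low training error is not necessarily a good sign. Earlier examples showed how a hypothesis can overfit, and an overfit hypothesis does not work well on new examples outside the training set. We therefore need a way to detect overfitting. In a simple case, we can plot the hypothesis h_θ(x) and judge by eye.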

But in general, for problems with more than just one feature, and especially for problems with a large number of features like these, it becomes hard or maybe impossible to plot what the hypothesis looks like, and so we need some other way to evaluate our hypothesis.

In general, however, learning algorithms deal with many features, which makes plotting very difficult. We need another way to evaluate the hypothesis.

The standard way to evaluate a learned hypothesis is as follows. Suppose we have a data set like this. Here I have shown just 10 training examples, but of course usually we may have dozens or hundreds or maybe thousands of training examples. In order to make sure we can evaluate our hypothesis, what we are going to do is split the data we have into two portions. The first portion is going to be our usual training set and the second portion is going to be our test set. A pretty typical split of all the data we have into a training set and a test set might be around, say, 70%/30%, with more of the data going to the training set and relatively less to the test set. So now, if we have some data set, we would assign, say, 70% of the data to be our training set, where "m" is as usual our number of training examples, and the remainder of our data would then be assigned to become our test set. Here I am going to use the notation m_test to denote the number of test examples. In general, the subscript test denotes examples that come from the test set, so that (x_test^(1), y_test^(1)) is my first test example, which in this example might be this example over here.

Finally, one last detail. Here I have drawn this as though the first 70% goes to the training set and the last 30% to the test set. If there is any sort of ordering to the data, it would be better to send a random 70% of your data to the training set and a random 30% of your data to the test set. So if your data were already randomly sorted, you could just take the first 70% and the last 30%. But if your data were not randomly ordered, it would be better to randomly shuffle or reorder the examples before sending the first 70% to the training set and the last 30% to the test set.

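There is a standard way to evaluate a learned hypothesis. Here is a data set for predicting house prices. Only 10 examples are listed, but in practice there would be dozens, hundreds, or thousands. To evaluate the hypothesis, split the data into two portions: the first is the training set and the second is the test set. Assign 70% of the data to the first portion and the remaining 30% to the second. The number of examples in the 70% training portion is written m. The number of examples in the 30% test portion is written m_test, with test as a subscript. (x_test^(1), y_test^(1)) refers to the first test example, here (1427, 199).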

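Finally, there is no fixed rule for deciding which 70% of the data goes to the training set and which 30% goes to the test set, but splitting at random is recommended. If the data is already in random order, you can simply take the first 70% and the last 30% in sequence. If it is sorted by some criterion, shuffle it randomly first.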
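The shuffle-then-split step can be sketched in a few lines of Python. This is a minimal illustration rather than code from the lecture: the function name `train_test_split`, the `seed` parameter, and the made-up (size, price) data are all assumptions introduced here.

```python
import random

def train_test_split(examples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the examples, then cut it into training and test sets."""
    shuffled = list(examples)              # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # remove any ordering present in the data
    m = round(train_fraction * len(shuffled))  # m = number of training examples
    return shuffled[:m], shuffled[m:]

# Ten made-up (size, price) pairs, deliberately sorted by size.
data = [(1000 + 100 * i, 150 + 20 * i) for i in range(10)]
train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))  # 7 3
```

Fixing the seed makes the split reproducible, which helps when comparing test errors across runs.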

Here then is a fairly typical procedure for how you would train and test a learning algorithm, say linear regression. First, you learn the parameters θ from the training set, so you minimize the usual training error objective J(θ), where J(θ) here is defined using that 70% of all the data you have, that is, only the training data. Then you compute the test error, which I am going to denote J_test. What you do is take the parameters θ that you have learned from the training set, plug them in, and compute your test set error J_test(θ). This is basically the average squared error as measured on your test set. It's pretty much what you'd expect: we run every test example through your hypothesis with parameters θ and measure the squared error that your hypothesis has on your m_test test examples. Of course, this is the definition of the test set error if we are using linear regression and the squared error metric.

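The typical procedure for training and testing a linear regression learning algorithm is as follows.

1) Learn the parameters θ from the training set by minimizing the usual training objective J(θ). Only the 70% of the data in the training set is used to minimize the cost function J(θ).

2) Compute the test error J_test(θ). For linear regression, this is the average squared error of the hypothesis over the test examples.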
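Written out, the linear regression test error described in step 2 would look like the following. The factor 1/2 is an assumption carried over from the course's usual definition of J(θ). The lecture text itself only calls this "basically the average squared error".

```latex
J_{test}(\theta) = \frac{1}{2\, m_{test}} \sum_{i=1}^{m_{test}}
    \left( h_\theta\!\left(x_{test}^{(i)}\right) - y_{test}^{(i)} \right)^2
```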

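Take the parameters θ learned on the training set and compute the error on the test set. This is basically the average squared error measured on the test set: run each of the m_test test examples through the hypothesis with parameters θ and measure its squared error. This is the test error for linear regression under the squared error metric.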
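As a rough sketch of the two steps for a single feature, the Python below fits θ on the training set only and then evaluates the same cost on the test set. Several things here are assumptions for illustration: the closed-form least-squares fit stands in for whatever minimizer of J(θ) you would normally use, the 1/(2m) scaling follows the course's usual J(θ), and the data points and function names are made up.

```python
def fit_line(train):
    """Least-squares fit of h(x) = theta0 + theta1 * x using the training set only."""
    m = len(train)
    mean_x = sum(x for x, _ in train) / m
    mean_y = sum(y for _, y in train) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

def squared_error_cost(theta, examples):
    """1/(2m) times the sum of squared errors over the given examples."""
    theta0, theta1 = theta
    m = len(examples)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in examples) / (2 * m)

# Hypothetical data, assumed to be already shuffled and split 70/30.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8),
         (5.0, 10.1), (6.0, 12.0), (7.0, 13.9)]
test = [(8.0, 16.2), (9.0, 17.8), (10.0, 20.1)]

theta = fit_line(train)                      # step 1: learn theta from training data
j_train = squared_error_cost(theta, train)   # training error J(theta)
j_test = squared_error_cost(theta, test)     # step 2: test error J_test(theta)
print(f"J_train = {j_train:.4f}, J_test = {j_test:.4f}")
```

The test examples never influence `theta`, which is what lets J_test serve as an estimate of how well the hypothesis generalizes.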

How about if we were doing a classification problem, say using logistic regression instead? In that case, the procedure for training and testing is pretty similar. First we learn the parameters from the training data, that first 70% of the data. Then we compute the test error J_test(θ). It is the same objective function we always use for logistic regression, except that now it is defined using our m_test test examples. While this definition of the test set error J_test is perfectly reasonable, sometimes there is an alternative test set metric that might be easier to interpret, and that is the misclassification error. It is also called the 0/1 misclassification error, with 0/1 denoting that you either get an example right or you get it wrong. Here is what I mean. Let me define the error of a prediction h_θ(x), given the label y, as equal to 1 if my hypothesis outputs a value greater than or equal to 0.5 and y is equal to 0, or if my hypothesis outputs a value less than 0.5 and y is equal to 1. Both of these cases basically correspond to your hypothesis mislabeling the example, assuming you threshold at 0.5: either it thought the label was more likely to be 1 but it was actually 0, or it thought the label was more likely to be 0 but it was actually 1. Otherwise, if your hypothesis classified the example correctly, we define this error function to be 0. We could then define the test error using the misclassification error metric to be 1/m_test times the sum from i = 1 to m_test of err(h_θ(x_test^(i)), y_test^(i)).

That is just my way of writing out that this is exactly the fraction of the examples in my test set that my hypothesis has mislabeled. So that is the definition of the test set error using the misclassification error, or 0/1 misclassification, metric. And that is the standard technique for evaluating how good a learned hypothesis is.

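How is a hypothesis evaluated for logistic regression? The training and testing procedure is similar. Logistic regression learns the parameters θ from the 70% training set and then computes the test error. J_test(θ) plays the same role as before but uses the logistic regression cost function, evaluated on the m_test test examples.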
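For reference, the logistic regression test error would take the following form, assuming the standard unregularized logistic regression cost applied to the test examples. The lecture text describes it only as "the same objective function as we always use".

```latex
J_{test}(\theta) = -\frac{1}{m_{test}} \sum_{i=1}^{m_{test}}
    \left[ y_{test}^{(i)} \log h_\theta\!\left(x_{test}^{(i)}\right)
    + \left(1 - y_{test}^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x_{test}^{(i)}\right)\right) \right]
```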

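Define the error of a prediction as follows: err(h_θ(x), y) = 1 if h_θ(x) >= 0.5 and y = 0, or if h_θ(x) < 0.5 and y = 1. In other words, the hypothesis thought 1 was more likely but the label was actually 0, or it thought 0 was more likely but the label was actually 1. If the hypothesis classified the example correctly, the error function is defined to be 0. The test error under the misclassification metric is then the fraction of test set examples that the hypothesis misclassified.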
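The 0/1 misclassification error can be sketched in Python as follows. The parameter vector `theta`, the tiny test set, and the helper names are hypothetical and chosen only to make the arithmetic easy to follow. Each feature list starts with 1.0 for the intercept term.

```python
import math

def hypothesis(theta, x):
    """Logistic regression hypothesis h_theta(x) = sigmoid(theta . x)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def err(h, y, threshold=0.5):
    """0/1 error: 1 if the thresholded prediction disagrees with label y, else 0."""
    predicted = 1 if h >= threshold else 0
    return 0 if predicted == y else 1

def misclassification_error(theta, test_set):
    """Fraction of test examples the hypothesis mislabels."""
    return sum(err(hypothesis(theta, x), y) for x, y in test_set) / len(test_set)

# Hypothetical parameters: predicts 1 when the single feature is at least 3.
theta = [-3.0, 1.0]
test_set = [
    ([1.0, 1.0], 0),
    ([1.0, 2.0], 0),
    ([1.0, 4.0], 1),
    ([1.0, 5.0], 1),
    ([1.0, 3.5], 0),  # predicted 1 but labeled 0, so this one is an error
]
print(misclassification_error(theta, test_set))  # 0.2
```

One of the five test examples is mislabeled, so the test error is 1/5. That is usually easier to interpret than the value of the logistic cost.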

In the next video, we will adapt these ideas to help us do things like choose which features to use with a learning algorithm, such as the degree of polynomial, or choose the regularization parameter for a learning algorithm.

The next lecture explains how to choose features, such as the degree of a polynomial, and how to choose the regularization parameter.

Andrew Ng's machine learning video lecture

Summary

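In a simple case, plotting the hypothesis h_θ(x) is enough to tell whether it is overfitting. Most real problems, however, have many features, which makes plotting the hypothesis difficult or impossible. We need a method other than plotting.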

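To evaluate a hypothesis, split the data into a 70% training set and a 30% test set. There is no fixed rule for which examples go where, and splitting at random is recommended. The total number of test examples is written m_test, with test as a subscript, and the i-th test example is written (x_test^(i), y_test^(i)).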

We then learn the parameters θ that minimize J(θ) on the 70% training set and compute the test error on the remaining 30%.