by 라인하트 Oct 20. 2020

Andrew Ng's Machine Learning (6-7): Logistic Regression, Multiclass

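Professor Andrew Ng, co-founder of the online learning platform Coursera, is a leading figure in the AI industry. The course he taught to machine learning beginners at Stanford University is available for free on Coursera (Coursera.org). It is an essential course for anyone starting out in machine learning, and one you naturally run into when studying AI and machine learning on your own.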

Logistic Regression

Logistic Regression Model

Multiclass Classification: One-vs-all

In this video we'll talk about how to get logistic regression to work for multiclass classification problems. And in particular I want to tell you about an algorithm called one-versus-all classification.

This lecture explains how to apply logistic regression to multiclass classification problems, focusing on the one-versus-all classification algorithm.

What's a multiclass classification problem? Here are some examples. Let's say you want a learning algorithm to automatically put your email into different folders or to automatically tag your emails, so you might have different folders or different tags for work email, email from your friends, email from your family, and emails about your hobby. And so here we have a classification problem with four classes, which we might assign to the classes y = 1, y = 2, y = 3, and y = 4. And another example, for medical diagnosis: if a patient comes into your office with maybe a stuffy nose, the possible diagnoses could be that they're not ill, maybe that's y = 1. Or they have a cold, y = 2. Or they have the flu, y = 3. And a third and final example: if you are using machine learning to classify the weather, you know, maybe you want to decide that the weather is sunny, cloudy, rainy, or snowy. And so in all of these examples, y can take on a small number of values, maybe one to three, one to four and so on, and these are multiclass classification problems. And by the way, it doesn't really matter whether we index the classes as 0, 1, 2, 3, or as 1, 2, 3, 4. I tend to index my classes starting from 1 rather than starting from 0, but either way it really doesn't matter.

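What is a multiclass problem? Here are a few examples. The first is a learning algorithm that automatically sorts email into several folders or tags it: work email, email from friends, email from family, and email about hobbies each get a different folder or tag. The email falls into four classes, assigned y = 1, 2, 3, 4. The second is diagnosing a patient with a stuffy nose. There are three possible diagnoses: y = 1 is not ill, y = 2 is a cold, and y = 3 is the flu. The last is classifying the weather, which can be defined with four options: y = 1 is sunny, y = 2 is cloudy, y = 3 is rainy, and y = 4 is snowy. In short, a problem is a multiclass classification problem when y takes one of a small number of discrete values, such as 1 to 3 or 1 to 4. Whether the index of y runs 0, 1, 2, 3 or 1, 2, 3, 4 does not matter. The lecturer tends to start class indices at 1 rather than 0, but either convention works.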
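To make the indexing point concrete, here is a small sketch that turns class names into integer labels. The weather label array is made up for illustration, and `np.unique` is just one convenient way to do the mapping; the lecture does not prescribe any particular encoding.

```python
import numpy as np

# Hypothetical weather labels, invented for this example.
labels = np.array(["sunny", "rainy", "cloudy", "snow", "sunny"])

# np.unique returns the sorted distinct classes; with return_inverse=True
# it also returns a 0-based class index for every label.
classes, y0 = np.unique(labels, return_inverse=True)

# The same classes indexed from 1, as in the lecture.
y1 = y0 + 1

num_classes = len(classes)
```

Both `y0` and `y1` carry the same information; only the starting index differs, which is why the choice does not matter as long as it is used consistently.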

Whereas previously for a binary classification problem, our data sets look like this. For a multi-class classification problem our data sets may look like this, where here I'm using three different symbols to represent our three classes. So the question is, given the data set with three classes, where this is an example of one class, that's an example of a different class, and that's an example of yet a third class.

The left figure is a binary classification problem, and the right figure is a multiclass classification problem. The right figure uses three different symbols to represent three classes: the × class, the △ class, and the □ class.

How do we get a learning algorithm to work for this setting? We already know how to do binary classification using logistic regression. We know how to, you know, maybe fit a straight line to separate the positive and negative classes.

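How does a learning algorithm work on the data set in the figure? The previous lectures already covered how to separate data into two classes: we drew a straight line between the data groups and divided them into a positive class and a negative class.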
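As a reminder of that binary case, the following is a minimal sketch of a logistic regression prediction with a straight-line boundary. The parameter vector `theta` is hand-picked for illustration (it is not fitted to any data), and the 0.5 threshold is the usual convention for this model.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_binary(theta, X):
    """Predict 1 where the estimated probability is at least 0.5, else 0."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend intercept column
    return (sigmoid(Xb @ theta) >= 0.5).astype(int)

# Hypothetical parameters: the decision boundary is the line x1 + x2 = 3.
theta = np.array([-3.0, 1.0, 1.0])

X = np.array([[1.0, 1.0],   # below the line -> negative class
              [3.0, 2.0]])  # above the line -> positive class
pred = predict_binary(theta, X)
```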

Using an idea called one-vs-all classification, we can then take this and make it work for multi-class classification as well. Here's how one-vs-all classification works. And this is also sometimes called one-vs-rest. Let's say we have a training set like that shown on the left, where we have three classes: if y equals 1, we denote that with a triangle; if y equals 2, the square; and if y equals 3, then the cross.

To classify the data in the right figure, we use a technique called one-vs-all classification, also known as one-vs-rest. Here is how it works. Each of the three classes gets a y value: △ is y = 1, □ is y = 2, and × is y = 3.

What we're going to do is take our training set and turn this into three separate binary classification problems. I'll turn this into three separate two class classification problems.

So let's start with class one, which is the triangle. We're gonna essentially create a new sort of fake training set where classes two and three get assigned to the negative class, and class one gets assigned to the positive class. We create a new training set like that shown on the right, and we're going to fit a classifier which I'm going to call h subscript theta superscript one of x, where here the triangles are the positive examples and the circles are the negative examples. So think of the triangles being assigned the value of one and the circles assigned the value of zero. And we're just going to train a standard logistic regression classifier, and maybe that will give us a decision boundary that looks like that. Okay? This superscript one here stands for class one, so we're doing this for the triangles of class one.

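We turn the multiclass classification problem into three binary classification problems, that is, three separate two-class problems. Start with class 1 (△). In a new "fake" training set, class 2 (□) and class 3 (×) are both assigned to the negative class, drawn as ○, and class 1 (△) is the positive class. We fit a hypothesis hθ^(1)(x) to this new training set, shown in the upper right figure. The superscript stands for class 1. Class 1 (△) is assigned the value 1 and ○ the value 0. We then train hθ^(1)(x) as a standard logistic regression classifier, which gives a decision boundary of the same kind as in the binary classification problem.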
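The relabeling step can be sketched in a couple of lines. The helper name `one_vs_all_labels` and the sample label vector are assumptions made for this example, not something from the lecture.

```python
import numpy as np

def one_vs_all_labels(y, positive_class):
    """Return 1 where y equals positive_class and 0 everywhere else."""
    return (np.asarray(y) == positive_class).astype(int)

# Hypothetical labels: 1 = triangle, 2 = square, 3 = cross.
y = np.array([1, 2, 3, 1, 3, 2])

# "Fake" training labels for the class 1 classifier:
# triangles become 1, squares and crosses become 0.
y_class1 = one_vs_all_labels(y, 1)
```

The features x are left untouched; only the labels change from one binary problem to the next.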

Next we do the same thing for class two. We're gonna take the squares and assign the squares as the positive class, and assign everything else, the triangles and the crosses, as the negative class. And then we fit a second logistic regression classifier and call this h of x superscript two, where the superscript two denotes that we're now doing this treating the square class as the positive class. And maybe we get a classifier like that.

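Next we do the same thing for class 2. The class 2 (□) examples are assigned to the positive class, and the remaining △ and × examples are assigned to the negative class. The second logistic regression hypothesis fit to this training set is called hθ^(2)(x). The superscript 2 means that this hypothesis treats class 2 (□) as the positive class. Again we get a decision boundary of the same kind as in the binary classification problem.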

And finally, we do the same thing for the third class and fit a third classifier h superscript three of x, and maybe this will give us a decision boundary, or a classifier, that separates the positive and negative examples like that.

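Finally, we do the same thing for the third class (×). The third logistic regression hypothesis, fit to the training set in the lower right figure, is called hθ^(3)(x). It also yields a decision boundary, which separates the positive examples from the negative ones.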

So to summarize, what we've done is, we've fit three classifiers. So, for i = 1, 2, 3, we'll fit a classifier h superscript i subscript theta of x, thus trying to estimate what is the probability that y is equal to class i, given x and parametrized by theta. Right? So in the first instance, for this first one up here, this classifier was learning to recognize the triangles. So it's thinking of the triangles as the positive class, so h superscript one is essentially trying to estimate what is the probability that y is equal to one, given x, parametrized by theta. And similarly, this is treating the square class as the positive class, and so it's trying to estimate the probability that y = 2, and so on. So we now have three classifiers, each of which was trained to recognize one of the three classes. Just to summarize, what we've done is we want to train a logistic regression classifier h superscript i of x for each class i to predict the probability that y is equal to i.

To summarize, we fit three classifiers to the training data. For i = 1, 2, 3, we fit a hypothesis function hθ^(i)(x) = P(y=i | x; θ), which estimates the probability that y equals class i given the example x, parameterized by θ. In the first case, hθ^(1)(x) learned to recognize the △ class. Because △ is the positive class, hθ^(1)(x) = P(y=1 | x; θ): it estimates the probability that y = 1 given x and the parameters θ. In the second case, □ is the positive class, so hθ^(2)(x) = P(y=2 | x; θ) estimates the probability that y = 2. The third works the same way: hθ^(3)(x) = P(y=3 | x; θ). Each of the three hypothesis functions is trained to recognize one of the three classes.
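The whole training procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's reference implementation: the toy clusters, the learning rate, the step count, and the function names `train_binary` and `train_one_vs_all` are all chosen here for the example, and plain batch gradient descent stands in for whatever optimizer one would normally use.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(Xb, y01, lr=0.1, steps=2000):
    """Fit one logistic regression classifier with batch gradient descent.

    lr and steps are illustrative defaults, not tuned values.
    """
    theta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        # Gradient of the average logistic regression cost.
        grad = Xb.T @ (sigmoid(Xb @ theta) - y01) / len(y01)
        theta -= lr * grad
    return theta

def train_one_vs_all(X, y, classes):
    """Return a matrix with one row of parameters per class."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend intercept column
    # For each class c, relabel: 1 where y == c, 0 otherwise, then fit.
    return np.vstack([train_binary(Xb, (y == c).astype(float)) for c in classes])

# Toy data (an assumption for illustration): three clusters of 20 points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])
y = np.repeat([1, 2, 3], 20)

Theta = train_one_vs_all(X, y, classes=[1, 2, 3])

# Column i-1 of probs is the estimate of P(y = i | x; theta) for each example.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])
probs = sigmoid(Xb @ Theta.T)
```

Note that the three columns of `probs` come from independently trained classifiers, so a row is not guaranteed to sum to 1.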

Finally, to make a prediction when we're given a new input x, what we do is we just run all three of our classifiers on the input x, and we then pick the class i that maximizes h superscript i of x. So we just basically pick whichever one of the three classifiers is most confident, the one that most enthusiastically says that it thinks it has the right class. So whichever value of i gives us the highest probability, we then predict y to be that value.

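Finally, we can make a prediction for a new input x. We run every hypothesis function hθ^(i)(x) on the new input and pick the class i with the highest probability. In other words, of the three hypothesis functions we pick the one that is most confident, the one that most enthusiastically says "this is the right class." Whichever value of i gives the highest probability, we predict y to be that value.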
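The prediction rule is an argmax over the classifier outputs. In the sketch below the parameter matrix `Theta` is hand-picked so that the example runs on its own; in practice it would come from training one classifier per class, and the function name `predict_one_vs_all` is an assumption for this example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(Theta, X):
    """Run every classifier on X and return the most confident class (1-based)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend intercept column
    probs = sigmoid(Xb @ Theta.T)                  # column i-1 holds h^(i)(x)
    return np.argmax(probs, axis=1) + 1            # shift to 1-based class labels

# Hypothetical parameters, one row per class: [intercept, weight_x1, weight_x2].
Theta = np.array([
    [ 3.0, -1.5, -0.75],  # class 1: confident near (0, 0)
    [-3.0,  1.5, -0.75],  # class 2: confident near (4, 0)
    [-3.0,  0.0,  1.5 ],  # class 3: confident near (2, 4)
])

X_new = np.array([[0.2, 0.1], [3.8, 0.3], [2.1, 3.9]])
y_pred = predict_one_vs_all(Theta, X_new)
```

Because the sigmoid is monotonic, taking the argmax of `Xb @ Theta.T` directly would pick the same class; the probabilities are computed here only to mirror the description above.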

So that's it for multi-class classification and the one-vs-all method. And with this little method you can now take the logistic regression classifier and make it work on multi-class classification problems as well.

That covers multiclass classification with one-vs-all. With logistic regression, a multiclass problem can be handled as a set of binary classification problems.