by 라인하트 Oct 18. 2020

Andrew Ng's Machine Learning (6-2): The Logistic Regression Hypothesis

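Professor Andrew Ng, co-founder of the online learning platform Coursera, is a leading figure in the artificial intelligence field. The introductory machine learning course he taught at Stanford University is available for free on Coursera (Coursera.org). It is an essential course for machine learning beginners, and one that anyone studying AI and machine learning on their own will naturally come across.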

Logistic Regression

Classification and Representation

Hypothesis Representation

Let's start talking about logistic regression. In this video, I'd like to show you the hypothesis representation. That is, what is the function we're going to use to represent our hypothesis when we have a classification problem.

This lecture introduces logistic regression and the model it uses to represent the hypothesis for a classification problem.

Earlier, we said that we would like our classifier to output values that are between 0 and 1. So we'd like to come up with a hypothesis that satisfies this property, that is, its predictions are between 0 and 1. When we were using linear regression, this was the form of a hypothesis, where h(x) is theta transpose x. For logistic regression, I'm going to modify this a little bit and make the hypothesis g of theta transpose x, where I'm going to define the function g as follows: g(z), where z is a real number, is equal to one over one plus e to the negative z. This is called the sigmoid function, or the logistic function, and the term logistic function is what gives rise to the name logistic regression. And by the way, the terms sigmoid function and logistic function are basically synonyms and mean the same thing. So the two terms are interchangeable, and either term can be used to refer to this function g. And if we take these two equations and put them together, then here's just an alternative way of writing out the form of my hypothesis. I'm saying that h(x) is 1 over 1 plus e to the negative theta transpose x. And all I've done is I've taken this variable z, where z is a real number, and plugged in theta transpose x. So I end up with theta transpose x in place of z there.

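The output of a classifier should lie between 0 and 1, so the hypothesis's predictions must be values between 0 and 1. In linear regression, the hypothesis has the following form:

$$h_\theta(x) = \theta^T x$$

The logistic regression hypothesis modifies the linear regression hypothesis slightly:

$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$

Here g(z) is called the sigmoid function or the logistic function, and z is a real number. The name "logistic regression" comes from the logistic function. The terms sigmoid function and logistic function are used interchangeably, and both refer to the function g.

Substituting z = θᵀx, the hypothesis can be written as:

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$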
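The two definitions from the lecture, g(z) = 1 / (1 + e^(-z)) and h(x) = g(θᵀx), can be sketched in Python as below. This is a minimal illustration rather than the course's own code; the function names `sigmoid` and `hypothesis` and the sample values of `theta` and `x` are assumptions made for the example.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta: Sequence[float], x: Sequence[float]) -> float:
    """Logistic regression hypothesis h(x) = g(theta^T x)."""
    z = sum(t * xi for t, xi in zip(theta, x))  # theta transpose x
    return sigmoid(z)


# Hypothetical parameters and input; x[0] = 1 is the intercept feature.
theta = [0.0, 0.0]
x = [1.0, 2.5]
print(hypothesis(theta, x))  # theta^T x = 0, so the output is g(0) = 0.5
```

With all parameters at zero, θᵀx is 0 for any input, so the hypothesis returns 0.5 everywhere. Note that this direct form can raise `OverflowError` in `math.exp` for very large negative z, so it is only suitable for moderate inputs.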

Lastly, let me show you what the sigmoid function looks like. We're gonna plot it on this figure here. The sigmoid function, g(z), also called the logistic function, looks like this. It starts off near 0 and then it rises until it crosses 0.5 at the origin, and then it flattens out again like so. So that's what the sigmoid function looks like. And you notice that the sigmoid function asymptotes at one and asymptotes at zero, where the horizontal axis is z. As z goes to minus infinity, g(z) approaches zero. And as z approaches infinity, g(z) approaches one. And so because g(z) outputs values that are between zero and one, we also have that h(x) must be between zero and one.

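Finally, consider the graph of the sigmoid function. The sigmoid function g(z), also called the logistic function, has an S-shaped graph whose horizontal axis is z. The curve takes the value 0.5 at z = 0, converges to 0 on the left, and converges to 1 on the right. As z goes to negative infinity, g(z) approaches 0, and as z goes to positive infinity, g(z) approaches 1. The sigmoid function g(z) therefore takes values between 0 and 1, and so does the logistic regression hypothesis hθ(x).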
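The shape described above (0.5 at z = 0, approaching 0 on the left and 1 on the right) can be checked numerically with a short sketch. The branch on the sign of z is an implementation choice assumed here to avoid overflow in `math.exp`; it is not part of the lecture, and both branches compute the same function g(z).

```python
import math


def sigmoid(z: float) -> float:
    """g(z) = 1 / (1 + e^(-z)), evaluated without overflowing math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Sample the curve at a few points along the horizontal z axis.
for z in (-20.0, -2.0, 0.0, 2.0, 20.0):
    print(f"g({z:+.0f}) = {sigmoid(z):.6f}")
```

The printed values rise monotonically from near 0 to near 1 and pass through 0.5 at z = 0. Mathematically g(z) lies strictly between 0 and 1, although floating-point rounding can return exactly 0.0 or 1.0 for extreme z.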

Finally, given this hypothesis representation, what we need to do, as before, is fit the parameters theta to our data. So given a training set we need to pick a value for the parameters theta and this hypothesis will then let us make predictions. We'll talk about a learning algorithm later for fitting the parameters theta, but first let's talk a bit about the interpretation of this model.

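As with linear regression, we obtain the best hypothesis by finding the parameters θ that fit the training data. Once θ has been chosen from the training set, the hypothesis function can make predictions. The learning algorithm for fitting θ to the data is covered later. First, we interpret the hypothesis model.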

Here's how I'm going to interpret the output of my hypothesis, h(x). When my hypothesis outputs some number, I am going to treat that number as the estimated probability that y is equal to one on a new input example x. Here's what I mean, here's an example. Let's say we're using the tumor classification example, so we may have a feature vector x, where x zero equals one as always, and then one feature is the size of the tumor. Suppose I have a patient come in and they have some tumor size and I feed their feature vector x into my hypothesis. And suppose my hypothesis outputs the number 0.7. I'm going to interpret my hypothesis as follows. I'm gonna say that this hypothesis is telling me that for a patient with features x, the probability that y equals 1 is 0.7. In other words, I'm going to tell my patient that the tumor, sadly, has a 70 percent chance, or a 0.7 chance, of being malignant.

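We now interpret the output of the hypothesis hθ(x). The output is the estimated probability that y = 1 for the input x. Consider a feature vector x for the tumor classification problem. The intercept term x0 is always 1, and x1 is the tumor size:

$$x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} 1 \\ \text{tumorSize} \end{bmatrix}$$

Suppose a patient has a tumor of some size, and the tumor classification hypothesis outputs 0.7 for this patient's feature vector x:

$$h_\theta(x) = 0.7$$

This means that the probability that y = 1 for this patient is 0.7. In other words, we can tell the patient, "There is a 70% chance that the tumor is malignant."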
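To make the interpretation concrete, the sketch below feeds one patient's feature vector into a logistic regression hypothesis and reads the output as an estimated probability of malignancy. The parameter values in `theta` and the tumor size are hypothetical numbers chosen for illustration; the lecture does not give fitted parameters, and its 0.7 output is likewise just an example.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(theta, x):
    """Estimated P(y = 1 | x; theta) = g(theta^T x)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


# Hypothetical parameters (not from the lecture): intercept and tumor-size weight.
theta = [-3.0, 1.0]
tumor_size = 4.0          # hypothetical tumor size, arbitrary units
x = [1.0, tumor_size]     # x0 = 1 is the intercept feature, x1 is the tumor size

p_malignant = predict_probability(theta, x)
print(f"Estimated probability that the tumor is malignant: {p_malignant:.2f}")
```

Here θᵀx = -3 + 4 = 1, so the output is g(1), about 0.73, which would be read as roughly a 73% estimated chance that y = 1 for this patient.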

To write this out slightly more formally, or to write this out in math, I'm going to interpret my hypothesis output as P of y=1 given x, parameterized by theta. So for those of you that are familiar with probability, this equation may make sense. If you're a little less familiar with probability, then here's how I read this expression. This is the probability that y is equal to one, given x, that is, given that my patient has a particular tumor size represented by my features x. And this probability is parameterized by theta. So I'm basically going to count on my hypothesis to give me estimates of the probability that y is equal to 1.

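More formally, in mathematical notation:

$$h_\theta(x) = P(y = 1 \mid x; \theta)$$

This is notation from probability theory. It denotes the probability that y = 1 given the patient's features x, which represent the tumor size, and parameterized by θ. In other words, the output of the hypothesis hθ(x) is an estimate of the probability that y = 1.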

Now, since this is a classification task, we know that y must be either 0 or 1, right? Those are the only two values that y could possibly take on, either in the training set or for new patients that may walk into my office, or into the doctor's office in the future. So given h(x), we can therefore compute the probability that y = 0 as well, because y must be either 0 or 1. We know that the probability of y = 0 plus the probability of y = 1 must add up to 1. This first equation looks a little bit more complicated. It's basically saying that the probability of y=0 for a particular patient with features x, given our parameters theta, plus the probability of y=1 for that same patient with features x, given parameters theta, must add up to one. If this equation looks a little bit complicated, feel free to mentally imagine it without that x and theta. And this is just saying that the probability of y equals zero plus the probability of y equals one must be equal to one. And we know this to be true because y has to be either zero or one, and so the chance of y equals zero plus the chance that y is one must add up to one. And so if you just take this term and move it to the right hand side, then you end up with this equation, which says that the probability that y equals zero is 1 minus the probability of y equals 1. And thus if our hypothesis h of x gives us that term, you can quite simply compute the estimated probability that y is equal to 0 as well.

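This is a classification problem, so the output is either y = 0 or y = 1. The output y cannot take any other value, whether it comes from the training set or from a new patient who arrives in the future. Given the hypothesis hθ(x), we can also compute the probability that y = 0, because y must be either 0 or 1. That is, the probability that y = 0 and the probability that y = 1 add up to 1:

$$P(y = 0 \mid x; \theta) + P(y = 1 \mid x; \theta) = 1$$

The equation looks complicated, but it is easy to read. P(y = 0 | x; θ) is the probability that y = 0 given a particular patient's features x and the parameters θ, and P(y = 1 | x; θ) is the probability that y = 1 given the same patient's features x and the parameters θ. The two terms must sum to 1. If the notation is distracting, it is fine to drop x and θ and simply read it as P(y = 0) plus P(y = 1) equals 1.

Moving the term P(y = 1 | x; θ) to the right-hand side gives:

$$P(y = 0 \mid x; \theta) = 1 - P(y = 1 \mid x; \theta)$$

The probability that y = 0 equals 1 minus the probability that y = 1. So once we have the probability that y = 1 for a particular x, the probability that y = 0 is easy to compute.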
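The complement rule P(y = 0 | x; θ) = 1 - P(y = 1 | x; θ) amounts to one line of arithmetic. The sketch below applies it to the lecture's example output of 0.7; the helper name `class_probabilities` is an assumption introduced for this illustration.

```python
def class_probabilities(p_y1):
    """Return (P(y=0), P(y=1)) given the hypothesis output h(x) = P(y=1 | x; theta)."""
    if not 0.0 <= p_y1 <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    return 1.0 - p_y1, p_y1


# The lecture's example: the hypothesis outputs 0.7 for a patient.
p0, p1 = class_probabilities(0.7)
print(f"P(y=0) = {p0:.1f}, P(y=1) = {p1:.1f}")  # P(y=0) = 0.3, P(y=1) = 0.7
```

For the tumor example, an estimated 0.7 probability of a malignant tumor therefore implies an estimated 0.3 probability that it is benign.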

So, you now know what the hypothesis representation is for logistic regression, and we've seen the mathematical formula that defines the hypothesis for logistic regression. In the next video, I'd like to try to give you better intuition about what the hypothesis function looks like. And I wanna tell you about something called the decision boundary. And we'll look at some visualizations together to try to get a better sense of what this hypothesis function of logistic regression really looks like.

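This lecture covered the definition, model, and formula of the logistic regression hypothesis. The next lecture builds intuition for the hypothesis function: it introduces the decision boundary and looks at visualizations to get a better sense of what the logistic regression hypothesis really looks like.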