Andrew Ng's Machine Learning Lecture (2-1): Hypothesis Representation

by 라인하트 Sep 25. 2020

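Professor Andrew Ng, co-founder of the online learning platform Coursera, is a leading figure in the AI industry. The course he taught to machine learning beginners at Stanford University is available for free, as is, on Coursera (Coursera.org). It is an essential course for machine learning beginners, and one that anyone studying AI and machine learning on their own will naturally come across. This post briefly summarizes the lecture.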

Linear Regression with One Variable

Univariate Linear Regression

Model and Cost Function

Model Representation

Our first learning algorithm will be linear regression. In this video, you'll see what the model looks like and more importantly you'll see what the overall process of supervised learning looks like.

Our first learning algorithm is linear regression. This lecture explains the form of the model and the overall process of supervised learning.

Let's use a motivating example of predicting housing prices. We're going to use a data set of housing prices from the city of Portland, Oregon. And here I'm gonna plot my data set of a number of houses of different sizes that were sold for a range of different prices. Let's say that given this data set, you have a friend who's trying to sell a house, and let's say your friend's house is 1250 square feet in size, and you want to tell them how much they might be able to sell the house for.

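Here is a data set for predicting housing prices. It records housing prices by house size in the city of Portland, Oregon. Your friend, who is trying to sell a house, has a 1,250-square-foot house and wants to know how much it could sell for.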

Well one thing you could do is fit a model. Maybe fit a straight line to this data. Looks something like that and based on that, maybe you could tell your friend that let's say maybe he can sell the house for around $220,000.

First, we fit a model to the data set. Here we can fit a straight line to the data. Based on this line, you can tell your friend that the house might sell for around $220,000.

So this is an example of a supervised learning algorithm. And it's supervised learning because we're given the, quotes, "right answer" for each of our examples. Namely, we're told the actual price that each of the houses in our data set was sold for. Moreover, this is an example of a regression problem, where the term regression refers to the fact that we are predicting a real-valued output, namely the price. And just to remind you, the other most common type of supervised learning problem is called the classification problem, where we predict discrete-valued outputs, such as if we are looking at cancer tumors and trying to decide if a tumor is malignant or benign. So that's a zero-one valued discrete output.

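This is an example of supervised learning, because each training example comes with the right answer. That is, the data records the actual price each house sold for, along with its size. It is also a regression problem: we predict a real-valued output, the price. The other common type of supervised learning problem is classification, which predicts discrete-valued outputs. For example, a classification problem decides whether a tumor is malignant or benign, as a discrete output of 0 or 1.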

More formally, in supervised learning, we have a data set and this data set is called a training set. So for housing prices example, we have a training set of different housing prices and our job is to learn from this data how to predict prices of the houses.

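More formally, in supervised learning the data set is called the training set. In the housing price example, the training set consists of house prices paired with house sizes. The algorithm learns from the training set how to predict house prices.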

Let's define some notation that we're using throughout this course. We're going to define quite a lot of symbols. It's okay if you don't remember all the symbols right now but as the course progresses it will be useful [inaudible] convenient notation.

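We now define the notation used in this course. We will define quite a few symbols. You don't need to memorize them all right away, but they will be useful and convenient throughout the course.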

So I'm gonna use lowercase m throughout this course to denote the number of training examples. So in this data set, if I have, you know, let's say 47 rows in this table, then I have 47 training examples and m equals 47. Let me use lowercase x to denote the input variables, often also called the features. Those would be the x's here, the input features. And I'm gonna use y to denote my output variable or the target variable which I'm going to predict, and so that's the second column here. [inaudible] notation, I'm going to use (x, y) to denote a single training example. So, a single row in this table corresponds to a single training example, and to refer to a specific training example, I'm going to use this notation (x(i), y(i)). And we're going to use this to refer to the ith training example. So this superscript i over here, this is not exponentiation, right? This (x(i), y(i)), the superscript i in parentheses, that's just an index into my training set and refers to the ith row in this table, okay? So this is not x to the power of i, y to the power of i. Instead (x(i), y(i)) just refers to the ith row of this table. So for example, x(1) refers to the input value for the first training example, so that's 2104. That's this x in the first row. x(2) will be equal to 1416, right? That's the second x. And y(1) will be equal to 460, the y value for my first training example. That's what that (1) refers to.

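In this course, lowercase 'm' denotes the number of training examples. Suppose the table for this data set has 47 rows. The table then holds 47 training examples, so we write 'm = 47'. Lowercase 'x' denotes the input variable, also called the feature. Here x is the left column of the table, the house size. 'y' denotes the output variable, or the target variable we want to predict; it is the right column of the table. '(x, y)' denotes a single training example, and one row of the table corresponds to one training example. '(x^(i), y^(i))' denotes the i-th training example. The superscript (i) is an index, not an exponent: it only says which example in the data set, or which row of the table, is meant. In other words, (x^(i), y^(i)) is not x to the power i and y to the power i, but the i-th training example. For example, 'x^(1)' is the x value of the training example in the first row, 2,104. x^(2) is the x value of the training example in the second row, 1,416. y^(1) is the y value of the training example in the first row, 460.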
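The notation can be sketched in a few lines of Python. This is only an illustration: the name `training_set` and the helper `example` are hypothetical, the list holds just two rows rather than the full 47-row table, and the second price (232) is an assumed value because the text above does not state y^(2). Python lists are 0-based while the lecture's superscript (i) is 1-based, so the helper subtracts one.

```python
# Each row of the table is one training example (x, y):
# x = house size in square feet, y = sale price.
training_set = [
    (2104, 460),  # (x^(1), y^(1)), both values quoted in the lecture
    (1416, 232),  # (x^(2), y^(2)); 1416 is quoted, 232 is an ASSUMED value
]

# m is the number of training examples (47 for the full table, 2 here).
m = len(training_set)


def example(i):
    """Return (x^(i), y^(i)) using the lecture's 1-based index i."""
    if not 1 <= i <= m:
        raise IndexError(f"i must be between 1 and {m}, got {i}")
    return training_set[i - 1]  # shift to Python's 0-based indexing


x1, y1 = example(1)
x2, _ = example(2)
print(m, x1, y1, x2)
```

The superscript is purely an index here: `example(2)` looks up the second row and performs no exponentiation.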

So as mentioned, occasionally I'll ask you a question to let you check your understanding and a few seconds in this video a multiple-choice question will pop up in the video. When it does please use your mouse to select what you think is the right answer.

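As mentioned earlier, from time to time I will ask a question to check your understanding. When the video pauses, a multiple-choice question will appear; please use your mouse to select the answer you think is right.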

So here's how this supervised learning algorithm works. We start with a training set, like our training set of housing prices, and we feed that to our learning algorithm. It is the job of a learning algorithm to then output a function, which by convention is usually denoted lowercase h, and h stands for hypothesis. And the job of the hypothesis is to be a function that takes as input the size of a house, like maybe the size of the new house your friend's trying to sell. So it takes in the value of x and it tries to output the estimated value of y for the corresponding house. So h is a function that maps from x's to y's.

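This is how a supervised learning algorithm works. We have a training set of house sizes and house prices, and we feed it to the learning algorithm. The job of the learning algorithm is to output a function, conventionally denoted by lowercase 'h'. 'h' stands for hypothesis. The hypothesis function takes the size of a new house, 'x', as input and outputs the estimated price 'y' of that house. In short, 'h' is a function that maps from x's to y's.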
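The flow "training set in, hypothesis h out" might be sketched as below. The lecture has not yet said how the learning algorithm finds h, so this sketch swaps in a closed-form least-squares line fit purely as a stand-in. The function name `learning_algorithm` and the three data points are hypothetical and were chosen only to keep the arithmetic easy to follow; they are not the Portland data.

```python
from typing import Callable, List, Tuple


def learning_algorithm(
    training_set: List[Tuple[float, float]]
) -> Callable[[float], float]:
    """Take a training set and return a hypothesis h that maps x to y.

    Stand-in method: an ordinary least-squares straight-line fit.
    """
    m = len(training_set)
    mean_x = sum(x for x, _ in training_set) / m
    mean_y = sum(y for _, y in training_set) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    sxx = sum((x - mean_x) ** 2 for x, _ in training_set)
    if sxx == 0:
        raise ValueError("all x values are identical; cannot fit a line")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def h(x: float) -> float:
        # The hypothesis: estimated y for a given x.
        return intercept + slope * x

    return h


# HYPOTHETICAL toy data: (size in square feet, price in thousands of dollars).
toy_training_set = [(1000.0, 200.0), (2000.0, 400.0), (3000.0, 600.0)]

h = learning_algorithm(toy_training_set)  # the algorithm outputs a function
estimate = h(1250.0)                      # h maps a new x to an estimated y
print(round(estimate, 2))
```

The point of the sketch is the shape of the pipeline: the learning algorithm returns a function, and that function is what gets applied to the size of the friend's house.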

People often ask me, you know, why is this function called hypothesis. Some of you may know the meaning of the term hypothesis, from the dictionary or from science or whatever. It turns out that in machine learning, this is a name that was used in the early days of machine learning and it kinda stuck. 'Cause maybe not a great name for this sort of function, for mapping from sizes of houses to the predictions, that you know.... I think the term hypothesis, maybe isn't the best possible name for this, but this is the standard terminology that people use in machine learning. So don't worry too much about why people call it that.

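You may wonder why this function is called a hypothesis. You probably know the meaning of the word hypothesis, whether from the dictionary, from science, or elsewhere. The name was used in the early days of machine learning and simply stuck. It is perhaps not a great name for a function that maps house sizes to predicted house prices. The word hypothesis may not be the best possible choice, but it is the standard terminology in machine learning, so there is no need to worry much about why people call it that.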

When designing a learning algorithm, the next thing we need to decide is how do we represent this hypothesis h. For this and the next few videos, our initial choice for representing the hypothesis will be the following. We're going to represent h as follows. And we will write this as h theta(x) equals theta0 plus theta1 of x. And as a shorthand, sometimes instead of writing, you know, h subscript theta of x, sometimes there's a shorthand, I'll just write as a h of x. But more often I'll write it as a subscript theta over there.

When designing a learning algorithm, the next thing to decide is how to represent the hypothesis 'h'. For this and the next few lectures, we choose the following representation:

hθ(x) = θ0 + θ1x

Here, hθ(x) is sometimes abbreviated to h(x), but the form that includes the subscript θ will be used more often.
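As a minimal sketch, the hypothesis hθ(x) = θ0 + θ1x can be written as a Python function parameterized by θ0 and θ1. The helper name `make_hypothesis` and the parameter values below are hypothetical; the values are not fitted to any data and were picked only so the arithmetic is exact.

```python
def make_hypothesis(theta0, theta1):
    """Return h_theta, the function x -> theta0 + theta1 * x."""

    def h(x):
        return theta0 + theta1 * x

    return h


# HYPOTHETICAL parameters, not learned from the housing data.
h = make_hypothesis(theta0=50.0, theta1=0.125)

print(h(0))     # 50.0, the intercept theta0
print(h(1250))  # 206.25 = 50.0 + 0.125 * 1250
```

Fixing θ0 and θ1 once and returning `h` mirrors the shorthand in the text: with the parameters understood, hθ(x) can simply be written h(x).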

And plotting this in the pictures, all this means is that, we are going to predict that y is a linear function of x. Right, so that's the data set and what this function is doing, is predicting that y is some straight line function of x. That's h of x equals theta 0 plus theta 1 x, okay? And why a linear function? Well, sometimes we'll want to fit more complicated, perhaps non-linear functions as well. But since this linear case is the simple building block, we will start with this example first of fitting linear functions, and we will build on this to eventually have more complex models, and more complex learning algorithms. Let me also give this particular model a name. This model is called linear regression or this, for example, is actually linear regression with one variable, with the variable being x. Predicting all the prices as functions of one variable X. And another name for this model is univariate linear regression. And univariate is just a fancy way of saying one variable. So, that's linear regression. In the next video we'll start to talk about just how we go about implementing this model.

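Plotted, 'hθ(x) = θ0 + θ1x' means that we predict y as a linear function of x. The black straight line here is 'θ0 + θ1x'. Why a linear function? Sometimes we will want to fit more complicated functions, such as non-linear ones. But the linear case is the simple building block, so it is a good place to start, and we will build on it toward more complex models and learning algorithms. Let us give the model 'hθ(x) = θ0 + θ1x' a name. This model is called linear regression, or more precisely linear regression with one variable, the variable being x. It predicts house prices as a function of the single variable x, the house size. Another name for this model is univariate linear regression. 'Univariate' is just a fancy way of saying 'one variable'. That is linear regression. In the next lecture, we will start to discuss how to implement this model.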