
by 라인하트, Oct 15, 2020

Andrew Ng's Machine Learning Notes (5-6): Vectorization in Octave

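Professor Andrew Ng, co-founder of the online course platform Coursera, is a giant of the AI industry. The lecture he gave to machine learning beginners at Stanford University is available for free, as-is, on Coursera (Coursera.org). It is a must-take course for machine learning beginners, and one you naturally come across when studying AI and machine learning on your own.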

Octave / MATLAB Tutorial

Vectorization

In this video I'd like to tell you about the idea of vectorization. Whether you're using Octave or a similar language like MATLAB, or whether you're using Python [INAUDIBLE], R, Java, or C++, all of these languages have numerical linear algebra libraries. These are either built in or readily accessible. They're usually very well written and highly optimized, often developed by people who have PhDs in numerical computing or who really specialize in it. When you're implementing machine learning algorithms, you can take advantage of these numerical linear algebra libraries and make routine calls to them, rather than writing code yourself to do things the libraries could be doing. If you do that, you often get two benefits. First, the code is more efficient: it runs more quickly and takes better advantage of any parallel hardware your computer may have. Second, you end up with less code to write, so it's a simpler implementation that is also more likely to be bug free. As a concrete example, rather than writing code yourself to multiply matrices, you can let Octave do it by typing A * B, and it will use a very efficient routine to multiply the two matrices. There are a bunch of examples like this, where an appropriate vectorized implementation gives you much simpler and much more efficient code.

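This lecture explains the concept of vectorization. Octave, the similar language MATLAB, and programming languages such as Python, R, Java, and C++ all have linear algebra libraries, either built in or readily available. PhDs and specialists in numerical computing have optimized these libraries heavily. When you implement machine learning algorithms, you can call the functions in these linear algebra and numerical libraries instead of writing that code yourself. Code that calls efficient library functions runs faster and makes better use of your computer's cores or CPUs in parallel.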

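Second, a few lines of function calls give you a simpler implementation, which is also less likely to contain bugs. For example, when you multiply two matrices in Octave, typing A * B uses a far more efficient routine than code you would write yourself. An appropriate vectorized implementation is the shortcut to simpler, more efficient code.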
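To make "writing the code yourself" concrete, here is a minimal C++ sketch of a hand-written matrix product. It is the kind of triple loop that typing A * B in Octave replaces. The function name `matmul_naive` and the vector-of-rows `Matrix` type are illustrative assumptions, not part of any library. An optimized library routine computes the same result but typically adds cache blocking, SIMD instructions, and multithreading, none of which this loop has.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix type: a vector of rows.
using Matrix = std::vector<std::vector<double>>;

// Hand-written triple loop computing C = A * B, where A is rows x inner
// and B is inner x cols. This is what a linear algebra library does for you.
Matrix matmul_naive(const Matrix& A, const Matrix& B) {
    assert(!A.empty() && !B.empty() && A[0].size() == B.size());
    const std::size_t rows = A.size(), inner = B.size(), cols = B[0].size();
    Matrix C(rows, std::vector<double>(cols, 0.0));
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t k = 0; k < inner; ++k)
            for (std::size_t j = 0; j < cols; ++j)
                C[i][j] += A[i][k] * B[k][j];  // accumulate row i of A times column j of B
    return C;
}
```

For example, multiplying {{1, 2}, {3, 4}} by {{5, 6}, {7, 8}} yields {{19, 22}, {43, 50}}.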

Let's look at some examples. Here's our usual hypothesis for linear regression. If you want to compute h(x), notice that there's a sum on the right, so one thing you could do is compute the sum from j = 0 to j = n yourself. Another way is to think of h(x) as theta transpose x, that is, as the inner product between two vectors. Here theta is the vector theta 0, theta 1, theta 2 if you have two features (n equals two), and x is the vector x0, x1, x2. These two views give you two different implementations.

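Let's look at an example. Here is the linear regression hypothesis.

We want to compute the hypothesis $h_\theta(x) = \sum_{j=0}^{n}\theta_j x_j = \theta^T x$. The first way is to compute the sum from j = 0 to n directly. The other way is $\theta^T x$: transpose the vector θ and multiply it by the vector x. With two features, n = 2, the parameter vector is θ = [θ0; θ1; θ2] and x = [x0; x1; x2]. In other words, the hypothesis can be implemented from two different viewpoints.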
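As a small worked check that the two views agree, take made-up values θ = [1; 2; 3] and x = [1; 4; 5] (with x0 = 1). These numbers are illustrative only. The explicit sum and the inner product give the same number:

```latex
h_\theta(x) = \sum_{j=0}^{2}\theta_j x_j = 1\cdot 1 + 2\cdot 4 + 3\cdot 5 = 24,
\qquad
\theta^T x = \begin{bmatrix}1 & 2 & 3\end{bmatrix}\begin{bmatrix}1\\ 4\\ 5\end{bmatrix} = 24
```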

Here's what I mean. Here's an unvectorized implementation of how to compute h(x); by unvectorized, I mean without vectorization. We might first initialize prediction to 0.0. The prediction is eventually going to be h(x). Then I'm going to have a for loop for j = 1 through n+1, where prediction gets incremented by theta(j) * x(j). So it's this expression over here.

By the way, I should mention that the vectors I wrote over here are 0-indexed: theta 0, theta 1, theta 2. But MATLAB is 1-indexed. So theta 0 would be represented in MATLAB as theta(1), the second element as theta(2), and the third element as theta(3). That's just because vectors in MATLAB are indexed starting from 1, even though I wrote theta and x here with indexing starting from 0. That's why the for loop here has j go from 1 through n+1 rather than from 0 up to n. So this is an unvectorized implementation, in that a for loop sums up the n+1 terms of the sum one at a time.

Here is an example, in Octave or MATLAB, of computing the sum directly without vectorization. n is the number of features; since x0 = 1 always, the vectors have n + 1 elements. In the end, prediction = h(x). The vectors on the slide are 0-indexed, but MATLAB is 1-indexed: θ0 is stored at index 1, θ1 is the second element, and θ2 is the third. So the loop runs from 1 to n + 1 rather than from 0 to n. This is an unvectorized implementation: the for loop adds up the n + 1 terms of the sum one at a time.

prediction = 0.0; % initialize prediction to 0.0

for j = 1:n+1 % loop from j = 1 to n+1 (Octave vectors are 1-indexed)

prediction = prediction + theta(j) * x(j); % add theta(j) * x(j) to the running sum

end;

In contrast, here's how you would write a vectorized implementation. You think of x and theta as vectors, and you just set prediction = theta' * x. So instead of writing all these lines of code with a for loop, you have just one line of code. This line of code on the right uses Octave's highly optimized numerical linear algebra routines to compute the inner product between the two vectors theta and x. Not only is the vectorized implementation simpler, it will also run much more efficiently.

In contrast, the right side shows a vectorized implementation, which treats x and θ as vectors.

prediction = theta' * x; % implements theta^T * x

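What the unvectorized version does with a for loop, the vectorized version does in a single line of code. That line uses Octave's highly optimized linear algebra routines to compute the inner product of the two vectors theta and x. The vectorized implementation is both much simpler and more efficient.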

So that was Octave, but the issue of vectorization applies to other programming languages as well. Let's look at an example in C++. Here's what an unvectorized implementation might look like. We again initialize prediction to 0.0, and then we now have a for loop for j = 0 up to n, with prediction += theta[j] * x[j]. Again, you have this explicit for loop that you write yourself.

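That was Octave, but vectorization applies to other programming languages as well. Let's look at a C++ example. Here is the unvectorized implementation, where you again write the for loop explicitly yourself.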

double prediction = 0.0; // initialize prediction to 0.0

for (int j = 0; j <= n; j++) // loop from j = 0 to n (C++ arrays are 0-indexed)

prediction += theta[j] * x[j]; // accumulate theta[j] * x[j]
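The C++ fragment above assumes that theta, x and n already exist. A minimal self-contained version might look like the sketch below. It assumes θ and x are held in std::vector<double> of length n + 1, and the function name `predict_loop` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unvectorized hypothesis: h(x) = sum_{j=0}^{n} theta_j * x_j.
// theta and x both have n + 1 entries, with x[0] = 1 as the intercept feature.
double predict_loop(const std::vector<double>& theta, const std::vector<double>& x) {
    assert(theta.size() == x.size());
    double prediction = 0.0;
    for (std::size_t j = 0; j < theta.size(); ++j)  // j = 0, 1, ..., n
        prediction += theta[j] * x[j];
    return prediction;
}
```

With theta = {1, 2, 3} and x = {1, 4, 5}, this returns 1 + 8 + 15 = 24.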

In contrast, using a good numerical linear algebra library in C++, you can instead write code that might look like this. Depending on the details of your library, you might have one C++ object for the vector theta and another for the vector x. You then just take theta.transpose() * x. Here * is an overloaded C++ operator, so you can multiply these two vectors directly in C++. Your library might use slightly different syntax. But by relying on the library to do this inner product, you can get a much simpler piece of code and a much more efficient one.

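By contrast, C++ also has good numerical linear algebra libraries, and with one of them you can express this in simple code. Depending on the details of the library, you may have vector objects like these: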

double prediction = theta.transpose() * x; // implements theta^T * x (exact syntax depends on the library)

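Here θ and x are C++ vector objects, and the two vectors are multiplied in C++. The syntax may differ slightly from library to library. In every case, relying on the library to compute the inner product gives much simpler and more efficient code.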
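If you don't want to assume a particular third-party library, the C++ standard library's `std::inner_product` (from `<numeric>`) can stand in for the library call. This is only a sketch of the shape of the code. `std::inner_product` does not by itself bring the speedups of an optimized BLAS. A dedicated library such as Eigen has its own vector types and operators (Eigen vectors have a `dot()` member, for instance). The function name `predict_vectorized` is hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// "Vectorized" hypothesis: hand the whole inner product theta^T x to a library
// routine instead of writing the loop. std::inner_product stands in for a
// numerical linear algebra library here.
double predict_vectorized(const std::vector<double>& theta, const std::vector<double>& x) {
    assert(theta.size() == x.size());
    return std::inner_product(theta.begin(), theta.end(), x.begin(), 0.0);
}
```

With theta = {1, 2, 3} and x = {1, 4, 5}, it returns 24, the same value as the explicit loop.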

Let's now look at a more sophisticated example. Just to remind you, here's our update rule for gradient descent for linear regression. We update theta j using this rule for all values of j = 0, 1, 2, and so on. Suppose we have two features, so n = 2, and I write out these equations for theta 0, theta 1, and theta 2. Then these are the updates we perform. You might remember my saying in an earlier video that these should be simultaneous updates. So let's see if we can come up with a vectorized implementation of this.

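Let's look at a more sophisticated example. As a reminder, here is the gradient descent update rule for linear regression:

$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$

We update θj with this rule for every j = 0, 1, 2, and so on. With n = 2, this updates θ0, θ1, and θ2. All parameters θ must be updated simultaneously. So let's see whether we can come up with a vectorized notation for these update equations.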

Here are my same three equations written in a slightly smaller font. One way to implement these three lines is a for loop that says for j = 0, 1, 2, update theta j, or something like that. Such a loop effectively does the three steps one at a time. Instead, let's look for a vectorized implementation that compresses these three lines of code, or that for loop, into one line of vectorized code. Here's the idea. I'm going to think of theta as a vector, and I'm going to update theta as theta minus alpha times some other vector delta. Delta is going to be equal to 1 over m times the sum from i = 1 through m of this term over on the right, okay?

Here are the gradient descent update rules for two features.

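One way to implement these three equations in code is a for loop, for j = 0, 1, 2, that updates θj. The other way is a vectorized implementation. Rather than three lines of code, or a for loop that carries out the three steps one at a time, we look for a simpler way to do all three steps at once. The goal is to compress them into a single line of vectorized code.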

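First, treat the parameters θ as a vector and update it as $\theta := \theta - \alpha\delta$, where $\delta = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$. For each training example, δ takes the error (the prediction minus the actual value), multiplies it by $x^{(i)}$, and averages the result over the training set.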

So let me explain what's going on here. I'm going to treat theta as a vector, an (n+1)-dimensional vector in R^(n+1), and theta gets updated as a whole vector. Alpha is a real number, and delta here is a vector. So this subtraction is a vector subtraction, okay? Alpha times delta is a vector, so theta gets this vector, alpha times delta, subtracted from it. So what is the vector delta?

The parameter θ is a vector in R^(n+1). The learning rate α is a real number, and δ is a vector. So the minus sign here is vector subtraction, because the real number α times δ is a vector. So what is the vector δ?

Well, this vector delta looks like this, and it's really meant to be this thing over here. Concretely, delta will be an (n+1)-dimensional vector. Let's index it from 0, as delta 0, delta 1, delta 2. Then what I want is for delta 0 to equal this first box in green up above. And indeed, you might be able to convince yourself that delta 0 is 1 over m times the sum of (h(x(i)) minus y(i)) times x(i) 0.

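The vector δ is the expression in the red box, $\delta = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$. It is the red-boxed part common to the three θ update rules.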

Concretely, δ is an (n+1)-dimensional vector, and its first element equals the green box. So with δ = [δ0; δ1; δ2], δ0 is the first green box: $\delta_0 = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$.
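Before vectorizing, it can help to see δ computed the unvectorized way, one component δj at a time, exactly as the green boxes are written. The C++ sketch below assumes each training example $x^{(i)}$ is stored as a row `X[i]` with `X[i][0] = 1`. The names `delta_loops` and `Matrix` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage: X[i] is the i-th training example x^(i), with X[i][0] = 1.
using Matrix = std::vector<std::vector<double>>;

// Unvectorized gradient vector, one component at a time:
// delta_j = (1/m) * sum_{i=1}^{m} (h(x^(i)) - y^(i)) * x_j^(i)
std::vector<double> delta_loops(const Matrix& X, const std::vector<double>& y,
                                const std::vector<double>& theta) {
    const std::size_t m = X.size(), n1 = theta.size();  // n1 = n + 1
    assert(m > 0 && y.size() == m);
    std::vector<double> delta(n1, 0.0);
    for (std::size_t j = 0; j < n1; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            assert(X[i].size() == n1);
            double h = 0.0;  // h(x^(i)) = theta^T x^(i)
            for (std::size_t k = 0; k < n1; ++k) h += theta[k] * X[i][k];
            delta[j] += (h - y[i]) * X[i][j];
        }
        delta[j] /= static_cast<double>(m);
    }
    return delta;
}
```

For X = {{1, 1, 2}, {1, 2, 0}}, y = {3, 5} and θ = {0, 0, 0}, the errors are -3 and -5, so δ = {-4, -6.5, -3}.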

So let's just make sure we're on the same page about how delta is really computed. Delta is 1 over m times this sum over here, and what is this sum? Well, this term over here is a real number. The second term, x(i), is a vector, right? That's because x(i) is a vector with components x(i) 0, x(i) 1, x(i) 2.

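Let's make sure we agree on how δ is computed: $\delta = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$. So what is the expression under the Σ? $h_\theta(x^{(i)}) - y^{(i)}$ is a real number, and $x^{(i)}$ is a vector, because $x^{(i)} = [x_0^{(i)}; x_1^{(i)}; x_2^{(i)}]$.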

And what is the summation saying? This term over here is equal to (h(x(1)) - y(1)) * x(1) + (h(x(2)) - y(2)) * x(2) + and so on, okay?

This is a summation over i, so as i ranges from 1 through m you get these different terms, and you're summing them all up here. If you remember the earlier quiz in this video, you saw an equation a lot like this one.

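So what does $\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$ mean? The sum expands to $\left(h_\theta(x^{(1)}) - y^{(1)}\right)x^{(1)} + \left(h_\theta(x^{(2)}) - y^{(2)}\right)x^{(2)} + \left(h_\theta(x^{(3)}) - y^{(3)}\right)x^{(3)} + \dots$, with i ranging from 1 to m, and all of these terms are added together. We have already seen an equation of this kind before.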

We said that to vectorize that code, we would instead set u = 2v + 5w. That is, the vector u is equal to two times the vector v plus five times the vector w. This is an example of adding different vectors, and this summation is the same thing.

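To vectorize that earlier code we wrote u = 2v + 5w: the vector u equals two times the vector v plus five times the vector w. That is an example of adding vectors together, and the summation here works the same way.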
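Here is the u = 2v + 5w example written out element by element in C++. It assumes plain std::vector<double> vectors, and `lin_comb` is a hypothetical helper name. In Octave the whole thing is the single expression `u = 2*v + 5*w;`, which is exactly what vectorization buys.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear combination of two vectors, element by element: u = a*v + b*w.
std::vector<double> lin_comb(double a, const std::vector<double>& v,
                             double b, const std::vector<double>& w) {
    assert(v.size() == w.size());
    std::vector<double> u(v.size());
    for (std::size_t k = 0; k < v.size(); ++k)
        u[k] = a * v[k] + b * w[k];  // each component scaled and summed
    return u;
}
```

With v = {1, 0, 2} and w = {0, 1, 1}, `lin_comb(2, v, 5, w)` gives {2, 5, 9}.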

This is saying that the term over here is just some real number, right? It's kind of like the number two, or some other number, times the vector x(1). So it's like 2v with some other number in place of 2 and x(1) in place of v. Then, instead of 5w, we have some other real number times some other vector. Then you add on more vectors, plus dot dot dot. That's why, overall, this whole quantity over here, delta, is just some vector.

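This means each coefficient in the sum, $h_\theta(x^{(i)}) - y^{(i)}$, is a real number. It plays the role of the number 2 (or some other number) multiplying the vector $x^{(1)}$. Then, in place of 5w, another real number multiplies another vector, and further vectors are added on. That is why the whole quantity δ is just a vector.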

And concretely, if n = 2, the three elements of delta correspond exactly to this first thing, the second thing, and this third thing. That's why, when you update theta according to theta - alpha delta, we end up carrying out exactly the same simultaneous updates as the update rules that we have up top.

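Concretely, when n = 2, the three elements of δ are the black boxes. That is why updating θ with $\theta := \theta - \alpha\delta$ carries out exactly the same simultaneous update as the original update rules.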
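This sketch performs one step of θ := θ − αδ. It builds δ the way the transcript describes: each example contributes a real number, the error, times the vector $x^{(i)}$. Every prediction uses the old θ and the whole vector is replaced at the end, so the result is the simultaneous update by construction. The C++ code, the `gradient_step` name and the row-per-example `Matrix` layout are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage: X[i] = x^(i), with X[i][0] = 1.
using Matrix = std::vector<std::vector<double>>;

// One gradient descent step, theta := theta - alpha * delta, where
// delta = (1/m) * sum_i err_i * x^(i) is accumulated as a sum of vectors.
std::vector<double> gradient_step(const Matrix& X, const std::vector<double>& y,
                                  const std::vector<double>& theta, double alpha) {
    const std::size_t m = X.size(), n1 = theta.size();
    assert(m > 0 && y.size() == m);
    std::vector<double> delta(n1, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        assert(X[i].size() == n1);
        double h = 0.0;  // h(x^(i)) computed with the OLD theta
        for (std::size_t k = 0; k < n1; ++k) h += theta[k] * X[i][k];
        const double err = h - y[i];  // a real number
        for (std::size_t k = 0; k < n1; ++k) delta[k] += err * X[i][k];  // add err * x^(i)
    }
    std::vector<double> next(n1);  // all components change together
    for (std::size_t k = 0; k < n1; ++k)
        next[k] = theta[k] - alpha * delta[k] / static_cast<double>(m);
    return next;
}
```

For X = {{1, 1, 2}, {1, 2, 0}}, y = {3, 5}, θ = {0, 0, 0} and α = 0.5, one step gives θ = {2, 3.25, 1.5}.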

So I know a lot happened on this slide. Feel free to pause the video, and if you aren't sure what just happened, step through this slide. Make sure you understand why this update here, with this definition of delta, is equal to the update on top. If it's still not clear, one insight is that this thing over here is exactly the vector x(i). So we're just taking all three of these computations and compressing them into one step with this vector delta. That's how we can come up with a vectorized implementation of this update step. I hope this step makes sense; do pause the video and see if you can understand it.

In case you don't quite understand the equivalence of this math, implementing it this way turns out to give the right answer anyway. So even if you didn't quite understand the equivalence, if you just implement it this way, you'll be able to get linear regression to work. But if you're able to figure out why these two steps are equivalent, then hopefully that will give you a better understanding of vectorization as well.

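This slide covered a lot. If it isn't clear, pause the video and work through it carefully. Make sure you understand why this update, with this definition of δ, equals the update rules at the top. If it still isn't clear, one insight is that the gray box here is exactly the vector $x^{(i)}$. So we take all three computations and compress them into one step using the vector δ.

That is what makes the vectorized implementation possible. Check whether you can follow this step in the lecture. Even if you don't fully understand it, implementing it this way gives the right answer, and linear regression will work. But if you can work out why the two steps are equivalent, you will gain a better understanding of vectorization.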

And finally, sometimes you implement linear regression with more than one or two features; sometimes we use it with tens or hundreds or thousands of features. In that case, the vectorized implementation of linear regression will run much faster than, say, your old for loop that updated theta 0, then theta 1, then theta 2 yourself. So using a vectorized implementation, you should be able to get a much more efficient implementation of linear regression. Vectorizing is also a good trick for the later algorithms we'll see in this class, whether in Octave or another language like C++ or Java, for getting your code to run more efficiently.

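Finally, linear regression is sometimes implemented with one or two features, but often with 10, 100, or 1,000 features. In those cases, a vectorized linear regression runs much faster than a for loop that updates θ0, θ1, θ2, ... yourself. So a vectorized implementation gives you a much more efficient linear regression. The same trick will make the algorithms later in this course run more efficiently, whether in Octave or in other languages such as C++ and Java.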
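Putting it together, here is a minimal sketch of batch gradient descent that repeats θ := θ − αδ for a fixed number of iterations. Each iteration is written as two whole-vector operations, r = Xθ − y and δ = (1/m) Xᵀr. The plain loops inside them are an assumption for self-containment; a real implementation would call a BLAS-backed library for these products. The function name, learning rate and iteration count are illustrative, not values from the lecture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage: X[i] = x^(i), with X[i][0] = 1.
using Matrix = std::vector<std::vector<double>>;

// Batch gradient descent for linear regression.
// Per iteration: r = X*theta - y, grad = X^T r, theta := theta - (alpha/m) * grad.
std::vector<double> gradient_descent(const Matrix& X, const std::vector<double>& y,
                                     std::vector<double> theta, double alpha, int iters) {
    const std::size_t m = X.size(), n1 = theta.size();
    assert(m > 0 && y.size() == m);
    std::vector<double> r(m), grad(n1);
    for (int it = 0; it < iters; ++it) {
        for (std::size_t i = 0; i < m; ++i) {  // residuals r = X*theta - y
            assert(X[i].size() == n1);
            r[i] = -y[i];
            for (std::size_t k = 0; k < n1; ++k) r[i] += X[i][k] * theta[k];
        }
        for (std::size_t k = 0; k < n1; ++k) {  // grad = X^T r
            grad[k] = 0.0;
            for (std::size_t i = 0; i < m; ++i) grad[k] += X[i][k] * r[i];
        }
        for (std::size_t k = 0; k < n1; ++k)  // simultaneous update of every theta_k
            theta[k] -= alpha * grad[k] / static_cast<double>(m);
    }
    return theta;
}
```

On the points (0, 1), (1, 3), (2, 5), (3, 7), which lie on y = 1 + 2x, starting from θ = {0, 0} with α = 0.1 and 5000 iterations should bring θ very close to [1; 2].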

Andrew Ng's video lecture

Summary

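Machine learning algorithms make use of linear algebra and numerical libraries. Instead of writing that code yourself, you call the library functions directly. Simple function calls give you efficient code that runs faster, and less code means a simpler implementation.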

For example, $h_\theta(x) = \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_3 + \theta_4x_4 + \dots + \theta_nx_n$ (written as a sum)

$= \theta^T x$ (in vector form)

Computing this for one example with a for loop over the n + 1 parameters looks like this:

prediction = 0.0; % initialize prediction to 0.0

for j = 1:n+1 % loop from j = 1 to n+1 (Octave vectors are 1-indexed)

prediction = prediction + theta(j) * x(j); % add theta(j) * x(j) to the running sum

end;

With vectorization, the computation is:

prediction = theta' * x; % implements theta^T * x

Vectorization makes the calculation very simple.

Gradient descent also becomes much simpler when computed in vectorized form.

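$\theta := \theta - \alpha\delta$

where $\delta = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$.

Stack the training examples as the rows of an m × (n+1) design matrix X (row i is $(x^{(i)})^T$) and collect the targets in $y = [y^{(1)}; \dots; y^{(m)}]$. Then δ can be written in vector form as:

$\delta = \frac{1}{m}X^T\left(X\theta - y\right)$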
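Here is a minimal C++ sketch of the matrix form δ = (1/m) Xᵀ(Xθ − y), assuming X is stored as m rows of length n + 1. In Octave the same computation would be the single line `delta = (1/m) * X' * (X*theta - y);`. The helper names `matvec`, `matTvec` and `delta_matrix` are hypothetical and stand in for what a linear algebra library would provide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical m x (n+1) design matrix: row i is (x^(i))^T.
using Matrix = std::vector<std::vector<double>>;

// X * v, an m-vector.
std::vector<double> matvec(const Matrix& X, const std::vector<double>& v) {
    std::vector<double> out(X.size(), 0.0);
    for (std::size_t i = 0; i < X.size(); ++i) {
        assert(X[i].size() == v.size());
        for (std::size_t k = 0; k < v.size(); ++k) out[i] += X[i][k] * v[k];
    }
    return out;
}

// X^T * r, an (n+1)-vector.
std::vector<double> matTvec(const Matrix& X, const std::vector<double>& r) {
    assert(!X.empty() && X.size() == r.size());
    std::vector<double> out(X[0].size(), 0.0);
    for (std::size_t i = 0; i < X.size(); ++i)
        for (std::size_t k = 0; k < out.size(); ++k) out[k] += X[i][k] * r[i];
    return out;
}

// delta = (1/m) * X^T (X*theta - y)
std::vector<double> delta_matrix(const Matrix& X, const std::vector<double>& y,
                                 const std::vector<double>& theta) {
    assert(!X.empty() && y.size() == X.size());
    std::vector<double> r = matvec(X, theta);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] -= y[i];  // residuals h(x^(i)) - y^(i)
    std::vector<double> delta = matTvec(X, r);
    for (double& d : delta) d /= static_cast<double>(X.size());
    return delta;
}
```

For X = {{1, 1, 2}, {1, 2, 0}}, y = {3, 5} and θ = {1, 1, 1}, the residuals are 1 and -2, so δ = {-0.5, -1.5, 1}. This matches the summation form term by term: δ0 = (1 − 2)/2, δ1 = (1·1 − 2·2)/2, δ2 = (1·2 − 2·0)/2.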

Quiz

Which expression computes the following for loop with a vectorized implementation?

The answer is the second option.
