by 라인하트 Oct 05. 2020

Andrew Ng's Machine Learning (3-3): Matrix-Vector Multiplication

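Professor Andrew Ng, co-founder of the online course platform Coursera, is a leading figure in the artificial intelligence field. The introductory machine learning course he taught at Stanford University is available for free, as taught, on Coursera (Coursera.org). The course is a staple for machine learning beginners, and one you naturally come across when studying AI and machine learning on your own.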

Linear Algebra Review

Matrix-Vector Multiplication

In this video, I'd like to start talking about how to multiply together two matrices.

We'll start with a special case of that, of matrix vector multiplication - multiplying a matrix together with a vector.

This lecture covers how to multiply two matrices. It starts with a special case: multiplying a matrix by a vector.

Let's start with an example. Here is a matrix, and here is a vector, and let's say we want to multiply together this matrix with this vector, what's the result? Let me just work through this example and then we can step back and look at just what the steps were. It turns out the result of this multiplication process is going to be, itself, a vector.

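Here is an example of matrix-vector multiplication, with a matrix on the left and a vector on the right. What is the result? We first work through the example and then step back to look at how the multiplication is done. The result of this operation is itself a vector.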

And I'm just going to work through this first, and later we'll come back and see just what I did here. To get the first element of this vector, I am going to take these two numbers and multiply them with the first row of the matrix and add up the corresponding products. Take one multiplied by one, and take three and multiply it by five, and that's one plus fifteen, so that gives me sixteen. I'm going to write sixteen here. Then for the second element, I am going to take the second row and multiply it by this vector, so I have four times one, plus zero times five, which is equal to four, so you'll have four there. And finally, for the last one, I have the row two, one times the vector one, five, so two times one, plus one times five, which is equal to seven, and so I get a seven over there. It turns out that the result of multiplying a 3x2 matrix by a 2x1 matrix, which is also just a two-dimensional vector, is going to be a 3x1 matrix. In other words, a 3x1 matrix is just a three-dimensional vector.

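First, draw the brackets for the result; we will come back and fill in the values. The first element of the result vector is a sum of two products: each number in the first row of the matrix is multiplied by the corresponding number in the vector, and the products are added.

The first element of the result vector is 1 × 1 + 3 × 5 = 1 + 15 = 16, so the first row of the result vector is 16.

The second element of the result vector is 4 × 1 + 0 × 5 = 4 + 0 = 4, so the second row of the result vector is 4.

The third element of the result vector is 2 × 1 + 1 × 5 = 2 + 5 = 7, so the third row of the result vector is 7.

Here, multiplying a 3 × 2 matrix by a 2 × 1 vector produced a 3 × 1 matrix, that is, a three-dimensional vector.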
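The arithmetic above can be checked with a few lines of Python. This is a minimal sketch in plain Python, not code from the lecture; the names `A`, `x`, and `y` are chosen here for illustration, and the matrix rows and vector entries are read off the worked example.

```python
# Matrix and vector from the worked example (the names A, x, y are illustrative).
A = [
    [1, 3],
    [4, 0],
    [2, 1],
]  # 3 x 2 matrix
x = [1, 5]  # 2-dimensional vector, i.e. a 2 x 1 matrix

# Each element of y is one row of A multiplied element by element with x, then summed.
y = [
    A[0][0] * x[0] + A[0][1] * x[1],  # 1*1 + 3*5
    A[1][0] * x[0] + A[1][1] * x[1],  # 4*1 + 0*5
    A[2][0] * x[0] + A[2][1] * x[1],  # 2*1 + 1*5
]

print(y)  # [16, 4, 7]
```

The result has three entries, one per row of the matrix, which matches the 3 × 1 shape described above.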

So I realize that I did that pretty quickly, and you're probably not sure that you can repeat this process yourself, but let's look in more detail at what just happened and what this process of multiplying a matrix by a vector looks like. Here are the details of how to multiply a matrix by a vector. Let's say I have a matrix A and want to multiply it by a vector x. The result is going to be some vector y. So the matrix A is an m by n dimensional matrix, so m rows and n columns, and we are going to multiply that by an n by 1 matrix, in other words an n-dimensional vector. It turns out this "n" here has to match this "n" here. In other words, the number of columns in this matrix, n, has to match the number of rows here. It has to match the dimension of this vector. And the result of this product is going to be an m-dimensional vector y.

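That calculation went by quickly, so it is worth going back over the matrix-vector multiplication process until it is clear. Here is how the multiplication proceeds, explained once more. We have a matrix A, a vector x, and the result of the operation, a vector y.

The matrix A is an m × n matrix, with m rows and n columns. The vector x is an n × 1 matrix, that is, an n-dimensional vector. To multiply the matrix A by the vector x, the number of columns n of A must match the number of rows n of x. In other words, the number of columns of the matrix must equal the dimension of the vector. The result of the multiplication is an m-dimensional vector y.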

The number of rows here, "m", is going to be equal to the number of rows in this matrix "A". So how do you actually compute this vector "y"? Well, it turns out that to compute this vector "y", the process is: to get "yi", multiply A's i-th row with the elements of the vector "x" and add them up. So here's what I mean. In order to get the first element of "y", that first number, whatever that turns out to be, we're going to take the first row of the matrix "A" and multiply it, one element at a time, with the elements of this vector "x". So I take this first number and multiply it by this first number. Then take the second number and multiply it by this second number. Take this third number, whatever that is, multiply it by the third number, and so on until you get to the end. And I'm going to add up the results of these products, and the result of adding them up is going to give us this first element of "y". Then when we want to get the second element of "y", let's say this element, the way we do that is we take the second row of A and we repeat the whole thing. So we take the second row of A, multiply it element-wise with the elements of x, and add up the results of the products, and that gives me the second element of y. And you keep going: we take the third row of A, multiply it element-wise with the vector x, sum up the results, and then I get the third element, and so on, until I get down to the last row like so, okay? So that's the procedure.

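How do we compute the vector y? Here m is the number of rows of the matrix A. The value yi is obtained by multiplying the i-th row of A by the corresponding elements of the vector x and adding the products. For example, y1, the first element of y, is obtained by multiplying the first row of A by the elements of x and adding up the products.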
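Written as a formula, with A an m × n matrix, x an n-dimensional vector, and y an m-dimensional vector, the rule for each element can be stated as follows. The index names i and j are a notational choice made here, not taken from the lecture slides.

```latex
y_i = \sum_{j=1}^{n} A_{ij}\, x_j, \qquad i = 1, \dots, m
```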

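That is, the first element y1 of the vector y is computed as follows. Multiply the first row of the matrix A with the elements of the vector x, one at a time. Multiply the first element of the first row (A11) by the first element of the vector (x1). Multiply the second element of the first row (A12) by the second element of the vector (x2). Multiply the third element, A13, by x3. Continue until the last element of the first row (A1n) is multiplied by the last element of the vector (xn). Then add up all the products. This gives the first element y1 of the result vector y.

The second element y2 of the vector y is computed as follows. Multiply the first element of the second row (A21) by the first element of the vector (x1). Multiply the second element of the second row (A22) by the second element of the vector (x2). Continue until the last element of the second row (A2n) is multiplied by the last element of the vector (xn). Then add up all the products. This gives the second element y2 of the result vector y.

Repeat this process for each row. The last element ym of the vector y is computed as follows. Multiply the first element of the last row, row m, of the matrix A (Am1) by the first element of the vector (x1). Multiply the second element of the last row (Am2) by the second element of the vector (x2). Continue until the last element of the last row (Amn) is multiplied by the last element of the vector (xn). Then add up all the products. This gives the last element ym of the result vector y.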
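The row-by-row procedure can be sketched as a small Python function. This is an illustrative sketch, not code from the lecture: the function name `mat_vec` and the choice to represent a matrix as a list of rows are assumptions made here.

```python
def mat_vec(A, x):
    """Multiply an m x n matrix A (a list of m rows) by an n-dimensional vector x."""
    n = len(x)
    # The number of columns of A must match the dimension of x.
    for row in A:
        if len(row) != n:
            raise ValueError("number of columns in A must equal the length of x")
    # y_i is the i-th row of A multiplied element-wise with x, then summed.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# The 3 x 2 example from earlier in the lecture.
y = mat_vec([[1, 3], [4, 0], [2, 1]], [1, 5])
print(y)  # [16, 4, 7]
```

The returned list has one entry per row of `A`, so an m × n matrix times an n-dimensional vector gives an m-dimensional vector.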

Let's do one more example. Here's the example: So let's look at the dimensions. Here, this is a three by four dimensional matrix. This is a four-dimensional vector, or a 4 x 1 matrix, and so the result of this, the result of this product is going to be a three-dimensional vector. Write, you know, the vector, with room for three elements.

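Let's work through one more example. On the left is a 3 × 4 matrix, and on the right is a four-dimensional vector, i.e. a 4 × 1 matrix. The result is a three-dimensional vector.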

Let's carry out the products. So for the first element, I'm going to take these four numbers and multiply them with the vector x. So I have 1×1, plus 2×3, plus 1×2, plus 5×1, which is equal to 1+6, plus 2+5, which gives me 14. And then for the second element, I'm going to take this row now and multiply it with this vector. So 0×1, plus 3×3, plus 0×2, plus 4×1, which is equal to, let's see, that's 9+4, which is 13. And finally, for the last element, I'm going to take this last row, so I have minus one times one, plus minus two times three, plus zero times two, plus zero times one, and so that's going to be minus one minus six, which is going to make this negative seven. Okay? So my final answer is this vector, just to write it without the colors: fourteen, thirteen, negative seven. And as promised, the result here is a three by one matrix. So that's how you multiply a matrix and a vector.

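Now let's compute. To get the first element, we multiply the four numbers in the first row by the vector x.

The first element of the result vector y is y1 = 1 × 1 + 2 × 3 + 1 × 2 + 5 × 1 = 1 + 6 + 2 + 5 = 14. The second element is y2 = 0 × 1 + 3 × 3 + 0 × 2 + 4 × 1 = 9 + 4 = 13. The last element is y3 = -1 × 1 + (-2) × 3 + 0 × 2 + 0 × 1 = -1 - 6 + 0 + 0 = -7. Therefore the result vector is y = [14; 13; -7]. As expected, y is a 3 × 1 matrix. This is how a matrix and a vector are multiplied.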
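The same calculation can be written as a pair of nested loops. This is a hedged sketch in plain Python; the variable names are illustrative, and the matrix and vector entries are the ones read out in the example.

```python
# 3 x 4 matrix and 4-dimensional vector from the second example.
A = [
    [1, 2, 1, 5],
    [0, 3, 0, 4],
    [-1, -2, 0, 0],
]
x = [1, 3, 2, 1]

y = []
for row in A:                  # one pass per row of A, i.e. per element of y
    total = 0
    for a, b in zip(row, x):   # pair each row entry with the matching entry of x
        total += a * b
    y.append(total)

print(y)  # [14, 13, -7]
```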

I know that a lot just happened on this slide, so if you're not quite sure where all these numbers went, feel free to pause the video and take a slow, careful look at this big calculation that we just did, and try to make sure that you understand the steps of what just happened to get us these numbers: fourteen, thirteen, and negative seven.

A lot was covered on this slide. If the multiplication is not yet clear, pause the lecture and go over the calculation carefully. The goal is to understand how each value of the result vector y was obtained.

Finally, let me show you a neat trick. Let's say we have a set of four houses, so 4 houses with 4 sizes like these. And let's say I have a hypothesis for predicting the price of a house, and let's say I want to compute h of x for each of my 4 houses here. It turns out there's a neat way of applying this hypothesis to all of my houses at the same time: posing it as a matrix-vector multiplication. So, here's how I'm going to do it. I am going to construct a matrix as follows.

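Finally, here is a simple tip. On the left are four data points for house sizes, and on the right is a hypothesis that predicts house prices. There is a neat way to apply the hypothesis function h(x) to every house size and compute all the predicted prices at once: use a matrix-vector multiplication.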

My matrix is going to have 1, 1, 1, 1 in the first column, and I'm going to write down the sizes of my four houses here in the second column. I'm going to construct a vector as well, and my vector is going to be this vector of two elements, that's minus 40 and 0.25. Those are these two coefficients, theta 0 and theta 1. And what I am going to do is take the matrix and that vector and multiply them together; that "times" is the multiplication symbol. So what do I get? Well, this is a four by two matrix. This is a two by one matrix. So the outcome is going to be a four by one matrix, or really a four-dimensional vector, so let me write it with room for my four elements, my four real numbers here. Now it turns out that the first element of this result, the way I am going to get that is, I am going to take this first row and multiply it by the vector.

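Build a matrix from the house size data. The first column is all ones, and the second column holds the house sizes. The hypothesis function h(x) is written as a vector with two elements, [-40; 0.25]. Since the hypothesis is hθ(x) = θ0 + θ1x, the first element θ0 is -40 and the second element θ1 is 0.25. The two can be multiplied. What is the result? The matrix of house sizes is 4 × 2 and the hypothesis vector is 2 × 1, so the result is a 4 × 1 matrix, that is, a four-dimensional vector. Leave room in the result vector for four numbers.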

And so this is going to be -40 x 1 + 0.25 x 2104. By the way, on the earlier slides I was writing 1 x -40 and 2104 x 0.25, but the order doesn't matter, right? -40 x 1 is the same as 1 x -40. And this first element, of course, is "h" applied to 2104. So it's really the predicted price of my first house. Well, how about the second element? Hope you can see where I am going to get the second element. Right? I'm going to take this and multiply it by my vector. And so that's going to be -40 x 1 + 0.25 x 1416. And so this is going to be "h" of 1416. Right? And so on for the third and the fourth elements of this 4 x 1 vector. And just to be clear, this thing here that I just drew the green box around, that's a real number, OK? That's a single real number, and this thing here that I drew the magenta box around, that's a real number too, right? And so this thing on the right overall is a 4 by 1 dimensional matrix, or a 4-dimensional vector.

Now let's compute.

The first element is y1 = -40 × 1 + 0.25 × 2104 = 486.0. Note that '1 × -40' and '-40 × 1' are the same, so either order works. The first element y1 = h(2104) = 486.0 is the predicted price of our first house.

Next, the second element: y2 = -40 × 1 + 0.25 × 1416 = 314.0. That is, y2 = h(1416) = 314.0.

The third and fourth elements are computed the same way. y3 = h(1534) = -40 × 1 + 0.25 × 1534 = 343.5, and y4 = h(852) = -40 × 1 + 0.25 × 852 = 173.0. Every element of the result vector y is a real number. The result vector y is a 4 × 1 matrix, that is, a four-dimensional vector.
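The four predictions can be reproduced with a short Python sketch. It is written in plain Python for illustration only; the names `sizes`, `X`, `theta`, and `predictions` are assumptions made here, while the sizes and the parameters -40 and 0.25 come from the example.

```python
sizes = [2104, 1416, 1534, 852]    # house sizes from the example
X = [[1, size] for size in sizes]  # 4 x 2 data matrix: a column of ones, then the sizes
theta = [-40, 0.25]                # parameter vector [theta0, theta1]

# Matrix-vector product: each prediction is theta0 * 1 + theta1 * size.
predictions = [sum(a * t for a, t in zip(row, theta)) for row in X]

print(predictions)  # [486.0, 314.0, 343.5, 173.0]
```

The column of ones is what lets the intercept θ0 be folded into the same product as θ1.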

And the neat thing about this is that when you're actually implementing this in software, so when you have four houses and you want to use your hypothesis to predict the price "y" of all of these four houses, what this means is that you can write this in one line of code. When we talk about Octave and programming languages later, you'll actually write this in one line of code. You write prediction equals my data matrix times parameters, right? Where data matrix is this thing here, and parameters is this thing here, and this "times" is a matrix-vector multiplication. And if you just implement this one line of code, assuming you have an appropriate library to do matrix-vector multiplication, then this variable prediction (sorry for my bad handwriting) becomes this 4 by 1 dimensional vector on the right that gives you all the predicted prices.

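To summarize how this is implemented in software: given data on the sizes of four houses and a hypothesis that predicts house prices, you can predict the prices y. This can be written in one line of code. We will come back to it when we discuss Octave and other programming languages, but it really is a single line. The code simply multiplies the data matrix by the parameter vector to predict the house prices.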

Prediction = [Data Matrix] X [Parameter Vector]

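Here the prediction is expressed as the product of the data matrix and the parameter vector. The order matters: as explained later, matrix multiplication gives a different result when the order changes. The matrix-vector product is implemented with a suitable library for the programming language. The house price prediction here is a 4 × 1 vector.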

And your alternative to doing this as a matrix-vector multiplication would be to write something like, you know, for i equals 1 to 4, right? And if you have, say, a thousand houses, it would be for i equals 1 to a thousand or whatever. And then you have to write prediction of i equals, and then do a bunch more work over there. It turns out that when you have a large number of houses, if you're trying to predict the prices of not just four but maybe a thousand houses, then when you implement this on the computer, in any of the various languages, and this is true not only for Octave but for C++, Java, Python, and other high-level languages as well, writing code in the style on the left allows you to simplify the code, because now you're just writing one line of code rather than a for loop with a bunch of things inside. But, for subtle reasons that we will see later, it also turns out to be much more computationally efficient to make predictions for all of your houses the way on the left than the way on the right, where you write your own loop. I'll say more about this later when we talk about vectorization, but by posing a prediction this way, you get not only a simpler piece of code, but a more efficient one.

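Instead of a matrix-vector multiplication, you can use the for loop provided by the programming language. For the example above, the loop runs from i = 1 to 4 and computes the hypothesis as prediction(i) on each pass. To compute a thousand house prices, the for loop runs from i = 1 to 1,000. Predicting prices for more houses requires more iterations.

Both approaches can be written in Octave or in other high-level languages such as Java and Python. However, writing it as a single line such as "Prediction = Data Matrix * Parameters" is computationally much more efficient than using a for loop. The code is simpler, since it is one line rather than a loop with several statements inside, and, for reasons covered later, it also runs more efficiently. The details come in the later discussion of vectorization. A vectorized implementation is both simpler and more efficient.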
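The two styles can be put side by side in a short Python sketch. This is an illustration under stated assumptions rather than the lecture's code: `mat_vec` is a hypothetical helper defined here in plain Python, so the sketch shows only that both styles give the same predictions, not the speed difference. With a numerical library the product is a single built-in operation (in NumPy, for instance, it would be written `data_matrix @ parameters` on arrays), and that is where the efficiency gain discussed above comes from.

```python
def mat_vec(A, x):
    """Hypothetical helper: multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


sizes = [2104, 1416, 1534, 852]
theta0, theta1 = -40, 0.25

# Style 1: explicit for loop, one house at a time.
prediction_loop = []
for i in range(len(sizes)):
    prediction_loop.append(theta0 + theta1 * sizes[i])

# Style 2: build the data matrix once, then a single matrix-vector product.
data_matrix = [[1, s] for s in sizes]
parameters = [theta0, theta1]
prediction = mat_vec(data_matrix, parameters)

print(prediction_loop == prediction)  # True
```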

So, that's it for matrix-vector multiplication, and we'll make good use of these sorts of operations as we develop linear regression and other models further. But, in the next video, we're going to take this and generalize it to the case of matrix-matrix multiplication.

That is matrix-vector multiplication. We will use it when developing linear regression. The next lecture covers matrix-matrix multiplication.