by 라인하트 Dec 12. 2020

Andrew Ng's Machine Learning (16-5): Low Rank Matrix Factorization

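Professor Andrew Ng, co-founder of the online learning platform Coursera, is a leading figure in the artificial intelligence field. The machine learning course he taught to beginners at Stanford University is available for free on Coursera (Coursera.org). It is an essential course for machine learning beginners, and one you naturally come across when studying artificial intelligence and machine learning on your own.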

Recommender Systems

Low Rank Matrix Factorization

Vectorization: Low Rank Matrix Factorization

In the last few videos, we talked about a collaborative filtering algorithm. In this video, I'm going to say a little bit about the vectorized implementation of this algorithm, and also talk a little bit about other things you can do with it. For example, given one product, can you find other products that are related to it? Say a user has recently been looking at one product: are there other related products that you could recommend to this user? So let's see what we could do about that. What I'd like to do is work out an alternative way of writing out the predictions of the collaborative filtering algorithm.

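The previous lectures covered the collaborative filtering algorithm. This lecture covers its vectorized implementation and also what else the algorithm can be used for. For example, you can find other products related to the one a user has recently shown interest in. Which other products could you recommend to that user? To get there, we first work out another way of writing the predictions of the collaborative filtering algorithm.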

To start, here is our data set with our five movies, and what I'm going to do is take all the ratings by all the users and group them into a matrix. Here we have five movies and four users, so this matrix Y is going to be a 5 by 4 matrix. It's just taking all of the elements, all of this data, including the question marks, and grouping them into this matrix. And of course the (i, j) element of this matrix is really what we were previously writing as y superscript (i, j). It's the rating given to movie i by user j.

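Here is a data set of five movies. We take every user's rating of every movie and group them into a matrix. There are five movies and four users, so the matrix Y is 5 x 4. Every entry is grouped into the matrix as is, including the unknown ratings marked with a question mark (?). The (i, j) entry of this matrix is the same quantity written as y^(i, j) in earlier lectures: the rating user j gave to movie i.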
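The grouping described above can be sketched in NumPy. This is only an illustration: the rating values below are made up (the actual numbers live on the lecture slide), and representing "?" as `np.nan` together with a boolean mask `R` is an assumption of this sketch, not something the lecture prescribes.

```python
import numpy as np

# Hypothetical ratings for 5 movies (rows) and 4 users (columns).
# np.nan stands in for the "?" entries, i.e. movies a user has not rated.
Y = np.array([
    [5.0,    5.0,    0.0,    0.0],
    [5.0,    np.nan, np.nan, 0.0],
    [np.nan, 4.0,    0.0,    np.nan],
    [0.0,    0.0,    5.0,    4.0],
    [0.0,    0.0,    5.0,    np.nan],
])

# R[i, j] is True where user j has rated movie i.
R = ~np.isnan(Y)

n_m, n_u = Y.shape          # number of movies, number of users
rating_movie1_user2 = Y[0, 1]  # y^(1, 2) in the lecture's 1-based notation
known_ratings = int(R.sum())   # how many entries are not "?"
```

Keeping the mask separate from the values makes it easy to restrict a cost function to rated entries only.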

Given this matrix Y of all the ratings that we have, there's an alternative way of writing out all the predicted ratings of the algorithm. In particular, if you look at what a certain user predicts on a certain movie, what user j predicts on movie i is given by this formula, (θ^(j))^T x^(i). And so, if you have a matrix of the predicted ratings, what you would have is the following matrix, where the (i, j) entry, which corresponds to the rating that we predict user j will give to movie i, is exactly equal to (θ^(j))^T x^(i). So this is a matrix where the first element, the (1, 1) element, is the predicted rating of user one on movie one; this element, the (1, 2) element, is the predicted rating of user two on movie one, and so on; and this is the predicted rating of user one on the last movie.

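Given the matrix Y of all movie ratings, there is another way to write out every rating the algorithm predicts. Consider what a particular user is predicted to give a particular movie: the predicted rating of user j for movie i is (θ^(j))^T x^(i). So the first blue box, the (1, 1) entry, is the predicted rating of user 1 for movie 1. The next blue box, the (1, 2) entry, is the predicted rating of user 2 for movie 1. And the last blue box, the (n_m, 1) entry, is the predicted rating of user 1 for the last movie.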

And if you want, you know, this rating is what we would have predicted for this value and this rating is what we would have predicted for that value, and so on.

In the matrix Y, the value 5 circled in pink corresponds to the prediction (θ^(1))^T x^(1). The question mark (?) circled in green corresponds to the prediction (θ^(2))^T x^(2).

Now, given this matrix of predicted ratings, there is then a simpler, vectorized way of writing these out. In particular, if I define the matrix X, and this is going to be just like the matrix we had earlier for linear regression, to be x^(1) transpose, x^(2) transpose, down to x^(n_m) transpose, then I take all the features for my movies and stack them in rows. So think of each movie as one example and stack all of the features of the different movies in rows. And if we also define a matrix capital Theta, what I'm going to do is take each of the per-user parameter vectors and stack them in rows, like so. So that's θ^(1), which is the parameter vector for the first user, and θ^(2), and so on; you stack them in rows like this to define a matrix capital Theta, and so I have n_u parameter vectors all stacked in rows like this. Now, given this definition for the matrix X and this definition for the matrix Theta, in order to have a vectorized way of computing the matrix of all the predictions, you can just compute X times the matrix Theta transpose, and that gives you a vectorized way of computing this matrix over here. To give the collaborative filtering algorithm that you've been using another name: the algorithm that we're using is also called low rank matrix factorization.

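Given the matrix of predicted ratings, there is a simpler, vectorized way to write it. Define the matrix X the same way as the matrix used earlier for linear regression: list each movie's features in a row. The feature vector of the first movie, x^(1) = [x1^(1); x2^(1); ...; xn^(1)], is transposed into (x^(1))^T = [x1^(1), x2^(1), ..., xn^(1)] and becomes the first row of the matrix. The last row of X is (x^(n_m))^T, which lists all the features of the last movie n_m.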

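$$
X = \begin{bmatrix}
x_1^{(1)} & x_2^{(1)} & \cdots & x_n^{(1)} \\
x_1^{(2)} & x_2^{(2)} & \cdots & x_n^{(2)} \\
\vdots & \vdots & \ddots & \vdots \\
x_1^{(n_m)} & x_2^{(n_m)} & \cdots & x_n^{(n_m)}
\end{bmatrix}
$$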

The same goes for Θ. The last row of Θ is (θ^(n_u))^T, which lists all the parameters of the last user n_u.

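$$
\Theta = \begin{bmatrix}
\theta_1^{(1)} & \theta_2^{(1)} & \cdots & \theta_n^{(1)} \\
\theta_1^{(2)} & \theta_2^{(2)} & \cdots & \theta_n^{(2)} \\
\vdots & \vdots & \ddots & \vdots \\
\theta_1^{(n_u)} & \theta_2^{(n_u)} & \cdots & \theta_n^{(n_u)}
\end{bmatrix}
$$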

With X and Θ built this way, all the predictions can be computed in vectorized form. The formula is:

XΘ^T

This collaborative filtering algorithm also goes by another name: low rank matrix factorization.
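As a rough sketch of the vectorized computation, the snippet below compares a double loop over (movie, user) pairs against the single product XΘ^T. The sizes and the random values are placeholders chosen for illustration; in practice X and Θ would come from training the collaborative filtering algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_m, n_u, n = 5, 4, 3              # movies, users, features (illustrative sizes)

X = rng.normal(size=(n_m, n))      # row i is (x^(i))^T, the features of movie i
Theta = rng.normal(size=(n_u, n))  # row j is (theta^(j))^T, the parameters of user j

# Loop version: one inner product (theta^(j))^T x^(i) per (movie, user) pair.
pred_loop = np.zeros((n_m, n_u))
for i in range(n_m):
    for j in range(n_u):
        pred_loop[i, j] = Theta[j] @ X[i]

# Vectorized version: a single matrix product X Theta^T.
pred_vec = X @ Theta.T

same = np.allclose(pred_loop, pred_vec)
print(same)
```

Both versions produce the same n_m x n_u matrix; the vectorized form simply hands the work to one optimized matrix multiplication.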

And so if you hear people talk about low rank matrix factorization, that's essentially exactly the algorithm that we have been talking about. This term comes from the fact that the matrix X times Theta transpose has a mathematical property, known in linear algebra as being a low rank matrix, and that's what gives rise to the name low rank matrix factorization for these algorithms. In case you don't know what low rank means, or what a low rank matrix is, don't worry about it. You really don't need to know that in order to use this algorithm. But if you're an expert in linear algebra, that's what gives this algorithm its other name, low rank matrix factorization.

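So low rank matrix factorization is essentially exactly the algorithm covered so far. The matrix XΘ^T has a property known in linear algebra as being low rank, which is where the name low rank matrix factorization comes from. It does not matter if you do not know what low rank or a low rank matrix means; you do not actually need to know in order to use this algorithm.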
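For readers who do want a feel for the low rank property, here is a small numerical check. Because X has only n columns and Θ has only n columns, the product XΘ^T can have rank at most n, however many movies and users there are. The sizes and random values below are arbitrary placeholders for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n_m, n_u, n = 50, 40, 3            # many movies and users, only a few features

X = rng.normal(size=(n_m, n))      # movie features, one row per movie
Theta = rng.normal(size=(n_u, n))  # user parameters, one row per user

P = X @ Theta.T                    # 50 x 40 matrix of predicted ratings
rank = np.linalg.matrix_rank(P)

# The rank is bounded by the number of features n, far below min(n_m, n_u).
print(P.shape, rank)
```

In other words, the large prediction matrix is fully described by the two thin factors X and Θ, which is the "factorization" in the name.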

Finally, having run the collaborative filtering algorithm, here's something else that you can do, which is use the learned features in order to find related movies. Specifically, for each product i, really for each movie i, we've learned a feature vector x^(i). When you learn features, you don't really know in advance what the different features are going to be, but if you run the algorithm, the features will tend to capture the important aspects of these different movies or different products or what have you: the important aspects that cause some users to like certain movies and cause other users to like different sets of movies. So maybe you end up learning a feature where x1 equals romance and x2 equals action, similar to an earlier video, and maybe you learn a different feature x3, which is the degree to which this is a comedy, then some feature x4, which is some other thing. You have n features all together, and after you have learned the features, it's actually often pretty difficult to go into the learned features and come up with a human-understandable interpretation of what these features really are. But in practice, even though these features can be hard to visualize and it can be hard to figure out just what they are, the algorithm will usually learn features that are very meaningful for capturing whatever are the most important or most salient properties of a movie that cause you to like or dislike it.

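Finally, having run the collaborative filtering algorithm, you can use the learned features to find related movies. For each product i, that is, each movie i, the algorithm has learned a feature vector x^(i). You do not know in advance exactly what the features will be, yet the learned features capture the important characteristics of a movie or product. What makes some users like certain movies while other users like different ones? The learned features capture this. For example, x1 might be romance, x2 action, x3 comedy, and x4 something else, with n features in total. In practice, it is very hard to visualize or interpret what the learned features are. Even so, the algorithm learns features that capture the most important and salient properties of a movie that make users like or dislike it.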

And so now let's say we want to address the following problem. Say you have some specific movie i and you want to find other movies j that are related to that movie. Why would you want to do this? Well, maybe you have a user that's browsing movies, and they're currently watching movie j; then what's a reasonable movie to recommend to them to watch after they're done with movie j? Or if someone's recently purchased movie j, what's a different movie that would be reasonable to recommend to them to consider purchasing?

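Now let's address the following problem: finding other movies related to a specific movie i. Why would you need this? Suppose a user is currently watching movie j. Which movie should you recommend once they are done with movie j? Or, if someone has purchased movie j, which movie should you recommend to them?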

So, now that you have learned these feature vectors, this gives us a very convenient way to measure how similar two movies are. In particular, movie i has a feature vector x^(i), and so if you can find a different movie j such that the distance between x^(i) and x^(j) is small, then this is a pretty strong indication that movies j and i are somehow similar, at least in the sense that someone who likes movie i may be more likely to like movie j as well. So, just to recap, if your user is looking at some movie i and you want to find the 5 most similar movies to that movie in order to recommend 5 new movies to them, what you do is find the five movies j with the smallest distance between the features of these different movies. And this could give you a few different movies to recommend to your user.

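With the learned feature vectors, there is a very convenient way to measure how similar two movies are. Movie i has a feature vector x^(i) and another movie j has a feature vector x^(j). Two movies are similar when the distance ||x^(i) - x^(j)|| is small, which suggests that someone who likes movie i is more likely to like movie j as well. To summarize: if a user is looking at some movie i and you want to recommend 5 new movies, find the 5 movies most similar to it, that is, the 5 movies whose feature vectors are closest to x^(i). This gives you a few different movies to recommend to the user.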
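The nearest-movie lookup can be sketched as follows. The helper name `most_similar_movies`, the use of Euclidean distance for ||x^(i) - x^(j)||, and the tiny two-feature matrix are all assumptions made for this example; real learned features would replace the hand-written values.

```python
import numpy as np

def most_similar_movies(X, i, k=5):
    """Return the indices of the k movies whose feature vectors are closest to movie i."""
    # Euclidean distance from movie i's features to every row of X.
    dists = np.linalg.norm(X - X[i], axis=1)
    dists[i] = np.inf              # never recommend the movie itself
    return np.argsort(dists)[:k]

# Hypothetical learned features: column 0 ~ "romance", column 1 ~ "action".
X = np.array([
    [0.9, 0.0],
    [1.0, 0.1],
    [0.1, 1.0],
    [0.0, 0.9],
    [0.8, 0.2],
    [0.5, 0.5],
])

closest = most_similar_movies(X, 0, k=2)
print(closest)
```

With these values, the two movies nearest to movie 0 are the other romance-heavy ones (indices 1 and 4). Note that k should be smaller than the number of movies; otherwise the excluded movie itself ends up at the tail of the result.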

So with that, hopefully, you now know how to use a vectorized implementation to compute all the predicted ratings of all the users on all the movies, and also how to do things like use learned features to find movies and products that are related to each other.

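You have now seen how to use a vectorized implementation to compute all the predicted ratings for all users and all movies, and how to use the learned features to find movies and products that are related to each other.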