by 라인하트 Dec 10. 2020

Andrew Ng's Machine Learning (16-3): Collaborative Filtering

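Professor Andrew Ng, co-founder of the online learning platform Coursera, is a leading figure in the AI industry. The course he taught to machine-learning beginners at Stanford is available for free, unchanged, on Coursera (Coursera.org). It is a must-take course for machine-learning beginners, and one that anyone studying AI and machine learning on their own will naturally come across.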

Recommender Systems

Collaborative Filtering

In this video we'll talk about an approach to building a recommender system that's called collaborative filtering. The algorithm that we're talking about has a very interesting property that it does what is called feature learning and by that I mean that this will be an algorithm that can start to learn for itself what features to use.

The recommender system covered in this lecture is the collaborative filtering algorithm. Collaborative filtering learns features automatically: the algorithm works out for itself which features to use.

Here was the data set that we had, and we had assumed that for each movie, someone had come and told us how romantic that movie was and how much action there was in that movie. But as you can imagine, it can be very difficult, time-consuming, and expensive to actually get someone to watch each movie and tell you how romantic each movie is and how action-packed each movie is, and often you'll want even more features than just these two. So where do you get these features from?

So let's change the problem a bit and suppose that we have a data set where we do not know the values of these features. So we're given the data set of movies and of how the users rated them, but we have no idea how romantic each movie is and we have no idea how action packed each movie is so I've replaced all of these things with question marks. But now let's make a slightly different assumption.

Let's say we've gone to each of our users, and each of our users has told us how much they like romantic movies and how much they like action-packed movies. So Alice has associated with her a parameter vector theta 1, Bob theta 2, Carol theta 3, and Dave theta 4. Let's say Alice tells us that she really likes romantic movies, so there's a 5 there, which is the multiplier associated with x1, and let's say that Alice tells us she really doesn't like action movies, so there's a 0 there. Bob tells us something similar, so we have theta 2 over here. Carol, on the other hand, tells us that she really likes action movies, which is why there's a 5 there; that's the multiplier associated with x2 (and remember there's also x0 equals 1). Let's say Carol also tells us she doesn't like romantic movies, and similarly for Dave.

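Here is the data set. So far we assumed that someone tells us, for each movie, how romantic it is and how much action it has. But as you can imagine, having someone analyze every movie is very difficult, time-consuming, and expensive, and often you will want even more features than just these two. So where do the features that describe each movie come from?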

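Let's change the problem slightly. Now we do not know the values of the features that describe the movies. We have the movie data set and the ratings users gave the movies, but we do not know each movie's romance score x1 or action score x2. What we do have is information about how much each user likes romantic movies and action movies: Alice has θ^(1), Bob θ^(2), Carol θ^(3), and Dave θ^(4). Alice gives a weight of 5 to romance (the multiplier on x1) and 0 to action (the multiplier on x2). Bob's tastes are similar to Alice's. Carol gives a weight of 5 to action (x2) and 0 to romance. Dave's tastes are similar to Carol's. As always, x0 = 1.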

So let's assume that somehow we can go to users and each user j just tells us what the value of theta j is for them, which basically specifies how much they like different types of movies. If we can get these parameters theta from our users, then it turns out that it becomes possible to infer the values of x1 and x2 for each movie.

Let's look at an example: movie 1, which has associated with it a feature vector x1. This movie is called Love at last, but let's pretend we don't know what it is about and ignore the title. All we know is that Alice loved this movie, Bob loved this movie, and Carol and Dave hated it. So what can we infer? We know from the parameter vectors that Alice and Bob love romantic movies, because they told us there's a 5 here, whereas Carol and Dave hate romantic movies and love action movies; those are the parameter vectors that users 3 and 4, Carol and Dave, gave us. So based on the fact that movie 1 is loved by Alice and Bob and hated by Carol and Dave, we might reasonably conclude that this is probably a romantic movie and probably not much of an action movie. This example is a little bit mathematically simplified, but what we're really asking is what feature vector x1 should be so that theta 1 transpose x1 is approximately equal to 5 (Alice's rating), theta 2 transpose x1 is also approximately equal to 5, theta 3 transpose x1 is approximately equal to 0 (Carol's rating), and theta 4 transpose x1 is approximately equal to 0. From this it looks like x1 should be 1 (the intercept term), then 1.0, then 0.0, which makes sense given what we know of Alice, Bob, Carol, and Dave's preferences for movies and the way they rated this movie. More generally, we can go down this list and try to figure out reasonable features for the other movies as well.

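User j supplies the value of θ^(j) that describes their own taste in movies; that is, users tell us directly how much they like each kind of movie. If we have each user's preference vector θ^(j), we can estimate the movie features x1 and x2.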

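For example, you do not know what the first movie is like, but you do know that Alice and Bob love it and that Carol and Dave hate it. What can we infer? Alice and Bob gave a weight of 5 to the romance feature x1, so we can infer that they like romantic movies. Carol and Dave gave a weight of 0 to the romance feature x1 (and 5 to the action feature x2), so we can infer that they dislike romantic movies. These are the parameter vectors you know. Given that Alice and Bob like the first movie and Carol and Dave dislike it, we can reasonably infer that it is a romantic movie rather than an action movie.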

This example is mathematically simplified. Each user's preference for the first movie can be estimated as follows.
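Written out, the conditions stated in the English transcript above ask for a feature vector x^(1) that makes each user's predicted rating match the rating they gave:

$$
(\theta^{(1)})^T x^{(1)} \approx 5, \qquad (\theta^{(2)})^T x^{(1)} \approx 5, \qquad (\theta^{(3)})^T x^{(1)} \approx 0, \qquad (\theta^{(4)})^T x^{(1)} \approx 0
$$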

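Here, x^(1) = [1; 1.0; 0.0]. The movie's feature vector is computed from Alice, Bob, Carol, and Dave's movie tastes and their ratings. Working down the movie list the same way, we can compute the values of features x1 and x2 for the other movies as well.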
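The Python sketch below checks the arithmetic by evaluating (θ^(j))^T x^(1) for all four users with x^(1) = [1; 1.0; 0.0]. The text spells out only Alice's θ^(1) = [0; 5; 0]. Setting Bob to the same vector, and Carol and Dave to [0; 0; 5], is an assumption based on the tastes described above.

```python
import numpy as np

# User preference vectors theta^(j) = [theta_0, romance weight, action weight].
# Alice's [0, 5, 0] is given in the text. Bob = Alice and Carol = Dave = [0, 0, 5]
# are assumptions based on the described tastes.
theta = {
    "Alice": np.array([0.0, 5.0, 0.0]),
    "Bob":   np.array([0.0, 5.0, 0.0]),
    "Carol": np.array([0.0, 0.0, 5.0]),
    "Dave":  np.array([0.0, 0.0, 5.0]),
}

# Feature vector for movie 1: x0 = 1 (intercept), x1 = 1.0 (romance), x2 = 0.0 (action)
x_movie1 = np.array([1.0, 1.0, 0.0])

# Each user's predicted rating is the inner product theta^(j)^T x^(1)
predictions = {name: float(t @ x_movie1) for name, t in theta.items()}
print(predictions)  # {'Alice': 5.0, 'Bob': 5.0, 'Carol': 0.0, 'Dave': 0.0}
```

The predictions reproduce the observed ratings (5, 5, 0, 0), which is why [1; 1.0; 0.0] is a sensible choice for x^(1).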

Let's formalize this problem of learning the features x(i). Let's say that our users have given us their preferences: they have told us the values of theta 1 through theta nu, and we want to learn the feature vector x(i) for movie number i. We can pose the following optimization problem. We sum over all the indices j for which we have a rating for movie i, because we're trying to learn the features for movie i, that is, the feature vector x(i). Then we minimize this squared error: we choose features x(i) so that the predicted value of how user j rates movie i is not too far, in the squared-error sense, from the actual value y(i,j) that we observe in the rating of user j on movie i. So, to summarize, the squared error term tries to choose features x(i) so that, for all the users j who have rated that movie, the algorithm predicts a value for how that user would have rated the movie that is not too far, in the squared-error sense, from the rating the user actually gave. As usual, we can also add this sort of regularization term to prevent the features from becoming too big.

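Let's formalize the problem of learning the features x^(i). First, assume the users have provided their tastes or preferences: we have user 1's preference θ^(1), user 2's preference θ^(2), ..., and user n_u's preference θ^(n_u). We want to learn the feature vector x^(i) of the i-th movie, so we write it as the following optimization problem.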
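Reconstructed from the description in the next paragraph and the English transcript above, the per-movie problem is shown below. Here λ is the regularization parameter and n is the number of features. The intercept x_0 = 1 is not regularized.

$$
\min_{x^{(i)}}\ \frac{1}{2}\sum_{j:\,r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2
$$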

r(i, j) = 1 indicates that user j has rated movie i. We therefore sum over every user j who has rated movie i, because we are trying to learn the feature vector of movie i, and we minimize the squared error between the predicted rating (θ^(j))^T x^(i) and the rating y^(i, j) that user j actually gave movie i. In other words, we choose x^(i) so that, for every user who rated movie i, the algorithm's predicted rating is close, in the squared-error sense, to the actual rating. That is the role of the squared-error term. As usual, a regularization term is added to keep the features from growing too large.

So this is how we would learn the features for one specific movie, but what we want to do is learn the features for all the movies. So I'm going to add this extra summation here: I'm going to sum over all n_m movies and minimize this objective, summed over all movies. If you do that, you end up with the following optimization problem, and if you minimize it, you hopefully get a reasonable set of features for all of your movies. So let's put everything together: the algorithm we talked about in the previous video and the algorithm that we just talked about in this video.

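This formula shows how to learn the features of one particular movie. What we actually need is to learn the features of all the movies, so we add a summation over all n_m movies and minimize the total. Minimizing this objective should give every movie a reasonable set of features.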
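Adding the outer sum over movies gives the objective for learning every movie's features at once. It uses the same symbols as the per-movie problem:

$$
\min_{x^{(1)},\dots,x^{(n_m)}}\ \frac{1}{2}\sum_{i=1}^{n_m}\sum_{j:\,r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2
$$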

Putting everything together, here are the algorithms from the previous lecture and this one.

In the previous video, what we showed was that if you have a set of movie ratings, that is, the r(i,j)'s and the y(i,j)'s, then given features for your different movies, you can learn the parameters theta for your different users. And what we showed earlier in this video is that if your users are willing to give you parameters, then you can estimate features for the different movies. So this is kind of a chicken-and-egg problem. Which comes first? If we can get the thetas, we can learn the x's; if we have the x's, we can learn the thetas. Here's what you can do, and this actually works.

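The content-based recommendation algorithm from the previous lecture applies when we have a feature vector x^(i) for every movie and the ratings y^(i, j) that user j gave wherever r(i, j) = 1. From these we can learn each user's parameter vector θ^(j), which describes that user's taste or preferences in movies.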
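For comparison, the content-based problem from the previous lecture is the mirror image of the objective above. The x^(i) are held fixed and every θ^(j) is learned:

$$
\min_{\theta^{(1)},\dots,\theta^{(n_u)}}\ \frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:\,r(i,j)=1}\left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2
$$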

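The collaborative filtering algorithm in this lecture applies when each user j provides a parameter vector θ^(j) describing their taste or preferences in movies. From these we can learn each movie's feature vector x^(i).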

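This is a kind of chicken-and-egg problem. If we can obtain the users' tastes θ^(j), we can estimate the movie features x^(i); if we can obtain the movie features x^(i), we can estimate the users' tastes θ^(j). In practice, this can actually be computed.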

What you can do is randomly guess some values of the thetas. Based on your initial random guess for the thetas, you can then use the procedure we just talked about to learn features for your different movies. Given that initial set of features for your movies, you can then use the first method, from the previous video, to get an even better estimate for your parameters theta. Now that you have a better setting of the parameters theta for your users, you can use that to get an even better set of features, and so on. You can keep iterating, going back and forth and optimizing theta, x, theta, x, theta. This actually works: if you do this, it will cause your algorithm to converge to a reasonable set of features for your movies and a reasonable set of parameters for your different users.

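In practice, we start by randomly guessing the values of the users' preference parameters θ. Based on that random guess, we learn the movie features x. Given this initial set of movie features, we use the first method from the previous lecture to get a better estimate of the parameter vectors θ. Now that we have parameter vectors θ describing the users' tastes, we estimate the movie feature vectors x again. Repeating this process converges to a reasonable set of features for the movies and a reasonable set of parameters for the users.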
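The Python (NumPy) sketch below implements this back-and-forth procedure. It rests on a few assumptions that are not from the lecture:

- The intercept term x0 is dropped, so x and θ are plain n-dimensional vectors.
- Each half-step is solved exactly as a regularized least-squares (ridge regression) problem, one movie or one user at a time.
- Every rating except movie 1's is hypothetical.

The helper names `cost`, `solve_rows`, and `alternate` are illustrative. Each half-step exactly minimizes the joint regularized objective while the other block is held fixed. As a result, the cost recorded after each round can never go up.

```python
import numpy as np


def cost(X, Theta, Y, R, lam):
    """Joint objective: squared error over rated entries (R == 1) plus L2 penalties on X and Theta."""
    err = (X @ Theta.T - Y) * R          # zero out entries with r(i, j) = 0
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * (np.sum(X ** 2) + np.sum(Theta ** 2))


def solve_rows(F, Y, R, lam):
    """For every row i of Y, find v_i minimizing
    0.5 * sum_{j: R[i, j] = 1} (F[j] @ v_i - Y[i, j])**2 + 0.5 * lam * ||v_i||**2
    via the regularized normal equations (F[j] is row j of F)."""
    k = F.shape[1]
    out = np.zeros((Y.shape[0], k))
    for i in range(Y.shape[0]):
        rated = R[i] == 1
        A, b = F[rated], Y[i, rated]
        out[i] = np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ b)
    return out


def alternate(Y, R, n_features=2, lam=0.1, iters=20, seed=0):
    """Back-and-forth estimation: random theta -> x -> theta -> x -> ..."""
    rng = np.random.default_rng(seed)
    n_users = Y.shape[1]
    Theta = rng.normal(scale=0.1, size=(n_users, n_features))  # random initial guess for theta
    history = []
    for _ in range(iters):
        X = solve_rows(Theta, Y, R, lam)        # given theta, learn movie features x (this lecture)
        Theta = solve_rows(X, Y.T, R.T, lam)    # given x, learn user parameters theta (previous lecture)
        history.append(cost(X, Theta, Y, R, lam))
    return X, Theta, history


nan = np.nan
ratings = np.array([            # rows: movies i, columns: users j; NaN means "not rated"
    [5.0, 5.0, 0.0, 0.0],       # movie 1, as described in the text
    [5.0, nan, nan, 0.0],       # movies 2-5: hypothetical ratings
    [nan, 4.0, 0.0, nan],
    [0.0, 0.0, 5.0, 4.0],
    [0.0, 0.0, 5.0, nan],
])
R = (~np.isnan(ratings)).astype(float)
Y = np.nan_to_num(ratings)      # unrated entries become 0 but are masked out by R

X, Theta, history = alternate(Y, R)
print(np.round(X @ Theta.T, 1))  # predicted ratings, including the unrated (movie, user) pairs
```

Dropping x0 keeps the two updates symmetric, so the same `solve_rows` helper serves both directions. As the lecture says, this basic version is not the final algorithm; the next video makes it more efficient.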

So this is a basic collaborative filtering algorithm. This isn't actually the final algorithm that we're going to use. In the next video we are going to be able to improve on this algorithm and make it quite a bit more computationally efficient. But, hopefully this gives you a sense of how you can formulate a problem where you can simultaneously learn the parameters and simultaneously learn the features from the different movies. And for this problem, for the recommender system problem, this is possible only because each user rates multiple movies and hopefully each movie is rated by multiple users. And so you can do this back and forth process to estimate theta and x.

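This is the basic collaborative filtering algorithm. It is not the final algorithm we will actually use; in the next lecture we will improve it and make it considerably more computationally efficient. Still, it should give you a sense of how to formulate a problem in which the parameters and the movie features are learned together. This kind of recommender system is possible only because each user rates multiple movies and each movie is rated by multiple users. That overlap is what lets us go back and forth to estimate θ and x.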

So to summarize, in this video we've seen an initial collaborative filtering algorithm. The term collaborative filtering refers to the observation that when you run this algorithm with a large set of users, all of these users are effectively collaborating to get better movie ratings for everyone. With every user rating some subset of the movies, every user is helping the algorithm a little bit to learn better features. By rating a few movies myself, I help the system learn better features, and the system can then use these features to make better movie predictions for everyone else. So there is a sense of collaboration, where every user is helping the system learn better features for the common good. This is collaborative filtering.

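To summarize, the recommender system in this lecture is an initial collaborative filtering algorithm. The name collaborative filtering reflects the fact that, when the algorithm runs over a large set of users, those users are effectively working together: every user helps the algorithm a little to learn better features. By rating a few movies myself, I help the system learn better features, and those features help the system make better recommendations for everyone else. So every user collaborates in helping the system learn better features for the common good. This is collaborative filtering.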

And in the next video, we're going to take the ideas we've worked out and try to develop an even better algorithm, a slightly better technique for collaborative filtering.

In the next lecture, we will build on these ideas to develop a better collaborative filtering algorithm.

Andrew Ng's Machine Learning video lecture

Wrap-up

A movie recommender system predicts how users would rate movies they have not yet rated. The goal is to increase viewing and purchase rates for the recommended movies.

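First, we covered the content-based recommendation algorithm. Content-based filtering analyzes content such as movies or pop songs to build a content profile (x^(i)), and analyzes user preferences to build a user profile (θ^(j)). It then recommends content that matches each user's preferences. For example, whenever a new song is released, the music site Pandora extracts about 400 features from it, such as genre, beat, and timbre. It then compares each user profile with the new song's content profile and recommends the song to users who are likely to enjoy it.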

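Second, we covered the collaborative filtering algorithm. Collaborative filtering analyzes large-scale user behavior data and recommends items liked by users whose preferences resemble yours. For example, an online shopping site shows "items bought by customers who bought this item." If users who buy ramen also tend to buy bottled water, the site recommends bottled water. One advantage of collaborative filtering is that it does not require analyzing content profiles (x^(i)).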

The collaborative filtering algorithm works as follows.

1) Create user profiles that represent each user's preferences.

Assume that someone or some program learns a parameter vector θ^(j) for each user j. For example, the first user, Alice, has θ^(1) = [0; 5; 0]. As with the movie profiles, θ^(j) in this example is a vector in ℝ^3.

2) Collect the ratings that users have given to movies.

r(i, j) = 1 means that user j has rated movie i. The preference or rating that user j actually gave movie i is y^(i, j).
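In code, a common way to hold r(i, j) and y^(i, j) is a pair of movies × users arrays. The Python sketch below assumes that missing ratings arrive as NaN, and the ratings themselves are hypothetical. R is built from them, and the unrated slots of Y are filled with a placeholder 0. Because of that placeholder, any sum over Y must be masked by R.

```python
import numpy as np

# Hypothetical ratings: rows are movies i, columns are users j, NaN = not rated
ratings = np.array([
    [5.0, 5.0, 0.0, 0.0],
    [5.0, np.nan, np.nan, 0.0],
    [np.nan, 4.0, 0.0, np.nan],
])

R = (~np.isnan(ratings)).astype(int)   # r(i, j) = 1 if user j rated movie i, else 0
Y = np.where(R == 1, ratings, 0.0)     # y(i, j); placeholder 0 where r(i, j) = 0 so arithmetic stays finite

print(R)
print("ratings observed:", int(R.sum()))
```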

3) Predict the profile that describes each movie's characteristics.

To estimate user j's preference for movies where r(i, j) = 0, we use the following formula. Here, x1 measures how romantic a movie is and x2 measures how much action it has.
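In symbols, the prediction is the inner product described in the next paragraph. The second line is a worked case using Alice's θ^(1) and movie 1's x^(1) from the example:

$$
\text{predicted rating of movie } i \text{ by user } j = (\theta^{(j)})^T x^{(i)}, \qquad x^{(i)} = \begin{bmatrix} 1 \\ x_1 \\ x_2 \end{bmatrix}
$$

$$
(\theta^{(1)})^T x^{(1)} = 0 \cdot 1 + 5 \cdot 1.0 + 0 \cdot 0.0 = 5
$$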

The rating that user j would give movie i is predicted as the inner product of the parameter vector θ^(j) and the feature vector x^(i).
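With all the x^(i) stacked as rows of a matrix X and all the θ^(j) as rows of Θ, every inner product comes out of one matrix product, X Θ^T. The Python sketch below assumes hypothetical learned values, except Alice's θ^(1) and movie 1's x^(1), which come from the text. It then reports predictions only for the pairs with r(i, j) = 0.

```python
import numpy as np

# Hypothetical learned values; each x^(i) keeps the intercept x_0 = 1 as in the text.
X = np.array([
    [1.0, 1.0, 0.0],   # movie 1: [x_0, romance x_1, action x_2]
    [1.0, 0.1, 1.0],   # movie 2 (hypothetical): mostly action
])
Theta = np.array([
    [0.0, 5.0, 0.0],   # Alice, theta^(1) from the text
    [0.0, 0.0, 5.0],   # a hypothetical action fan
])
R = np.array([
    [1, 0],            # r(i, j): movie 1 rated by Alice only
    [0, 1],            #          movie 2 rated by the action fan only
])

# Entry (i, j) of X @ Theta.T is the inner product (theta^(j))^T x^(i)
predicted = X @ Theta.T

# Report predictions only for the pairs that have not been rated (r(i, j) = 0)
for i, j in np.argwhere(R == 0):
    print(f"movie {i + 1}, user {j + 1}: predicted rating {predicted[i, j]:.1f}")
```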

Creating movie profiles that describe each movie's characteristics

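Here we have user profiles describing preferences and the ratings users actually gave, but no profiles describing the movies, so we predict each movie profile x^(i). Learning x^(i) is a linear regression problem in which the users' θ^(j) play the role of the inputs. The ratings predicted with x^(i) should be as close as possible to the ratings users actually gave. Formally, this is a regularized squared-error objective over the users who rated movie i.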
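Because learning x^(i) is a linear regression with the known θ^(j) as inputs, one movie's features can be solved in closed form through the regularized normal equations. The Python sketch below assumes the convention used in this lecture: x_0 = 1 is fixed and not regularized. The function name `learn_movie_features` is hypothetical. Carol's θ is read as [0; 0; 5] from the text, and Bob's and Dave's vectors are assumed to equal Alice's and Carol's.

```python
import numpy as np


def learn_movie_features(thetas, ratings, lam):
    """Estimate x^(i) = [1, x_1, ..., x_n] for one movie from the users who rated it.

    thetas  : (m, n + 1) array, row j = theta^(j) = [theta_0, theta_1, ..., theta_n]
    ratings : (m,) array of the observed ratings y^(i, j)
    lam     : regularization strength lambda (x_0 = 1 is fixed and not regularized)

    Minimizes 0.5 * sum_j (theta^(j)^T x - y^(i, j))**2 + 0.5 * lam * sum_{k>=1} x_k**2
    using the regularized normal equations.
    """
    thetas = np.asarray(thetas, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    A = thetas[:, 1:]                 # coefficients on the learnable features x_1..x_n
    b = ratings - thetas[:, 0]        # move the fixed intercept contribution theta_0 * 1 to the right
    n = A.shape[1]
    x_rest = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
    return np.concatenate(([1.0], x_rest))


# Movie 1 rated by Alice, Bob, Carol, Dave (Bob = Alice, Dave = Carol are assumptions)
thetas = [[0, 5, 0], [0, 5, 0], [0, 0, 5], [0, 0, 5]]
ratings = [5, 5, 0, 0]
print(learn_movie_features(thetas, ratings, lam=0.0))   # unregularized fit
print(learn_movie_features(thetas, ratings, lam=1.0))   # regularization shrinks x_1 toward 0
```

With λ = 0 the fit recovers x^(1) = [1, 1, 0] exactly. A positive λ also keeps the system solvable when the users' θ vectors alone would leave it singular.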

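Here, if we can obtain the users' tastes θ^(j), we can estimate the movie features x^(i); if we can obtain the movie features x^(i), we can estimate the users' tastes θ^(j). By alternating between estimating θ and x, we arrive at a reasonable set of features.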

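The name collaborative filtering reflects the fact that, when the algorithm runs over a large set of users, those users are effectively working together: every user helps the algorithm a little to learn better features. By rating a few movies themselves, users help the system learn better features, and those features help everyone else find better movies. In this sense, every user collaborates in helping the system learn better features for the common good.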