by 라인하트 Dec 17. 2020

Andrew Ng's Machine Learning (17-5): Online Learning Algorithms

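Professor Andrew Ng, co-founder of the online course platform Coursera, is a leading figure in the AI field. The introductory machine learning course he taught at Stanford University is available for free as an online course on Coursera (Coursera.org). It is an essential course for machine learning beginners, and one that people studying AI and machine learning on their own naturally come across.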

Large Scale Machine Learning

Advanced Topic

Online Learning

In this video, I'd like to talk about a new large-scale machine learning setting called the online learning setting. The online learning setting allows us to model problems where we have a continuous flood or a continuous stream of data coming in and we would like an algorithm to learn from that. Today, many of the largest websites, or many of the largest website companies use different versions of online learning algorithms to learn from the flood of users that keep on coming to, back to the website. Specifically, if you have a continuous stream of data generated by a continuous stream of users coming to your website, what you can do is sometimes use an online learning algorithm to learn user preferences from the stream of data and use that to optimize some of the decisions on your website.

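This lecture covers a new large-scale machine learning setting called online learning. Online learning lets us model problems in which data arrives as a continuous stream and learn from that stream. Many of today's largest websites use some version of an online learning algorithm to learn from the stream of users who keep coming back. When a continuous stream of users generates a continuous stream of data, an online learning algorithm can learn user preferences from that stream and use them to optimize some of the website's decisions.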

Suppose you run a shipping service, so, you know, users come and ask you to help ship their package from location A to location B and suppose you run a website, where users repeatedly come and they tell you where they want to send the package from, and where they want to send it to (so the origin and destination) and your website offers to ship the package for some asking price, so I'll ship your package for $50, I'll ship it for $20. And based on the price that you offer to the users, the users sometimes chose to use a shipping service; that's a positive example and sometimes they go away and they do not choose to purchase your shipping service. So let's say that we want a learning algorithm to help us to optimize what is the asking price that we want to offer to our users.

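Suppose you run a shipping service. Users come to your website and ask to have a package shipped from location A to location B. They enter the origin and the destination, and the website offers to ship the package for some asking price, say $50 or $20. If the user sees the offered price and buys the shipping service, that is a positive example (y = 1); if the user leaves without buying, it is a negative example (y = 0). We want a learning algorithm that helps us optimize the asking price we offer to users.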

And specifically, let's say we come up with some sort of features that capture properties of the users. If we know anything about the demographics, they capture, you know, the origin and destination of the package, where they want to ship the package, and what is the price that we offer to them for shipping the package. And what we want to do is learn what is the probability that they will elect to ship the package using our shipping service given these features, and again, just as a reminder, these features x also capture the price that we're asking for. And so if we could estimate the chance that they'll agree to use our service for any given price, then we can try to pick a price so that they have a pretty high probability of choosing our website while simultaneously hopefully offering us a fair return, offering us a fair profit for shipping their package. So if we can learn this probability of y equals 1 given any price and given the other features, we could really use this to choose appropriate prices as new users come to us. So in order to model the probability of y equals 1, what we can do is use logistic regression or a neural network or some other algorithm like that. But let's start with logistic regression.

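Specifically, suppose we come up with features that capture properties of the users: anything we know about their demographics, the origin and destination of the package, and the price we offer for shipping it. We want to learn the probability p(y = 1 | x; θ) that a user chooses our shipping service given these features, where the features x include the asking price. If we can estimate that probability for any given price, we can pick a price at which users are fairly likely to choose our website while we still earn a fair profit, and we can use this to choose appropriate prices as new users arrive. To model the probability that y = 1 we could use logistic regression, a neural network, or a similar algorithm. Here we start with logistic regression.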
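As a rough illustration of how such a probability estimate might be used to pick a price, the sketch below scores a handful of candidate prices by estimated purchase probability times profit. Everything specific here is an assumption made for the example and does not come from the lecture: the two-feature vector (bias, price / 100), the parameter values in theta, the shipping cost, and the expected-profit criterion itself.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def purchase_probability(theta, price):
    # Hypothetical features: x0 = 1 (bias), x1 = price in hundreds of dollars.
    x = [1.0, price / 100.0]
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def best_price(theta, candidate_prices, cost):
    # Assumed criterion: maximize p(y = 1 | x; theta) * (price - cost).
    return max(candidate_prices,
               key=lambda p: purchase_probability(theta, p) * (p - cost))

theta = [4.0, -8.0]   # made-up parameters: a higher price lowers the purchase probability
prices = [20, 30, 40, 50, 60, 70]
chosen = best_price(theta, prices, cost=10)
print(chosen, round(purchase_probability(theta, chosen), 2))
```

A very low price is almost always accepted but earns little, and a very high price earns a lot but is rarely accepted, so the product of the two tends to peak somewhere in between.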

Now if you have a website that just runs continuously, here's what an online learning algorithm would do. I'm gonna write repeat forever. This just means that our website is going to, you know, keep on staying up. What happens on the website is occasionally a user will come and for the user that comes we'll get some x, y pair corresponding to a customer or to a user on the website. So the features x are, you know, the origin and destination specified by this user and the price that we happened to offer to them this time around, and y is either one or zero depending on whether or not they chose to use our shipping service. Now once we get this (x, y) pair, what an online learning algorithm does is then update the parameters theta using just this example x, y, and in particular we would update my parameters theta as theta j gets updated as theta j minus the learning rate alpha times my usual gradient descent rule for logistic regression. So we do this for j equals zero up to n, and that's my close curly brace.

Suppose the website runs continuously and users keep arriving. An online learning algorithm then repeats the following forever.

Repeat Forever {

Get (x, y) % obtain the features x and the label y for the user who just arrived

Update θ using (x, y) ;

θj := θj - α * (hθ(x) - y)*xj (for j = 0,1,..., n)

}

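Here the features x are the origin and destination specified by the user and the price the website offered this time. y records whether the user chose to use the shipping service: y = 1 if they did and y = 0 if they did not. Once it has this (x, y) pair, the online learning algorithm updates the parameters θ using only that example, with the usual gradient descent update rule for logistic regression.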
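The update rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming a three-element feature vector with x0 = 1 as the bias term; the feature values, the starting parameters, and the learning rate alpha = 0.1 are made up for the example.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(theta, x):
    # h_theta(x) = sigmoid(theta^T x)
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def online_update(theta, x, y, alpha=0.1):
    # theta_j := theta_j - alpha * (h_theta(x) - y) * x_j   for j = 0, 1, ..., n
    error = predict(theta, x) - y
    return [t - alpha * error * xi for t, xi in zip(theta, x)]

theta = [0.0, 0.0, 0.0]
x, y = [1.0, 0.5, 0.2], 1          # one user: hypothetical features, and they bought
theta = online_update(theta, x, y)  # learn from the example, then let it go
print(theta)
```

All parameters are updated from the same prediction error, so the update for every j is simultaneous, as in the rule it is sketching.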

So, for other learning algorithms, instead of writing x, y, right, I was writing things like x^(i), y^(i), but in this online learning setting we're actually discarding the notion of there being a fixed training set. Instead we have an algorithm where we get an example, and then we learn using that example like so, and then we throw that example away. We discard that example and we never use it again, and so that's why we just look at one example at a time. We learn from that example. We discard it. Which is why, you know, we're also doing away with this notion of there being this sort of fixed training set indexed by i. And, if you really run a major website where you really have a continuous stream of users coming, then this sort of online learning algorithm is actually a pretty reasonable algorithm. Because data is essentially free, if you have so much data that data is essentially unlimited, then there is really maybe no need to look at a training example more than once. Of course, if we had only a small number of users, then rather than using an online learning algorithm like this, you might be better off saving away all your data in a fixed training set and then running some algorithm over that training set. But if you really have a continuous stream of data, then an online learning algorithm can be very effective.

The earlier algorithms wrote (x^(i), y^(i)) inside the loop rather than (x, y). Online learning drops the notion of a fixed training set. Instead, the algorithm gets one example from the continuing stream of users, learns from it, and then discards it; an example is never used again. That is why the algorithm looks at one example at a time and why there is no fixed training set indexed by i. A website with a continuous stream of users gets so much new data, essentially unlimited, that there may be no need to look at any example more than once. If you have only a small number of users, however, you may be better off saving all your data in a fixed training set and running an algorithm over it. Online learning is very effective when the data really does arrive as a continuous stream.

I should mention also that one interesting effect of this sort of online learning algorithm is that it can adapt to changing user preferences. And in particular, if over time because of changes in the economy maybe users start to become more price sensitive and willing to pay, you know, less willing to pay high prices. Or if they become less price sensitive and they're willing to pay higher prices. Or if different things become more important to users, if you start to have new types of users coming to your website. This sort of online learning algorithm can also adapt to changing user preferences and kind of keep track of what your changing population of users may be willing to pay for. And it does that because if your pool of users changes, then these updates to your parameters theta will just slowly adapt your parameters to whatever your latest pool of users looks like.

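One interesting property of online learning is that it can adapt to changing user preferences. As the economy changes, users may become more price sensitive and less willing to pay high prices, or less price sensitive and willing to pay more. New types of users may start coming to the website, or different things may become important to them. An online learning algorithm keeps track of what the changing population of users is willing to pay, because each update slowly moves the parameters θ toward whatever the latest pool of users looks like.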
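This adaptation can be seen in a small simulation. The sketch below is only illustrative and rests on assumptions not in the lecture: a synthetic stream in which users buy whenever the (normalized) price is below a cutoff, a cutoff that drops from 0.8 to 0.3 to stand in for users becoming more price sensitive, and a learning rate of 0.5. The helper user_stream is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def online_update(theta, x, y, alpha=0.5):
    error = predict(theta, x) - y
    return [t - alpha * error * xi for t, xi in zip(theta, x)]

def user_stream(max_price, n_users):
    # Synthetic users: offered prices cycle over 0.05, 0.15, ..., 0.95 and a
    # user buys (y = 1) only when the price is below max_price.
    for k in range(n_users):
        price = (k % 10) / 10 + 0.05
        yield [1.0, price], (1 if price < max_price else 0)

theta = [0.0, 0.0]
probe = [1.0, 0.55]   # a mid-range price to watch

for x, y in user_stream(max_price=0.8, n_users=5000):
    theta = online_update(theta, x, y)   # learn, then discard the example
p_before = predict(theta, probe)

for x, y in user_stream(max_price=0.3, n_users=5000):   # users turn price sensitive
    theta = online_update(theta, x, y)
p_after = predict(theta, probe)

print(f"before shift: {p_before:.2f}, after shift: {p_after:.2f}")
```

No retraining step is triggered when the population changes; the same per-example update simply keeps running, and the estimated purchase probability at the probe price falls as the newer users dominate the stream.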

Here's another example of a sort of application to which you might apply online learning. this is an application in product search in which we want to apply learning algorithm to learn to give good search listings to a user. Let's say you run an online store that sells phones - that sells mobile phones or sells cell phones. And you have a user interface where a user can come to your website and type in the query like "Android phone 1080p camera". So 1080p is a type of a specification for a video camera that you might have on a phone, a cell phone, a mobile phone. Suppose, suppose we have a hundred phones in our store. And because of the way our website is laid out, when a user types in a query, if it was a search query, we would like to find a choice of ten different phones to show what to offer to the user. What we'd like to do is have a learning algorithm help us figure out what are the ten phones out of the 100 we should return the user in response to a user-search query like the one here.

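Here is another application of online learning: product search, where we want a learning algorithm to learn to return good search listings to a user. Suppose you run an online store that sells mobile phones. The site has a search box where a user can type a query such as "Android phone 1080p camera"; 1080p is a video camera specification a phone might have. Suppose the store carries 100 phones and, because of the way the site is laid out, each query returns 10 of them. We want the learning algorithm to help decide which 10 of the 100 phones to return for a given query.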

Here's how we can go about the problem. For each phone and given a specific user query; we can construct a feature vector X. So the feature vector X might capture different properties of the phone. It might capture things like, how similar the user search query is in the phones. We capture things like how many words in the user search query match the name of the phone, how many words in the user search query match the description of the phone and so on. So the features x capture properties of the phone and it captures things about how similar or how well the phone matches the user query along different dimensions. What we like to do is estimate the probability that a user will click on the link for a specific phone, because we want to show the user phones that they are likely to want to buy, want to show the user phones that they have high probability of clicking on in the web browser. So I'm going to define y equals one if the user clicks on the link for a phone and y equals zero otherwise and what I would like to do is learn the probability the user will click on a specific phone given, you know, the features x, which capture properties of the phone and how well the query matches the phone. To give this problem a name in the language of people that run websites like this, the problem of learning this is actually called the problem of learning the predicted click-through rate, the predicted CTR. It just means learning the probability that the user will click on the specific link that you offer them, so CTR is an abbreviation for click through rate.

Here is how to approach the problem. For each phone and a given user query, we build a feature vector x. The features capture properties of the phone and how well the phone matches the query along different dimensions, such as how many words in the query match the phone's name and how many match its description. We want to estimate the probability that the user clicks on the link for a specific phone, because we want to show phones the user is likely to want to buy. Define y = 1 if the user clicks on the link for a phone and y = 0 otherwise; the algorithm learns p(y = 1 | x; θ), the probability of a click given the features. People who run websites like this call this the problem of learning the predicted click-through rate (predicted CTR): the probability that a user clicks on a specific link you offer them.

And if you can estimate the predicted click-through rate for any particular phone, what we can do is use this to show the user the ten phones that they're most likely to click on, because out of the hundred phones, we can compute this for each of the 100 phones and just select the 10 phones that the user is most likely to click on, and this will be a pretty reasonable way to decide what ten results to show to the user. Just to be clear, suppose that every time a user does a search, we return ten results. What that will do is it will actually give us ten x, y pairs; this actually gives us ten training examples every time a user comes to our website, because for the ten phones that we chose to show the user, for each of those 10 phones we get a feature vector x, and for each of those 10 phones we show the user we will also get a value for y, we will also observe the value of y, depending on whether or not they clicked on that URL. And so, one way to run a website like this would be to continuously show the user, you know, your ten best guesses for what other phones they might like, and so each time a user comes you would get ten examples, ten x, y pairs, and then use an online learning algorithm to update the parameters using essentially 10 steps of gradient descent on these 10 examples, and then you can throw the data away. And if you really have a continuous stream of users coming to your website, this would be a pretty reasonable way to learn parameters for your algorithm so as to show the ten phones to your users that may be most promising and that they're most likely to click on. So, this is a product search problem or learning to rank phones, learning to search for phones example.

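If we can estimate the predicted click-through rate for any phone, we can show the user the 10 phones they are most likely to click on: compute p(y = 1 | x; θ) for each of the 100 phones and pick the 10 with the highest probability. That is a reasonable way to decide which 10 results to show. To be clear, returning 10 results for every search gives us 10 (x, y) pairs each time a user searches. Each of the 10 phones shown has a feature vector x, and for each we observe y depending on whether the user clicked its link. So each time a user comes, the site shows its 10 best guesses, obtains 10 training examples, and the online learning algorithm updates the parameters with essentially 10 steps of gradient descent on them. The data can then be thrown away. With a continuous stream of users, this is a reasonable way to learn parameters for showing the 10 phones users are most likely to click on. This is the product search example, which can also be viewed as learning to rank phones or learning to search for phones.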
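One search in this scheme might look like the following sketch: score all 100 phones, show the top 10, then take one gradient step per shown phone. The catalogue, its two-element feature vectors (bias plus a query-match fraction), the starting parameters, and the click log are all made up for illustration; a real system would use far richer features.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predicted_ctr(theta, x):
    # p(y = 1 | x; theta) under logistic regression
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def top_k(theta, phone_features, k=10):
    # Indices of the k phones with the highest predicted click-through rate.
    ranked = sorted(range(len(phone_features)),
                    key=lambda i: predicted_ctr(theta, phone_features[i]),
                    reverse=True)
    return ranked[:k]

def learn_from_search(theta, shown_features, clicks, alpha=0.1):
    # One gradient step per shown phone; the examples are not stored afterwards.
    for x, y in zip(shown_features, clicks):
        error = predicted_ctr(theta, x) - y
        theta = [t - alpha * error * xi for t, xi in zip(theta, x)]
    return theta

# Hypothetical catalogue: x = [1, fraction of query words matching the phone].
phones = [[1.0, i / 100] for i in range(100)]
theta = [0.0, 1.0]

shown = top_k(theta, phones)               # the 10 results to display
clicks = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]    # made-up click log for this search
theta = learn_from_search(theta, [phones[i] for i in shown], clicks)
print(shown, theta)
```

Only the 10 phones that were shown produce labels, so the other 90 contribute nothing to the update for this search.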

So, I'll quickly mention a few others. One is, if you have a website and you're trying to decide, you know, what special offer to show the user, this is very similar to phones, or if you have a website and you show different users different news articles. So, if you're a news aggregator website, then you can again use a similar system to select, to show to the user, you know, what are the news articles that they are most likely to be interested in and what are the news articles that they are most likely to click on. Closely related to special offers, we also have product recommendations.

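A few other examples are worth a quick mention. Choosing which special offer to show a user is very similar to the phone example. So is choosing which news articles to show different users: a news aggregator website can use a similar system to select the articles a user is most likely to be interested in and to click on. Product recommendations are closely related to special offers.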

And in fact, if you have a collaborative filtering system, you can even imagine a collaborative filtering system giving you additional features to feed into a logistic regression classifier to try to predict the click through rate for different products that you might recommend to a user. Of course, I should say that any of these problems could also have been formulated as a standard machine learning problem, where you have a fixed training set. Maybe, you can run your website for a few days and then save away a training set, a fixed training set, and run a learning algorithm on that. But these are the actual sorts of problems, where you do see large companies get so much data, that there's really maybe no need to save away a fixed training set, but instead you can use an online learning algorithm to just learn continuously from the data that users are generating on your website.

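In fact, if you have a collaborative filtering system, you can imagine it supplying additional features to feed into a logistic regression classifier that predicts the click-through rate for the different products you might recommend to a user. Any of these problems could also be formulated as a standard machine learning problem with a fixed training set: run the website for a few days, save a fixed training set, and run a learning algorithm on it. But these are exactly the kinds of problems where large companies get so much data that there may be no need to save a fixed training set; an online learning algorithm can instead learn continuously from the data users generate on the website.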

So, that was the online learning setting and as we saw, the algorithm that we apply to it is really very similar to the stochastic gradient descent algorithm, only instead of scanning through a fixed training set, we're instead getting one example from a user, learning from that example, then discarding it and moving on. And if you have a continuous stream of data for some application, this sort of algorithm may be well worth considering for your application. And of course, one advantage of online learning is also that if you have a changing pool of users, or if the things you're trying to predict are slowly changing, like your user taste is slowly changing, the online learning algorithm can slowly adapt your learned hypothesis to whatever the latest sets of user behaviors are like as well.

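That was the online learning setting. The algorithm is very similar to stochastic gradient descent, except that instead of scanning through a fixed training set it gets one example from a user, learns from it, discards it, and moves on. If your application has a continuous stream of data, this kind of algorithm is well worth considering. Another advantage of online learning is that if the pool of users changes, or if what you are trying to predict slowly changes, such as user tastes, the algorithm can slowly adapt the learned hypothesis to the latest user behavior.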

Andrew Ng's Machine Learning video lecture

Summary

Online learning models and learns from data that arrives in real time. The many users visiting a website generate data continuously, which produces a data stream.

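A large website gets so much new data, essentially unlimited, that there is no need to look at past data a second time. If you have only a small number of users, however, you may be better off saving all the data in a fixed training set and running an algorithm over it. Online learning is very effective when data arrives as a continuous stream.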

Online learning can adapt to changing user preferences. When new types of users keep arriving or users' interests change, the algorithm adapts, because its updates slowly and automatically adjust the parameters θ as preferences shift.

In the product search example, the features x capture properties of the phone and how well it matches the user's search query. The goal is to estimate the probability that the user clicks on the link for a specific phone in the search results: y = 1 if the user clicks the phone's link and y = 0 otherwise. This type of learning problem is called learning the predicted click-through rate (CTR).

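Online learning is very similar to stochastic gradient descent. Instead of learning from a fixed training set, it uses one example at a time: the algorithm learns from the example for the user who just arrived and then discards the data.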