brunch

You can make anything
by writing

C.S.Lewis

by 라인하트 Nov 23. 2020

앤드류 응의 머신러닝(13-3): K-평균 최적화 목표

온라인 강의 플랫폼 코세라의 창립자인 앤드류 응 (Andrew Ng) 교수는 인공지능 업계의 거장입니다. 그가 스탠퍼드 대학에서 머신 러닝 입문자에게 한 강의를 그대로 코세라 온라인 강의 (Coursera.org)에서 무료로 배울 수 있습니다. 이 강의는 머신러닝 입문자들의 필수코스입니다. 인공지능과 머신러닝을 혼자 공부하면서 자연스럽게 만나게 되는 강의입니다.

Unsupervised Learning

비지도 학습

Clustering (클러스터링)

Optimization Objective (최적화 목표)

Most of the supervised learning algorithms we've seen, things like linear regression, logistic regression, and so on, all of those algorithms have an optimization objective or some cost function that the algorithm was trying to minimize. It turns out that k-means also has an optimization objective or a cost function that it's trying to minimize. And in this video I'd like to tell you what that optimization objective is.

선형 회귀, 로지스틱 회귀 등과 같은 대부분의 지도 학습 알고리즘은 최소화할 수 있는 최적화 목표나 비용 함수가 있습니다. K-평균 알고리즘도 최소화하려는 최적화 목표와 비용 함수가 있습니다. 이번 강의에서 최적화 목표를 설명합니다.

And the reason I want to do so is because this will be useful to us for two purposes. First, knowing what is the optimization objective of k-means will help us to debug the learning algorithm and just make sure that k-means is running correctly. And second, and perhaps more importantly, in a later video we'll talk about how we can use this to help k-means find better costs for this and avoid the local ultima. But we do that in a later video that follows this one.

최적화 목표는 두 가지 부분에서 유용합니다. 첫째, K-평균 알고리즘의 최적화 목표가 무엇인지 알면 학습 알고리즘을 올바르게 실행되는지 확인하고 디버깅하는 데 도움이 됩니다. 두 번째, K-평균 알고리즘이 로컬 최적 값을 피하고 더 나은 최적 값을 찾는 데 도움을 줄 수 있습니다.

Just as a quick reminder while k-means is running we're going to be keeping track of two sets of variables. First is the ci's and that keeps track of the index or the number of the cluster, to which an example xi is currently assigned. And then the other set of variables we use is mu subscript k, which is the location of cluster centroid k. Again, for k-means we use capital K to denote the total number of clusters. And here lower case k is going to be an index into the cluster centroids and so, lower case k is going to be a number between one and capital K.

K-평균 알고리즘은 실행하는 동안 두 개의 변수를 추적합니다. 첫 번째 변수는 C^(i)입니다. C^(i)는 학습 예제 x^(i)에 할당된 클러스터 번호 또는 인덱스를 추적합니다. 두 번째 변수는 μk입니다. μk는 클러스터 중심(Cluster centroid) k의 위치입니다. 대문자 K는 클러스터의 총 수를 나타내고, 소문자 k는 클러스터 중심의 인덱스입니다. 따라서, 소문자 k는 1에서부터 대문자 K 사이의 숫자입니다.

Now here's one more bit of notation, which is gonna use mu subscript ci to denote the cluster centroid of the cluster to which example xi has been assigned, right? And to explain that notation a little bit more, lets say that xi has been assigned to cluster number five. What that means is that ci, that is the index of xi, that that is equal to five. Right? Because having ci equals five, if that's what it means for the example xi to be assigned to cluster number five. And so mu subscript ci is going to be equal to mu subscript 5. Because ci is equal to five. And so this mu subscript ci is the cluster centroid of cluster number five, which is the cluster to which my example xi has been assigned.

한 가지 변수가 더 있습니다. μc^(i)는 학습 예제 x^(i)가 할당된 클러스터의 클러스터 중심을 나타납니다. 예를 들면 x^(i)에 클러스터 5를 할당했다고 가정합니다. x^(i) = 5이고, C^(i)는 x^(i)의 인덱스이며 5이므로 C^(i) = 5입니다. C^(i) = 5이므로 μc^(i) = μ5입니다. 결국 x^(i)를 클러스터 5에 할당한다는 의미입니다. 아래 첨자 C^(i)는 μ 아래 첨자 5입니다.

Out with this notation, we're now ready to write out what is the optimization objective of the k-means clustering algorithm and here it is. The cost function that k-means is minimizing is a function J of all of these parameters, c1 through cm and mu 1 through mu K. That k-means is varying as the algorithm runs. And the optimization objective is shown to the right, is the average of 1 over m of sum from i equals 1 through m of this term here.

이 세 가지 변수를 사용하면 K-평균 클러스터링 알고리즘의 최적화 목표가 무엇인지를 정리할 수 있습니다. 여기에 알고리즘이 있습니다. K-평균을 최소화하는 비용 함수는 C^(1), C^(2),... , C^(m) 및 μ1, μ2,... , μk까지의 모든 파라미터에 대한 함수 J입니다. K-평균 알고리즘은 실행을 반복할 때마다 값이 변합니다. 최적화 목표는 아래와 같습니다.

That I've just drawn the red box around, right? The square distance between each example xi and the location of the cluster centroid to which xi has been assigned. So let's draw this and just let me explain this. Right, so here's the location of training example xi and here's the location of the cluster centroid to which example xi has been assigned. So to explain this in pictures, if here's x1, x2, and if a point here is my example xi, so if that is equal to my example xi, and if xi has been assigned to some cluster centroid, I'm gonna denote my cluster centroid with a cross, so if that's the location of mu 5, let's say. If x i has been assigned cluster centroid five as in my example up there, then this square distance, that's the square of the distance between the point xi and this cluster centroid to which xi has been assigned. And what k-means can be shown to be doing is that it is trying to define parameters ci and mu i. Trying to find c and mu to try to minimize this cost function J. This cost function is sometimes also called the distortion cost function, or the distortion of the k-means algorithm.

|| x^(i) - μk|| 항은 각 학습 예제 x^(i)와 x^(i)에 할당된 클러스터 중심까지 거리의 제곱입니다. 그림에서 여기 x^(i)가 있고, 클러스터 중심 μ5가 있습니다. 수평축은 x1이고 수직축은 x2입니다. 학습 예제 x^(i)에 특정 클러스터 중심 μ5가 할당되었습니다. K-평균 알고리즘은 x^(i)와 클러스터 중심 μ5 사이의 거리의 제곱을 계산하고 평균을 계산합니다. 파라미터 C^(i)와 μi의 값을 정의합니다. 비용 함수 J를 최소화하기 위한 C와 μ를 찾습니다. 비용 함수는 때때로 왜곡된 비용 함수 또는 K-평균 알고리즘의 왜곡이라고 부릅니다.

And just to provide a little bit more detail, here's the k-means algorithm. Here's exactly the algorithm as we have written it out on the earlier slide. And what this first step of this algorithm is, this was the cluster assignment step where we assigned each point to the closest centroid. And it's possible to show mathematically that what the cluster assignment step is doing is exactly Minimizing J, with respect to the variables c1, c2 and so on, up to cm, while holding the cluster centroids mu 1 up to mu K, fixed. So what the cluster assignment step does is it doesn't change the cluster centroids, but what it's doing is this is exactly picking the values of c1, c2, up to cm. That minimizes the cost function, or the distortion function J. And it's possible to prove that mathematically, but I won't do so here. But it has a pretty intuitive meaning of just well, let's assign each point to a cluster centroid that is closest to it, because that's what minimizes the square of distance between the points in the cluster centroid.

그리고 조금 더 자세히 설명하기 위해 여기 K-평균 알고리즘이 있습니다. K-평균 알고리즘의 첫 단계는 가장 가까운 클러스터 중심을 할당하는 클러스터 할당 단계입니다. C^(1), C^(2),..., C^(m)까지 비용 함수 J를 최소화하고 클러스터 중심 μ1, μ2,..., μk를 할당하는 것을 수학적으로 보여줍니다. 따라서, 클러스터 할당 단계에서 클러스터 중심을 변경하지 않습니다. 하지만, 각 학습 예제에 정확히 C^(1), C^(2),..., C^(m)까지의 값을 할당합니다. 그것이 비용 함수 또는 왜곡된 비용 함수 J를 최소화합니다. 그것을 수학적으로 증명하는 것은 가능하지만 여기서는 하지 않을 것입니다. 그러나 이것은 매우 직관적인 의미를 가지고 있습니다. 각 학습 예제에 가장 가까운 클러스터 중심을 할당하기 위해 학습 예제와 모든 클러스터 중심 간의 거리의 제곱을 계산하고 가장 작은 값을 취합니다.

And then the second step of k-means, this second step over here. The second step was the move centroid step. And once again I won't prove it, but it can be shown mathematically that what the move centroid step does is, it chooses the values of mu that minimizes J, so it minimizes the cost function J with respect to, wrt is my abbreviation for, with respect to, when it minimizes J with respect to the locations of the cluster centroids mu 1 through mu K. So if is really is doing is this taking the two sets of variables and partitioning them into two halves right here. First the c sets of variables and then you have the mu sets of variables. And what it does is it first minimizes J with respect to the variable c and then it minimizes J with respect to the variables mu and then it keeps on. And, so all that's all that k-means does.

K-평균 알고리즘의 두 번째 단계는 클러스터 중심 이동 단계입니다. 다시 한번 증명하지 않겠지만 클러스터 중심 단계가 무엇인지를 수학적으로 보여줄 수 있습니다. 클러스터 중심 이동 단계에서 비용 함수 J를 최소화하는 μ의 값을 선택합니다. 그래서 μ1에서 μk까지의 클러스터 중심의 위치에 관해 최소화할 때 비용 함수 J를 최소화합니다. 저는 가끔 with respect to를 wrt로 줄어서 표현하기도 합니다. 두 개의 변수 세트 C^(1), C^(2),..., C^(m)와 μ1, μ2,..., μk를 취하여 나눌 것입니다. 먼저 C 변수 세트가 있고 다음에 μ 변수 세트가 있습니다. 먼저 변수 C에 대해 비용 함수 J를 최소화한 다음에 변수 μ 에 대해 비용 함수 J를 최소한 다음 계속 유지하는 것입니다. 이것이 K-평균 알고리즘의 모든 것입니다.

And now that we understand k-means as trying to minimize this cost function J, we can also use this to try to debug other any algorithm and just kind of make sure that our implementation of k-means is running correctly. So, we now understand the k-means algorithm as trying to optimize this cost function J, which is also called the distortion function. We can use that to debug k-means and help make sure that k-means is converging and is running properly.

이제 여러분은 K-평균 알고리즘은 비용 함수 J를 최소화하려는 것으로 이해하였습니다. 다른 모든 알고리즘을 디버깅하고 K-평균 알고리즘이 올바르게 실행하는 지를 확인할 수 있습니다. 이제 왜곡된 함수라고도 하는 비용 함수 J를 최적화하는 것으로 K-평균 알고리즘을 이해할 수 있습니다. K-평균을 디버깅하고 K-평균이 수렴되고 제대로 실행되는 지를 확인할 수 있습니다.

And in the next video we'll also see how we can use this to help k-means find better clusters and to help k-means to avoid

다음 강의에서 K-평균 알고리즘이 더 나은 클러스터를 찾을 수 있게 도와주기 위해 사용하는 방법을 살펴볼 것입니다

앤드류 응의 머신러닝 동영상 강의

정리하며

클러스터링 알고리즘은 레이블이 없는 데이터 셋을 자동으로 일정한 클러스터, 집합 또는 그룹으로 묶습니다. 대표적인 알고리즘은 K-평균 알고리즘입니다. K-평균 알고리즘의 운영 방식은 다음과 같습니다.

1) 클러스터의 수를 결정

2) 클러스터 중심 위치를 무작위로 초기화

3) 클러스터링 할당 단계

모든 학습 예제가 가장 가까운 클러스터를 결정합니다.

하나의 학습 예제와 다수의 클러스터 중심까지의 거리를 측정한 후에 가장 가까운 클러스터를 선택합니다. 대문자 K는 클러스터 또는 클러스터 중심의 총 수이고 소문자 k는 클러스터 중심에 대한 인덱스이며 1부터 k 사이의 숫자입니다. C^(i)는 x^(i)와 클러스터 중심 사이의 거리가 가장 짧은 클러스터 중심을 말합니다. 예를 들어, x^(1)가 클러스 클러스터 중심 μ1, μ2, μ3, μ4 중에서 μ2와의 거리가 제일 작다면, C^(1) = 2입니다.

4) 클러스터 중심 이동 단계

클러스터 별로 학습 예제들의 평균값을 계산하고 새로운 클러스터 중심으로 이동하는 단계입니다. 각 평균을 계산하는 공식은 다음과 같습니다.

K-평균 알고리즘도 비용 함수가 있습니다. 최적화 목표를 통해 K-평균 알고리즘이 올바르게 실행되는지를 확인하고 디버깅에 도움이 됩니다. 또한, 로컬 최적화를 피하고 더 나은 최적 값을 찾을 수 있습니다. K-평균 알고리즘은 세 개의 변수를 추적합니다.

C^(i)는 학습 예제 x^(i)에 할당된 클러스터 번호 또는 인덱스

μk는 클러스터 중심에 대한 인덱스

μc^(i)는 학습 예제 x^(i)가 할당된 클러스터의 클러스터 중심

예를 들면 x^(i)에 클러스터 5를 할당했다고 가정합니다. C^(i) = 5이므로 μc^(i) = μ5입니다. 결국 x^(i)가 클러스터 5에 할당된다는 의미입니다. 비용 함수 J를 최소화하기 위한 C와 μ를 찾습니다. 비용 함수는 왜곡된 비용 함수 또는 K-평균 알고리즘의 왜곡이라고 부릅니다.

5) 수렴할 때까지 반복 수행

클러스터링 할당 단계와 클러스터링 중심 이동 단계를 반복 수행합니다. 클러스터 중심이 변하지 않고 학습 예제의 클러스터도 변하지 않을 때까지 반복하고 이것을 수렴한다고 합니다.