by 라인하트 Dec 20. 2020

Andrew Ng's Machine Learning (18-4): Pipeline Ceiling Analysis

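Professor Andrew Ng, co-founder of the online course platform Coursera, is a leading figure in the artificial intelligence field. The lectures he gave to machine learning beginners at Stanford University can be taken for free, as delivered, on Coursera (Coursera.org). The course is a must for newcomers to machine learning, and anyone studying AI and machine learning on their own tends to come across it sooner or later.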

Application Example

Photo OCR

Ceiling Analysis: What Part of the Pipeline to Work on Next

In earlier videos, I've said over and over that, when you're developing a machine learning system, one of the most valuable resources is your time as the developer, in terms of picking what to work on next. Or, if you have a team of developers or engineers working together on a machine learning system, again, one of the most valuable resources is the time of the engineers or developers working on the system. What you really want to avoid is that you or your colleagues spend a lot of time working on some component, only to realize after weeks or months that all that work just doesn't make a huge difference to the performance of the final system. In this video, what I'd like to do is talk about something called ceiling analysis.

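As I have said repeatedly in earlier lectures, the most valuable resource when developing a machine learning system is the developers' time. Whether you work alone or with a team of developers or engineers, choosing what to work on next decides how that time gets spent. You want to avoid spending weeks or months on one component or module only to discover that it makes little difference to the performance of the final system. This lecture explains ceiling analysis.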

When you or your team are working on a machine learning pipeline system, ceiling analysis can sometimes give you a very strong signal, very strong guidance, on which parts of the pipeline might be the best use of your time to work on. To talk about ceiling analysis, I'm going to keep on using the example of the photo OCR pipeline. As I mentioned earlier, each of these boxes, text detection, character segmentation, character recognition, could be the work of a small team of engineers, or the whole system could be built by just one person. Either way, the question is, where should you allocate scarce resources? That is, which one or two or maybe all three of these components is most worth your time to try to improve the performance of?

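When a team works on a machine learning pipeline, ceiling analysis gives a very strong signal, or guide, as to which module of the pipeline to work on first. To explain ceiling analysis, we keep using the photo OCR pipeline. It has three boxes: text detection, character segmentation, and character recognition. A small engineering team could work on each box, or one person could build the whole system. The question is where, and how much, to allocate resources in order to improve the performance of the overall system.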

So here's the idea of ceiling analysis. As in the development process for other machine learning systems, in order to make decisions about what to do next, it is going to be very helpful to have a single real number evaluation metric for the learning system. So let's say we pick character level accuracy: given a test set image, what is the fraction of letters or characters in the image that we recognize correctly? Or you can pick some other single real number evaluation metric, if you want. But let's say that, for whatever evaluation measure we pick, we find that the overall system currently has 72% accuracy. In other words, we have some set of test set images, and we run each test set image through text detection, then character segmentation, then character recognition. And we find that on our test set the overall accuracy of the entire system was 72% on whatever metric you chose.

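Here is the idea of ceiling analysis. As in the development of other machine learning systems, a single real number evaluation metric is very useful when deciding what to do next. Suppose we choose character level accuracy as that metric: what fraction of the letters or characters in the test set images is recognized correctly? A different single real number metric would work as well. Whichever metric we choose, suppose the overall system currently scores 72%. In other words, we ran the test set images through text detection, character segmentation, and character recognition, and the accuracy of the overall system on the test set was 72% on the chosen metric.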
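To make the single real number metric concrete, here is a minimal sketch of character level accuracy. The function name `character_accuracy`, the position-by-position comparison, and the sample strings are assumptions made for illustration; the lecture does not specify how recognized and true characters are aligned, and a real system might use an edit-distance alignment instead.

```python
def character_accuracy(predicted, truth):
    """Fraction of ground-truth characters recognized correctly.

    Assumption: predicted and true strings are compared position by position.
    """
    total = sum(len(t) for t in truth)
    correct = sum(
        p_ch == t_ch
        for p, t in zip(predicted, truth)
        for p_ch, t_ch in zip(p, t)
    )
    return correct / total if total else 0.0


# Hypothetical test set: two text regions, one character misread ("I" as "1").
truth = ["ANTIQUE", "MALL"]
predicted = ["ANT1QUE", "MALL"]
print(character_accuracy(predicted, truth))  # 10 of 11 characters correct
```

Whatever metric is chosen, the point is that the whole pipeline is scored with one number, so different configurations can be compared directly.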

Now here's the idea behind ceiling analysis. We're going to go to, let's say, the first module of our machine learning pipeline, text detection, and we're going to monkey around with the test set. For every test example, we're going to provide the correct text detection outputs. In other words, we're going to go to the test set and just manually tell the algorithm where the text is in each of the test examples, simulating what happens if you have a text detection system with 100% accuracy. And the way you do that is pretty simple, right? Instead of letting your learning algorithm detect the text in the images, you go to the images and manually label the location of the text in each test set image. You then let these ground truth labels of where the text is be part of your test set, and you use these ground truth labels as what you feed into the next stage of the pipeline, character segmentation. Okay? So just to say that again: by putting a checkmark over here, what I mean is that I'm going to go to my test set and give it the correct answers, the correct labels, for the text detection part of the pipeline, as if I had a perfect text detection system on my test set. What we need to do then is run this data through the rest of the pipeline, through character segmentation and character recognition, and then use the same evaluation metric as before to measure the overall accuracy of the entire system. With perfect text detection, hopefully the performance will go up. And in this example, it goes up to 89%.

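Here is the idea behind ceiling analysis. Take text detection, the first module of the photo OCR pipeline, as an example. We manipulate the test set: for every test example we supply the correct text detection output, that is, we tell the algorithm by hand where the text is in each test example. This simulates what would happen with a text detection system that is 100% accurate. The method is very simple. Instead of letting the learning algorithm detect the text, we manually label the location of the text in each test image, make those ground truth labels part of the test set, and feed them into the next stage of the pipeline, character segmentation. The checkmark on the text detection module means that we gave the test set the correct answers for that stage, as if the text detection system were perfect on the test set. We then run the rest of the pipeline, character segmentation and character recognition, and measure the overall accuracy with the same evaluation metric as before. With perfect text detection, we hope overall performance goes up. In this example, overall accuracy rises to 89%.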

And then we're going to keep going. Let's go to the next stage of the pipeline, character segmentation. So again, I'm going to go to my test set, and now I'm going to give it the correct text detection output and also the correct character segmentation output. So go to the test set and manually label the correct segmentations of the text into individual characters, and see how much that helps. And let's say it goes up to 90% accuracy for the overall system. Right? As always, this is the accuracy of the overall system, measured on whatever the final output of the character recognition system is.

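The next stage is character segmentation. We go back to the test set and now supply both the correct text detection output and the correct character segmentation output. That is, we manually label the correct segmentation of the text into individual characters and see how much that helps. Suppose that with 100% accurate character segmentation the overall system improves to 90%.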

Whatever the final output of the overall pipeline is, we measure the accuracy of that. And finally, I'm going to take the character recognition system and give it the correct labels as well, and if I do that too, then, no surprise, I should get 100% accuracy.

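So whatever the final output of the overall pipeline is, that is what we measure accuracy on. Finally, we give the character recognition stage the correct labels as well. Unsurprisingly, that yields 100% accuracy.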
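The procedure described so far can be sketched as a loop that replaces the first k stages with hand-labeled ground truth and re-scores the final output each time. Everything below is a toy stand-in invented for illustration: the "image" is a string, the three stage functions and their deliberate errors are hypothetical, and `run_pipeline` and `accuracy` are assumed names, not part of the lecture.

```python
def detect_text(image):
    # Toy detector: finds the text region but clips its last character.
    return image.strip(".")[:-1]


def segment_characters(text):
    # Toy segmenter: splits the region into single characters.
    return list(text)


def recognize_characters(chars):
    # Toy recognizer: confuses "A" with "4".
    return "".join(chars).replace("A", "4")


STAGES = [
    ("text_detection", detect_text),
    ("character_segmentation", segment_characters),
    ("character_recognition", recognize_characters),
]


def run_pipeline(example, oracle_upto):
    """Run all stages; the first `oracle_upto` stages use ground truth instead."""
    data = example["image"]
    for i, (name, stage_fn) in enumerate(STAGES):
        if i < oracle_upto:
            data = example["truth"][name]  # manually labeled output
        else:
            data = stage_fn(data)          # output of the real component
    return data


def accuracy(predicted, truth):
    # Position-by-position character accuracy against the final label.
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


example = {
    "image": "..CAT..",
    "truth": {
        "text_detection": "CAT",
        "character_segmentation": ["C", "A", "T"],
        "character_recognition": "CAT",
    },
}

final_truth = example["truth"]["character_recognition"]
results = [
    accuracy(run_pipeline(example, k), final_truth)
    for k in range(len(STAGES) + 1)
]
print(results)
```

In this toy setup, supplying perfect text detection raises the score, supplying perfect segmentation adds nothing (the toy segmenter was already correct), and supplying the final labels reaches 100%, which mirrors the shape of the analysis in the lecture.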

Now the nice thing about having done this analysis is that we can understand the upside potential of improving each of these components. We see that if we get perfect text detection, our performance goes up from 72% to 89%. So that's a 17% performance gain. This means that if we take our current system and spend a lot of time improving text detection, we could potentially improve our system's performance by 17%. It seems like it's well worth our while. In contrast, when going from perfect text detection to also giving it perfect character segmentation, performance went up by only 1%, so that's a more sobering message. It means that no matter how much time you spend on character segmentation, the upside potential is going to be pretty small, and maybe you do not want to have a large team of engineers working on character segmentation. This sort of analysis shows that even when you give it perfect character segmentation, your performance goes up by only 1%. That really estimates the ceiling, an upper bound on how much you can improve the performance of your system by working on one of these components. And finally, when we give it perfect character recognition, the performance goes up by 10%. So again, you can decide how much a 10% improvement is worth your while. This tells you that maybe with more effort spent on the last stage of the pipeline, you can improve the performance of the system as well.

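The nice thing about ceiling analysis is that it shows how much upside there is in improving each module. Perfect text detection raises overall performance from 72% to 89%, a gain of 17 points, so it is worth investing a lot of time in improving text detection in the current photo OCR system. In contrast, adding perfect character segmentation raises performance by only 1 point, which means there is little reason to put much effort, or a large engineering team, into character segmentation. The analysis estimates the ceiling, the upper bound on how much working on a given component can improve the system. Finally, perfect character recognition raises overall accuracy by another 10 points. How much is a 10-point improvement worth? It suggests that more effort on the last stage of the pipeline could also improve system performance.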
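The arithmetic behind this comparison is just the difference between consecutive rows of the ceiling analysis. A small sketch using the accuracies quoted in the lecture (72, 89, 90, 100); the variable names are assumptions, and the gains are percentage points rather than relative improvements.

```python
# Overall accuracy (%) after each additional stage is made perfect.
ceiling = [
    ("overall system", 72),
    ("text detection", 89),
    ("character segmentation", 90),
    ("character recognition", 100),
]

# Gain attributable to each stage: its row minus the row before it.
gains = {
    name: acc - prev_acc
    for (_, prev_acc), (name, acc) in zip(ceiling, ceiling[1:])
}

best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: +{gain} points")
print("largest upside:", best)
```

Note that the gains depend on the order in which stages are made perfect, since each row keeps the ground truth supplied for all earlier stages.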

Another way of thinking about this is that, by going through this sort of analysis, you're trying to work out the upside potential of improving each of these components, or how much you could possibly gain if one of these components became absolutely perfect. And this really places an upper bound on the performance of that system. So the idea of ceiling analysis is pretty important.

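Ceiling analysis tells you in advance how much upside there is in improving each module. If a particular module worked perfectly, how much would the overall system improve? The analysis puts an upper bound on system performance, which is why the idea is so important.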

Let me illustrate this idea again with a different, more complex example. Let's say that you want to do face recognition from images. You want to look at a picture and recognize whether or not the person in it is a particular friend of yours, and try to recognize the person shown in the image. This is a slightly artificial example; this isn't actually how face recognition is done in practice. But we're going to step through what a pipeline might look like, to give you another example of how a ceiling analysis process might look. So we have a camera image, and let's say that we design a pipeline as follows.

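Let's look at a different, more complex example of ceiling analysis. Consider a system that recognizes faces in images: it decides whether the person in a photo is one of your friends and identifies who it is. What is described here is not how face recognition is actually done in practice; the example is used only to illustrate the pipeline and the ceiling analysis process further. We start with a camera image and design the pipeline as follows.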

The first thing you wanna do is pre-processing of the image. So let's take this image like we have shown on the upper right, and let's say we want to remove the background. So do pre-processing and the background disappears.

The first step is to remove the background from the photo. A preprocessing step removes the background.

Next we want to, say, detect the face of the person; that's usually done with a learning algorithm. So we'll run a sliding windows classifier to draw a box around the person's face.

Next we detect the face, which is usually done by a learning algorithm. A sliding windows classifier detects the face and draws a box around it.

Having detected the face, it turns out that if you want to recognize people, the eyes are a highly useful cue. In terms of recognizing your friends, the appearance of their eyes is actually one of the most important cues that you use. So let's run another classifier to detect the eyes of the person and segment out the eyes, since this will give us useful features for recognizing the person. And then there are other parts of the face of interest: maybe segment out the nose, segment out the mouth. Having found the eyes, the nose, and the mouth, all of these give us useful features to feed into, maybe, a logistic regression classifier. And it's the job of that classifier to give us the overall label, the label for who we think this person is. So this is a kind of complicated pipeline. It's actually probably more complicated than you should be using if you actually want to recognize people, but it's an illustrative example that's useful to think about for ceiling analysis.

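A useful cue for recognizing who someone is turns out to be the eyes. When you recognize your friends, the appearance of their eyes is one of the most important cues you use. So we run another classifier to detect and segment the eyes, which gives us useful features for recognizing the person. We then segment the other parts of the face, the nose and the mouth. All of these features are fed into a logistic regression classifier, whose job is to output the overall label: who we think the person is. This is a fairly complicated pipeline, probably more complicated than what you should use if you really wanted to recognize people, but it is a helpful example for understanding ceiling analysis.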

So how do you go through ceiling analysis for this pipeline? Well, we step through these pieces one at a time. Let's say your overall system has 85% accuracy. The first thing I do is go to my test set and manually give it the correct background segmentation. So manually go to the test set and use Photoshop or something to tell it where the background is, and just manually remove the background, so this is a ground truth background removal, and see how much the accuracy changes. In this example the accuracy goes up by 0.1%. So this is a strong sign that even with perfect background removal, the performance of your system isn't going to go up that much. So it's maybe not worth a huge effort to work on the preprocessing, on background removal.

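We now run ceiling analysis on this pipeline, stepping through the modules one at a time. Suppose the overall system currently has 85% accuracy. The first step is to remove the background from the test set photos by hand, using Photoshop or a similar application, and see how much the accuracy rises. It goes up by 0.1%. This is a strong signal that even a perfect background removal module would barely improve overall performance, so the preprocessing step is probably not worth a large effort.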

Then I go to the test set and give it the correct face detection outputs, and then again step through the eyes, nose, and mouth segmentation in some order; just pick one order. Give it the correct location of the eyes, the correct location of the nose, the correct location of the mouth, and then finally, if I just give it the correct overall label, I get 100% accuracy. And so as I go through the system and give more and more components the correct labels on the test set, the performance of the overall system goes up, and you can look at how much the performance went up at the different steps. So from giving it perfect face detection, it looks like the overall performance of the system went up by 5.9%. So that's a pretty big jump. It means that maybe it's worth quite a bit of effort on better face detection. It went up 4% there, it went up 1% there, 1% there, and 3% there. So it looks like the components most worth our while are these: when I gave it perfect face detection, the system went up by 5.9%; when given perfect eyes segmentation, it went up by 4%; and then for my final logistic regression classifier, well, there's another 3% gap there, maybe. And so this tells us which components are most worthwhile working on.

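Next, we give the test set the correct face detection output by hand, and then the correct locations of the eyes, nose, and mouth, one module at a time. Finally, supplying the correct overall label yields 100% accuracy. As more and more components are given the correct labels on the test set, the performance of the overall system rises, and we can see how much it rises at each step. Perfect face detection improves overall performance by 5.9%, a fairly large jump, so face detection is worth considerable effort. Eyes segmentation adds 4%, and the final classifier that identifies the person adds 3%. Ceiling analysis tells us which modules are worth working on next.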
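The same bookkeeping for the face recognition pipeline can be sketched from the increments quoted in the lecture, starting at an 85% baseline. Assigning the two 1% gains to nose and mouth segmentation follows the order in which the transcript lists the stages and is an assumption; the variable names are also illustrative.

```python
baseline = 85.0  # overall accuracy (%) of the current system

# Gain in percentage points when each stage, in pipeline order, is made perfect.
gains = [
    ("preprocess (background removal)", 0.1),
    ("face detection", 5.9),
    ("eyes segmentation", 4.0),
    ("nose segmentation", 1.0),
    ("mouth segmentation", 1.0),
    ("logistic regression", 3.0),
]

# Rebuild the cumulative accuracy column of the ceiling analysis table.
cumulative = []
accuracy = baseline
for name, gain in gains:
    accuracy = round(accuracy + gain, 1)  # round away float noise
    cumulative.append((name, accuracy))

# Rank components by how much upside they offer.
ranked = sorted(gains, key=lambda item: item[1], reverse=True)
for name, gain in ranked:
    print(f"{name}: +{gain} points")
```

Ranked this way, face detection, eyes segmentation, and the final classifier come out on top, while background removal sits at the bottom, which is the conclusion the lecture draws.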

And by the way, I want to tell you a true cautionary story. The reason I put this preprocessing background removal step in here is because I actually know of a true story where there was a research team that literally had two people spend about a year and a half, 18 months, working on better background removal. I'm obscuring the details for obvious reasons, but there was a computer vision application where a team of two engineers literally spent about a year and a half working on better background removal, worked out really complicated algorithms, and ended up publishing one research paper. But after all that work, they found that it just did not make a huge difference to the overall performance of the actual application they were working on. And one of them said to me afterward that if only someone had done a ceiling analysis like this beforehand, maybe they could have realized, before their 18 months of work, that they should have spent their effort focusing on some different component rather than literally spending 18 months working on background removal.

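There is also a cautionary tale. The background removal step was included in this example for a reason: I know of a true story in which a team spent about a year and a half, 18 months, developing better background removal. The details are kept vague for obvious reasons. A team of two engineers building a computer vision application invested a year and a half, implemented really complicated algorithms, and eventually published one research paper. After all that work, they found that it made no big difference to the overall performance of the application. If someone had done a ceiling analysis beforehand, they might have realized, before the 18 months of work, that improving background removal would not pay off, and that they should have focused on a different component instead of spending 18 months on background removal.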

So to summarize, pipelines are pretty pervasive in complex machine learning applications. And when you're working on a big machine learning application, your time as a developer is so valuable, so just don't waste your time working on something that ultimately isn't going to matter. In this video we talked about this idea of ceiling analysis, which I've often found to be a very good tool for identifying the component of a system where, if you focus on that component and make a big improvement, it will actually have a huge effect on the overall performance of your final system. Over the years working in machine learning, I've actually learned not to trust my own gut feeling about what components to work on. I've worked on machine learning for a long time, but often I look at a machine learning problem and I may have some gut feeling about, oh, let's jump on that component and just spend all the time on that. But over the years, I've come to not even trust my own gut feelings that much. Instead, if you have a machine learning problem where it's possible to structure things and do a ceiling analysis, that is often a much better and much more reliable way of deciding where to put a focused effort to really improve the performance of some component, and to be reassured that, when you do that, it will actually have a huge effect on the final performance of the overall system.

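To summarize, complex machine learning applications use pipelines a great deal. When you work on a large machine learning application, developer time is so valuable that it should not be wasted on work that ultimately does not matter. This lecture covered ceiling analysis, which is a very good tool for identifying the component where a focused improvement will have a large effect on the performance of the overall system. Over years of working in machine learning, I have learned not to trust my own gut feeling about which component to work on. Even after a long time in the field, it is easy to look at a problem, follow a hunch, and spend all your time on one component. Instead, when the problem can be structured so that a ceiling analysis is possible, use it to decide where to focus time and effort. That is a much better and much more reliable way to improve performance, and it gives some assurance that the work will actually have a large effect on the final performance of the overall system.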

Andrew Ng's Machine Learning video lecture

Wrapping Up

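Whether you build a machine learning system with a team of developers or engineers or on your own, the most valuable resource when choosing what to work on next is the engineers' and developers' time. Decisions should therefore rest on a single real number evaluation metric, not on gut feeling. Ceiling analysis gives a strong signal about which component of the pipeline to work on first.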

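Ceiling analysis assumes, one component at a time, that each component of the pipeline works perfectly, and measures the performance of the overall system under that assumption. For example, suppose a face recognition system has a background removal component. You would remove the background from the test set images perfectly by hand, using Photoshop, and then run the rest of the system on those images.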