by 라인하트 Jan 03. 2021

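Machine Learning Octave Exercise (2-3): Logistic Regression

Professor Andrew Ng, co-founder of the online course platform Coursera, is a leading figure in the AI field. The lectures he gave to machine learning beginners at Stanford University are available for free on Coursera (Coursera.org). The course is a standard starting point for newcomers, and one that most people run into sooner or later when studying AI and machine learning on their own.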

Programming Exercise 2: Logistic Regression

2. Regularized logistic regression (implementation)

In this part of the exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly.

Suppose you are the product manager of the factory and you have the test results for some microchips on two different tests. From these two tests, you would like to determine whether the microchips should be accepted or rejected. To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model.

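You will use another script, ex2_reg.m, to complete this portion of the exercise.

This exercise implements regularized logistic regression to predict whether microchips from a fabrication plant pass the quality assurance (QA) tests. During QA, each microchip goes through several tests that check whether it works correctly.

You are the product manager of the factory and have the results of two different tests for a set of microchips. Based on these two tests you must decide whether each microchip is accepted or rejected. To support the decision, you have a dataset of test results from past microchips, from which a logistic regression model can be built.

The script to complete for this part is ex2_reg.m.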

2.1 Visualizing the data

Similar to the previous parts of this exercise, plotData is used to generate a figure like Figure 3, where the axes are the two test scores, and the positive (y = 1, accepted) and negative (y = 0, rejected) examples are shown with different markers.

Figure 3 shows that our dataset cannot be separated into positive and negative examples by a straight line through the plot. Therefore, a straightforward application of logistic regression will not perform well on this dataset since logistic regression will only be able to find a linear decision boundary.

plotData is used to generate a figure like Figure 3. The two axes are the two test scores, and the positive examples (y = 1, accepted) and negative examples (y = 0, rejected) are drawn with different markers.

Plotting the dataset as in Figure 3 shows that no straight line separates the positive examples from the negative ones. Because logistic regression can only find a linear decision boundary, applying it directly to this dataset does not work well.

<Walkthrough>

(1) Load the data and set up basic variables

clear; close all; clc % reset the Octave session

data = load ('ex2data2.txt');

X = [data(:,1:2)];

y = [data(:,3)];

[m, n] = size(X); % rows of the data matrix are the m training examples, columns are the n features

(2) Review the plotData.m file

Open the plotData.m file written in the previous exercise and review it.

function plotData(X, y)

%PLOTDATA Plots the data points X and y into a new figure

% PLOTDATA(x,y) plots the data points with + for the positive examples

% and o for the negative examples. X is assumed to be a Mx2 matrix.

% Create New Figure

figure; hold on;

% ====================== YOUR CODE HERE ======================

% Instructions: Plot the positive and negative examples on a

% 2D plot, using the option 'k+' for the positive

% examples and 'ko' for the negative examples.

positive = find (y==1);

negative = find(y==0);

plot (X(positive,1),X(positive,2),'k+','LineWidth',2);

plot (X(negative,1),X(negative,2), 'ko','MarkerFaceColor','y','MarkerSize',5);

% =========================================================================

hold off;

end

(3) Add a legend and axis labels to the plot

xlabel ('Microchip Test 1'); % label the x-axis

ylabel ('Microchip Test 2'); % label the y-axis

legend('y=1 Accepted', 'y=0 Rejected') % add the legend

(4) Check the result

plotData(X,y);

hold on;

xlabel ('Microchip Test 1');

ylabel ('Microchip Test 2');

legend('y=1 Accepted', 'y=0 Rejected')

hold off;

2.2 Feature mapping

One way to fit the data better is to create more features from each data point. In the provided function mapFeature.m, we will map the features into all polynomial terms of x1 and x2 up to the sixth power.

As a result of this mapping, our vector of two features (the scores on two QA tests) has been transformed into a 28-dimensional vector. A logistic regression classifier trained on this higher-dimension feature vector will have a more complex decision boundary and will appear nonlinear when drawn in our 2-dimensional plot.

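While the feature mapping allows us to build a more expressive classifier, it is also more susceptible to overfitting. In the next parts of the exercise, you will implement regularized logistic regression to fit the data and also see for yourself how regularization can help combat the overfitting problem.

One way to obtain a hypothesis that fits the data better is to build more features out of the existing ones. The mapFeature.m function maps the inputs to all polynomial terms of x1 and x2 up to the sixth power.

Feature mapping turns the 2-dimensional vector into a 28-dimensional vector. A logistic regression classifier trained on this higher-dimensional feature vector has a more complex decision boundary, which appears nonlinear when drawn in the 2-dimensional plot.

Feature mapping makes a wider range of classifiers possible, but it is also more prone to overfitting. The rest of the exercise implements regularized logistic regression to fit the data and shows how regularization addresses the overfitting problem.

<Walkthrough>

(1) Review the mapFeature.m file

There is no code to write here; just review the provided function.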

function out = mapFeature(X1, X2)

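% MAPFEATURE Feature mapping function that builds polynomial features

% MAPFEATURE(X1, X2) takes two input features and builds the polynomial features used in the regularization exercise

% Returns a 28-dimensional vector (X1, X2, X1.^2, X2.^2, X1*X2, X1*X2.^2, etc..)

% The features X1 and X2 must be the same size

degree = 6; % highest polynomial degree

out = ones(size(X1(:,1))); % initialize the return value out to a column of ones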

for i = 1:degree

for j = 0:i

out(:, end+1) = (X1.^(i-j)).*(X2.^j);

end

end
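The count of 28 follows from the loop bounds: together with the initial column of ones (degree 0), each total degree i contributes i + 1 terms of the form x1^(i-j) * x2^j, so 1 + 2 + ... + 7 = 28. A minimal sketch of that count, with variable names chosen for illustration rather than taken from the exercise files:

```octave
% Count the monomials x1^(i-j) * x2^j for total degree i = 0..6.
degree = 6;
num_terms = 0;
for i = 0:degree
  for j = 0:i
    num_terms = num_terms + 1;   % one term per (i, j) pair
  end
end
printf('%d polynomial features\n', num_terms);
```

The same number comes out of the closed form (degree + 1) * (degree + 2) / 2.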

(2) How the end keyword works

Inside an index expression, 'end' refers to the last index along that dimension.

>> P = [1, 2, 3; 4,5,6]

P =

1 2 3

4 5 6

>> P (1, end)

ans = 3

>> P (end,1)

ans = 4

>> P (end, end)

ans = 6
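Indexing one past the last column with end+1 is how mapFeature grows its output one column at a time. A small sketch of that pattern on made-up values (the vector v is hypothetical):

```octave
% Start with a single column of ones, as mapFeature does.
out = ones(3, 1);
v = [2; 3; 4];            % hypothetical feature values

out(:, end+1) = v;        % end is 1 here, so this writes column 2
out(:, end+1) = v .^ 2;   % end is now 2, so this writes column 3

disp(size(out));
```

Growing a matrix this way reallocates on every append, which is fine for 28 columns; for large outputs, preallocating the full matrix is the usual alternative.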

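(3) Understanding the for loop

>> X1 = X(:,1); % create the variable X1

>> X2 = X(:,2); % create the variable X2

>> degree = 6;

>> out = ones(size(X1(:,1)));

for i = 1:degree % i is the total degree of the term, from 1 to 6

for j = 0:i % j is the power of X2, from 0 to i

out(:, end+1) = (X1.^(i-j)).*(X2.^j);

end

end

X1 and X2 are each 118 x 1 column vectors, and the for loop builds out, a 118 x 28 matrix.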

>> size(out)

ans =

118 28

<Checking the result>

(1) Load the data and set up basic variables

clear; close all; clc

data = load ('ex2data2.txt');

X = [data(:,1:2)];

y = [data(:,3)];

[m, n] = size(X);

(2) Create the required variables

X1 = X(:,1);

X2 = X(:,2);

(3) Run mapFeature.m

>> out = mapFeature(X1,X2);

>> size(out)

ans =

118 28
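To see the column order the mapping produces, it can help to run the same double loop on a single made-up example with a lower degree. The sketch below assumes x1 = 2, x2 = 3 and degree 2 purely for illustration; the exercise itself uses degree 6 on the full dataset.

```octave
% Hypothetical toy input: one example, mapped up to degree 2.
x1 = 2;
x2 = 3;
degree = 2;               % the exercise uses degree = 6

out = ones(size(x1));
for i = 1:degree
  for j = 0:i
    out(:, end+1) = (x1 .^ (i - j)) .* (x2 .^ j);
  end
end
disp(out);
```

The columns come out as 1, x1, x2, x1^2, x1*x2, x2^2, which for this input is 1 2 3 4 6 9.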

2.3 Cost function and gradient

Now you will implement code to compute the cost function and gradient for regularized logistic regression. Complete the code in costFunctionReg.m to return the cost and gradient.

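Recall that the regularized cost function in logistic regression is

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

Note that you should not regularize the parameter θ0. In Octave/MATLAB, recall that indexing starts from 1, hence, you should not be regularizing the theta(1) parameter (which corresponds to θ0) in the code. The gradient of the cost function is a vector where the jth element is defined as follows:

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} \quad \text{for } j = 0$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \quad \text{for } j \ge 1$$

Once you are done, ex2_reg.m will call your costFunctionReg function using the initial value of θ (initialized to all zeros). You should see that the cost is about 0.693.

This part implements the code that computes the cost function and gradient for regularized logistic regression. Complete costFunctionReg.m so that it returns the cost and the gradient, using the regularized cost function given above.

The parameter θ0 is not regularized. Because Octave indexes from 1, this means theta(1) must not be regularized. The jth element of the gradient is defined as above.

Once the code is complete, ex2_reg.m calls costFunctionReg with the initial theta (all zeros). The cost should be about 0.693.

<Walkthrough>

(1) Load the data and set up basic variables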

clear; close all; clc

data = load ('ex2data2.txt');

X = [data(:,1:2)];

y = [data(:,3)];

[m, n] = size(X);

(2) Call the feature mapping function to build the polynomial features

X1 = X(:,1);

X2 = X(:,2);

X = mapFeature(X1,X2);

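(3) Initialize the remaining variables

initial_theta = zeros(size(X, 2), 1); % initialize the 28 x 1 parameter vector theta to zeros

lambda = 1; % set the regularization parameter lambda to 1

(4) Open costFunctionReg.m and work out the cost function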

function [J, grad] = costFunctionReg(theta, X, y, lambda)

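%COSTFUNCTIONREG Compute cost and gradient for regularized logistic regression

% J = COSTFUNCTIONREG(theta, X, y, lambda)

% computes the cost and gradient of regularized logistic regression for the parameters theta

% Compute the number of training examples m

m = length(y); % number of training examples

% Initialize the return values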

J = 0;

grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================

% Instructions: Compute the cost of a particular choice of theta.

% Set J to the cost

% Set grad to the partial derivatives of the cost J with respect to each parameter in theta

% =============================================================

end

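(5) Write the cost function J

The cost function of logistic regression, before regularization, is

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

and the regularized version adds the penalty $\frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$.

In Octave, the unregularized logistic regression cost can be written as follows.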

z = X * theta;

J = -1/m * (y'*log(sigmoid(z)) + (1-y)'*log(1-sigmoid(z)));

Next, add the regularization term. It skips theta(1) and applies from theta(2) onward. In Octave the regularization term can be written as follows.

lambda/(2*m)*theta(2:end)'*theta(2:end);

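Think of this as a vectorized implementation. theta is a 28 x 1 vector, so the sum of its squared components is the transpose of theta times theta, that is, theta'*theta. The regularization term is built only from the components other than the intercept term theta(1), and the 'end' keyword refers to the last index.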
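The inner product over theta(2:end) is just a compact way of summing the squared non-intercept parameters. A short sketch on made-up numbers (theta, lambda, and m below are hypothetical values, not the exercise's) compares the two forms:

```octave
% Hypothetical parameter vector; theta(1) plays the role of the intercept.
theta = [10; 1; -2; 3];
lambda = 1;
m = 5;                    % assumed number of training examples

reg_dot = lambda / (2 * m) * (theta(2:end)' * theta(2:end));
reg_sum = lambda / (2 * m) * sum(theta(2:end) .^ 2);

printf('%g %g\n', reg_dot, reg_sum);
```

Both expressions give 1.4 here: the large intercept value 10 is ignored, and 1 + 4 + 9 = 14 is scaled by 1/10.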

Putting the two parts together gives the following.

z = X*theta;

J = -1/m*(y'*log(sigmoid(z)) + (1-y)'*log(1-sigmoid(z))) + lambda/(2*m)*theta(2:end)'*theta(2:end);

Running this in Octave gives the following result.

>> initial_theta = zeros(size(X, 2), 1);

>> lambda = 1;

>> theta = initial_theta;

>> J = 0;

>> grad = zeros(size(theta));

>> z = X*theta;

>> J = -1/m*(y'*log(sigmoid(z)) + (1-y)'*log(1-sigmoid(z))) + lambda/(2*m)*theta(2:end)'*theta(2:end);

>> J

J = 0.69315

The cost is J = 0.69315.
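The value 0.69315 is log(2). With theta set to all zeros, every prediction is sigmoid(0) = 0.5 and the regularization term vanishes, so each example contributes -log(0.5) regardless of its label or features. The sketch below checks this on made-up data; the anonymous sigmoid is a local stand-in for the exercise's sigmoid.m, and X and y are hypothetical.

```octave
% With theta = 0 every prediction is sigmoid(0) = 0.5, whatever the data.
sigmoid = @(z) 1 ./ (1 + exp(-z));   % local stand-in for sigmoid.m

y = [1; 0; 1; 1; 0];                 % hypothetical labels
X = [ones(5, 1), rand(5, 2)];        % hypothetical features with intercept column
theta = zeros(3, 1);
lambda = 1;
m = length(y);

z = X * theta;
J = -1/m * (y' * log(sigmoid(z)) + (1 - y)' * log(1 - sigmoid(z))) ...
    + lambda / (2 * m) * (theta(2:end)' * theta(2:end));

printf('J = %.5f, log(2) = %.5f\n', J, log(2));
```

This makes the 0.693 check a test of the code path rather than of the dataset: any labeled data evaluated at theta = 0 should produce it.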

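(6) Compute the partial derivatives for the gradient

For gradient descent on logistic regression, the partial derivative of the unregularized cost with respect to each parameter is

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

In Octave, this gradient can be written as follows.

grad = 1/m * (sigmoid(z) - y)'*X;

The regularization term for the gradient is

lambda/m * theta

Adding it to the ordinary logistic regression gradient gives the following.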

grad = 1/m *(sigmoid(z) - y)'*X + lambda/m * theta;

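Computed this way, the first term is a 1 x 28 row vector while the regularization term is a 28 x 1 column vector, so the shapes do not match. Depending on the Octave version, the addition either raises a nonconformant-arguments error or is silently broadcast into a 28 x 28 matrix; neither is the intended 28 x 1 gradient. Everything should have been kept in column-vector form. The fix is to transpose the data matrix X and swap its position with the sigmoid term.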

grad = 1/m *X'*(sigmoid(z) - y) + lambda/m * theta;
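The two orderings hold the same numbers and differ only in orientation, which is easy to confirm by printing their sizes. A small sketch on a made-up 4 x 3 problem (X, y, and the anonymous sigmoid are hypothetical stand-ins):

```octave
% Hypothetical small problem: m = 4 examples, n = 3 columns (including intercept).
X = [1 0.5 -1; 1 1.5 2; 1 -0.3 0.7; 1 2.0 -0.4];
y = [1; 0; 1; 0];
theta = zeros(3, 1);
sigmoid = @(z) 1 ./ (1 + exp(-z));   % local stand-in for sigmoid.m

err = sigmoid(X * theta) - y;        % 4 x 1

row_form = err' * X;                 % 1 x 3 row vector
col_form = X' * err;                 % 3 x 1 column vector, same shape as theta

disp(size(row_form));
disp(size(col_form));
```

Keeping the gradient in the same shape as theta also matters later, since optimizers generally expect the gradient to line up element by element with the parameter vector.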

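Regularization should apply to every component except the intercept term theta(1), but the expression above applies it to all 28 components. To leave theta(1) unregularized, subtract the term back out of the first component:

grad(1) = grad(1) - lambda/m*theta(1);

Enter the following in Octave.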

grad = zeros(size(theta));

grad = 1/m *X'*(sigmoid(z) - y) + lambda/m * theta;

grad(1) = grad(1) - lambda/m*theta(1);

<Answer>

(1) Load the data and set up basic variables

clear; close all; clc

data = load ('ex2data2.txt');

X = [data(:,1:2)];

y = [data(:,3)];

[m, n] = size(X);

(2) Call the feature mapping function to build the polynomial features

X1 = X(:,1);

X2 = X(:,2);

X = mapFeature(X1,X2);

(3) Initialize the remaining variables

initial_theta = zeros(size(X, 2), 1);

lambda = 1;

(4) Edit the costFunctionReg.m file

function [J, grad] = costFunctionReg(theta, X, y, lambda)

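%COSTFUNCTIONREG Compute cost and gradient for regularized logistic regression

% J = COSTFUNCTIONREG(theta, X, y, lambda)

% computes the cost and gradient of regularized logistic regression for the parameters theta

% Compute the number of training examples m

m = length(y); % number of training examples

% Initialize the return values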

J = 0;

grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================

% Instructions: Compute the cost of a particular choice of theta.

% Set J to the cost

% Set grad to the partial derivatives of the cost J with respect to each parameter in theta

z = X*theta;

J = -1/m*(y'*log(sigmoid(z)) + (1-y)'*log(1-sigmoid(z))) + lambda/(2*m)*theta(2:end)'*theta(2:end);

grad = 1/m *X'*(sigmoid(z) - y) + lambda/m * theta;

grad(1) = grad(1) - lambda/m*theta(1);

% =============================================================

end
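One way to gain confidence in a hand-written gradient is to compare it against central finite differences of the cost. The sketch below is not part of the exercise: it rewrites the same cost and gradient expressions inline on made-up data (X, y, theta, and the anonymous sigmoid are all hypothetical) and reports the largest difference.

```octave
% Numerical gradient check on hypothetical data (not the exercise dataset).
sigmoid = @(z) 1 ./ (1 + exp(-z));   % local stand-in for sigmoid.m

X = [1 0.5 -1.2; 1 -1.5 0.3; 1 2.0 0.8; 1 0.1 -0.4; 1 -0.7 1.5];
y = [1; 0; 1; 0; 1];
theta = [0.2; -0.5; 0.3];
lambda = 1;
m = length(y);

% Same cost and gradient expressions as in costFunctionReg, written inline.
cost = @(t) -1/m * (y' * log(sigmoid(X*t)) + (1 - y)' * log(1 - sigmoid(X*t))) ...
            + lambda / (2*m) * (t(2:end)' * t(2:end));
grad = 1/m * X' * (sigmoid(X*theta) - y) + lambda/m * theta;
grad(1) = grad(1) - lambda/m * theta(1);   % undo regularization of the intercept

% Central differences: (J(theta + e) - J(theta - e)) / (2 * epsilon).
epsilon = 1e-4;
num_grad = zeros(size(theta));
for k = 1:numel(theta)
  e = zeros(size(theta));
  e(k) = epsilon;
  num_grad(k) = (cost(theta + e) - cost(theta - e)) / (2 * epsilon);
end

printf('max difference: %g\n', max(abs(num_grad - grad)));
```

A nonzero theta is used on purpose: at theta = 0 the regularization term contributes nothing, so a mistake in how theta(1) is handled would go unnoticed.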

(5) Call the costFunctionReg function

[J, grad] = costFunctionReg(initial_theta, X, y, lambda)

>> [J, grad] = costFunctionReg(initial_theta, X, y, lambda)

J = 0.69315

grad =

8.4746e-03

1.8788e-02

7.7771e-05

5.0345e-02

1.1501e-02

3.7665e-02

1.8356e-02

7.3239e-03

8.1924e-03

2.3476e-02

3.9349e-02

2.2392e-03

1.2860e-02

3.0959e-03

3.9303e-02

1.9971e-02

4.3298e-03

3.3864e-03

5.8382e-03

4.4763e-03

3.1008e-02

3.1031e-02

1.0974e-03

6.3157e-03

4.0850e-04

7.2650e-03

1.3765e-03

3.8794e-02

>> submit
