by 라인하트 Dec 30. 2020

Machine Learning Octave Exercise (1-2): Linear Regression with One Variable (Part 2)

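Professor Andrew Ng, a co-founder of the online learning platform Coursera, is a leading figure in the artificial intelligence field. The machine learning course he taught to beginners at Stanford University is available for free on Coursera (Coursera.org). It is an essential course for newcomers to machine learning, and one that anyone studying AI and machine learning on their own will naturally come across.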

Programming Exercise 1: Linear Regression

2. Linear regression with one variable

2.2.4 Gradient descent

Next, you will implement gradient descent in the file gradientDescent.m. The loop structure has been written for you, and you only need to supply the updates to θ within each iteration.

As you program, make sure you understand what you are trying to optimize and what is being updated. Keep in mind that the cost J(θ) is parameterized by the vector θ, not X and y. That is, we minimize the value of J(θ) by changing the values of the vector θ, not by changing X or y. Refer to the equations in this handout and to the video lectures if you are uncertain.

A good way to verify that gradient descent is working correctly is to look at the value of J(θ) and check that it is decreasing with each step. The starter code for gradientDescent.m calls computeCost on every iteration and prints the cost. Assuming you have implemented gradient descent and computeCost correctly, your value of J(θ) should never increase, and should converge to a steady value by the end of the algorithm.

After you are finished, ex1.m will use your final parameters to plot the linear fit. The result should look something like Figure 2:

Your final values for θ will also be used to make predictions on profits in areas of 35,000 and 70,000 people. Note the way that the following lines in ex1.m use matrix multiplication, rather than explicit summation or looping, to calculate the predictions. This is an example of code vectorization in Octave/MATLAB.

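Gradient descent is implemented in the gradientDescent.m file. The for loop structure is already written in the file; you write the code that updates θ on each iteration.

When programming, you need to understand what is being optimized and what is being updated. The cost function J(θ) is parameterized by the parameter vector θ, not by the feature matrix X or the actual values y. In other words, X and y are fixed data, and J(θ) is minimized by changing the values of the parameter vector θ.

One way to check that gradient descent is working correctly is to watch the value of J(θ) and confirm that it decreases at every step. The starter code in gradientDescent.m calls computeCost on every iteration and saves the cost in J_history. If gradient descent and computeCost are implemented correctly, then with a suitably small learning rate such as the α = 0.01 used here, the value of J(θ) never increases and converges to a steady value.

The final values of the parameter θ are used to predict profits in areas with populations of 35,000 and 70,000. Note that the following lines in ex1.m compute the predictions with matrix multiplication rather than explicit summation or a loop. This is an example of code vectorization in Octave and MATLAB.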

predict1 = [1, 3.5] * theta;
predict2 = [1, 7] * theta;
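As a minimal sketch of what vectorization means here, the snippet below computes the same prediction twice: once with an explicit loop over the parameters and once with the matrix product used in ex1.m. The theta values are the approximate expected values that ex1.m prints, used only for illustration; `x`, `predict_loop`, and `predict_vec` are hypothetical names.

```octave
% Illustrative values only: the approximate theta that ex1.m expects.
theta = [-3.6303; 1.1664];
x = [1, 3.5];               % x0 = 1 (intercept term), population = 35,000

% Explicit summation: h = theta0*x0 + theta1*x1
predict_loop = 0;
for j = 1:numel(theta)
  predict_loop = predict_loop + x(j) * theta(j);
end

% Vectorized form: a 1x2 row vector times a 2x1 column vector
predict_vec = x * theta;

fprintf('loop: %f, vectorized: %f\n', predict_loop, predict_vec);
fprintf('predicted profit: %.2f\n', predict_vec * 10000);
```

Both forms give the same number; the vectorized line is shorter and leaves the summation to Octave's matrix routines.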

2.3 Debugging

Here are some things to keep in mind as you implement gradient descent:

Octave/MATLAB array indices start from one, not zero. If you’re storing θ0 and θ1 in a vector called theta, the values will be theta(1) and theta(2).

If you are seeing many errors at runtime, inspect your matrix operations to make sure that you’re adding and multiplying matrices of compatible dimensions. Printing the dimensions of variables with the size command will help you debug.

By default, Octave/MATLAB interprets math operators to be matrix operators. This is a common source of size incompatibility errors. If you don’t want matrix multiplication, you need to add the “dot” notation to specify this to Octave/MATLAB. For example, A*B does a matrix multiply, while A.*B does an element-wise multiplication.

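Here are the points to keep in mind when implementing gradient descent.

Octave array indices start at 1, not 0. θ0 and θ1 are therefore stored as theta(1) and theta(2).

If a runtime error occurs, check your matrix multiplications. The size() command returns the dimensions of a matrix, which is a great help when debugging.

By default, Octave interprets math operators as matrix operators. If you do not want matrix multiplication, you need to add the dot notation. A*B is a matrix multiplication, while A.*B is an element-wise multiplication.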
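The debugging tips above can be tried on a toy example. This is a small sketch with made-up matrices (`A`, `B`, and the 3 × 2 `X` are not the exercise data) showing how size() reports dimensions and how `*` differs from `.*`.

```octave
X = [ones(3, 1), [1; 2; 3]];   % 3x2 matrix with an intercept column
theta = [0.5; 2];              % 2x1 column vector; theta(1) is theta0, theta(2) is theta1

disp(size(X));                 % dimensions of X: 3 rows, 2 columns
disp(size(theta));             % dimensions of theta: 2 rows, 1 column

h = X * theta;                 % matrix product: (3x2)*(2x1) gives 3x1

A = [1 2; 3 4];
B = [10 20; 30 40];
matrix_product = A * B;        % rows of A times columns of B
elementwise    = A .* B;       % entry-by-entry product

disp(matrix_product);
disp(elementwise);
```

Swapping the operands to `theta * X` raises a nonconformant-arguments error, which is the kind of runtime error the size() check helps to locate.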

<Answer>

1) Define the required variables.

X = [ones(m, 1), data(:,1)]; % prepend a column of ones (x0) to the data in X
y = data(:, 2); % second column of data becomes the vector y
theta = zeros(2, 1); % initialize θ0 and θ1 to 0; same as theta = [0;0];
iterations = 1500; % run gradient descent for 1,500 iterations
alpha = 0.01; % set the learning rate α to 0.01

2) Open the gradientDescent.m file.

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.

    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);

end

3) Enter the gradient descent update

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
% GRADIENTDESCENT learns the optimal values of the parameters theta
%   theta = gradientDescent(X, y, theta, alpha, num_iters) updates all
%   components of theta simultaneously; alpha is the learning rate and
%   num_iters is the number of gradient descent iterations

% Initialize variables
m = length(y); % total number of training examples
J_history = zeros(num_iters, 1); % cost computed at each iteration, initialized to 0

for iter = 1:num_iters % repeat the loop num_iters times

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    % Hint: While debugging, it can be useful to print the cost function
    %       (computeCost) and the gradient here.

    Update = 0; % accumulated update; becomes a 2 x 1 vector after the first addition
    for i = 1:m
        Update = Update + alpha/m*(X(i,:)*theta - y(i))*X(i,:)';
    end
    theta = theta - Update;

    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);

end

<Explanation>

The linear regression hypothesis, the cost function, and the gradient descent update rule are as follows.
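For reference, the standard forms for linear regression with one variable, written in this post's notation (m training examples, learning rate α), are given below. The update is applied to θ0 and θ1 simultaneously.

```latex
\begin{align}
h_\theta(x) &= \theta^{T}x = \theta_0 + \theta_1 x_1 \\
J(\theta) &= \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^{2} \\
\theta_j &:= \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}
\qquad \text{for } j = 0, 1
\end{align}
```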

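We now rearrange the gradient descent update rule into a form that suits Octave. To write the for loop that performs a single gradient step, we first work out each piece of the update.

(1) Compute the hypothesis hθ(x)

The data X is as follows.

The parameter theta is as follows.

To compute the hypothesis hθ(x) in matrix form, the dimensions have to line up. X is a 97 × 2 matrix and theta is a 2 × 1 vector. To multiply two matrices, when the left matrix is m × n the right matrix must be n × p. There are two ways to arrange the product. To see them, take just the first row of X: X(1,:) is a 1 × 2 matrix and theta is a 2 × 1 matrix.

theta' * X(1, :)' % the transpose theta' is 1 × 2; the transpose X(1,:)' is 2 × 1
X(1,:) * theta % X(1,:) is a 1 × 2 row vector; theta is a 2 × 1 column vector

Both expressions yield the same scalar, so either one can be used.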
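A quick way to convince yourself that the two orderings agree is to evaluate both on a toy row. This is a minimal sketch with made-up numbers (not the exercise data); `h_a` and `h_b` are hypothetical names.

```octave
X = [1, 2; 1, 3];             % two toy rows of the form [x0, x1]
theta = [-1; 2];              % 2x1 column vector

h_a = theta' * X(1, :)';      % (1x2)*(2x1) gives a scalar
h_b = X(1, :) * theta;        % (1x2)*(2x1) gives the same scalar

fprintf('h_a = %g, h_b = %g\n', h_a, h_b);
```

The second form extends directly to all rows at once, since `X * theta` produces one hypothesis value per training example.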

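(2) Apply to all training examples

(X(1,:) * theta - y(1)) * X(1,:); % the result is a 1 × 2 row vector
(X(1,:) * theta - y(1)) * X(1,:)'; % the result is a 2 × 1 column vector

The theta in the gradient descent update is a 2 × 1 vector, so the second form, which matches that shape, makes the addition and subtraction straightforward. So far we have handled only the first training example. To cover all 97 training examples, we use a for loop to accumulate the complete δ (delta), and then write the gradient descent update.

Update = 0;
for i = 1:m
    Update = Update + (X(i,:) * theta - y(i)) * X(i,:)';
end
Update = Update * alpha/m;
theta = theta - Update;

Here the variable Update accumulates the contribution of each training example in turn. The same computation can also be written as follows.

Update = 0;
for i = 1:m
    Update = Update + alpha/m*(X(i,:)*theta - y(i))*X(i,:)';
end
theta = theta - Update;

The two versions differ only in where the factor alpha/m is applied: once after the loop in the first, and inside every iteration in the second. The resulting theta is the same.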

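(3) Vectorized implementation

Using a for loop as above is more complicated than it needs to be. The same step can be implemented simply as follows.

Delta = 1/m * (X'*(X*theta - y));
theta = theta - alpha*Delta;

First, write the code that computes δ (delta). The data matrix X is 97 × 2 and the parameter vector theta is 2 × 1. The vector y is 97 × 1, so the hypothesis must also produce a 97 × 1 vector before y can be subtracted from it. Getting the matrix dimensions to agree is essential in a vectorized implementation. The hypothesis vector (predictions) is a 97 × 1 matrix, and the product X*theta is (97 × 2)(2 × 1) = 97 × 1, so the hypothesis is X*theta.

Here (X*theta - y) is a 97 × 1 matrix, and the x term of the update rule is the 97 × 2 feature matrix X. The result of the matrix multiplication must have the same dimensions as the 2 × 1 parameter vector theta. Transposing the feature matrix X gives a 2 × 97 matrix, and multiplying it by the 97 × 1 matrix yields a 2 × 1 matrix. Finally, divide by m. This is the code that computes Delta.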
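To check that the loop and the vectorized form really perform the same step, the sketch below runs one update of each on a tiny synthetic data set (four made-up points, not ex1data1.txt). The names `theta_loop` and `theta_vec` are hypothetical.

```octave
% Small synthetic data set (not the exercise data).
X = [1 1; 1 2; 1 3; 1 4];
y = [2; 4; 6; 8];
theta = [0; 0];
alpha = 0.01;
m = length(y);

% Loop version: accumulate one 2x1 contribution per training example.
Update = zeros(2, 1);
for i = 1:m
  Update = Update + (X(i, :) * theta - y(i)) * X(i, :)';
end
theta_loop = theta - alpha / m * Update;

% Vectorized version: X' * (X*theta - y) performs the same sum in one product.
Delta = 1 / m * (X' * (X * theta - y));
theta_vec = theta - alpha * Delta;

% Largest difference between the two results
disp(max(abs(theta_loop - theta_vec)));
```

The difference is zero up to floating-point rounding, which is a handy sanity check before replacing a loop with its vectorized form.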

<Result>

Run the command 'gradientDescent(X, y, theta, alpha, 1)' to check the result, as follows.
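Beyond a single step, the handout's check is that J(θ) never increases. The sketch below runs that check on synthetic data, under the assumption of a small learning rate. `cost_fn` is a hypothetical stand-in for computeCost using the usual squared-error cost, and the update is written inline rather than calling gradientDescent.m.

```octave
% Synthetic data and a stand-in cost function (hypothetical names).
X = [ones(5, 1), (1:5)'];
y = [1.9; 4.1; 6.2; 7.8; 10.1];
cost_fn = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * length(y));

theta = zeros(2, 1);
alpha = 0.01;
num_iters = 200;
m = length(y);
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
  theta = theta - alpha / m * (X' * (X * theta - y));   % one vectorized gradient step
  J_history(iter) = cost_fn(X, y, theta);               % record the cost after the step
end

% For a small enough alpha, every step should lower (or keep) the cost.
is_non_increasing = all(diff(J_history) <= 0);
fprintf('first J = %f, last J = %f, non-increasing: %d\n', ...
        J_history(1), J_history(end), is_non_increasing);
```

If the flag comes out as 0 in your own implementation, the usual suspects are a sign error in the update or a learning rate that is too large.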

2.4 Visualizing J(θ)

To understand the cost function J(θ) better, you will now plot the cost over a 2-dimensional grid of θ0 and θ1 values. You will not need to code anything new for this part, but you should understand how the code you have written already is creating these images.

In the next step of ex1.m, there is code set up to calculate J(θ) over a grid of values using the computeCost function that you wrote.

% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));

% Fill out J_vals
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i,j) = computeCost(X, y, t);
    end
end

After these lines are executed, you will have a 2-D array of J(θ) values. The script ex1.m will then use these values to produce surface and contour plots of J(θ) using the surf and contour commands. The plots should look something like Figure 3:

The purpose of these graphs is to show you how J(θ) varies with changes in θ0 and θ1. The cost function J(θ) is bowl-shaped and has a global minimum. (This is easier to see in the contour plot than in the 3D surface plot). This minimum is the optimal point for θ0 and θ1, and each step of gradient descent moves closer to this point.

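To understand the cost function J(θ) properly, we plot it over a 2-dimensional grid of θ0 and θ1 values. There is no new code to write, but you should understand how the code you have already written produces these images.

The next step of ex1.m contains code that computes J(θ) over a grid of values using the computeCost function you wrote.

Running the commands above produces a 2-D array of J(θ) values. The ex1.m script then uses the surf and contour commands to draw surface and contour plots of J(θ) from these values.

The purpose of the graphs is to show how the cost function J(θ) changes as θ0 and θ1 change. J(θ) is bowl-shaped and has a global minimum. That minimum is the optimal point for θ0 and θ1, and each step of gradient descent moves closer to it.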

<Answer>

(1) Run the ex1 script.

(2) Call the submit function.

<Explanation>

We walk through the ex1.m file and draw the graphs along the way.

(1) Initialize everything

%% Initialization
clear ; % clear all variables in the Octave workspace
close all; % close all open figure windows
clc % clear the terminal window

(2) warmUpExercise.m is run automatically (displays the 5 × 5 identity matrix)

%% ==================== Part 1: Basic Function ====================
% Complete warmUpExercise.m
fprintf('Running warmUpExercise ... \n'); % print a message to the screen
fprintf('5x5 Identity Matrix: \n'); % print a message to the screen
warmUpExercise() % run warmUpExercise()
fprintf('Program paused. Press enter to continue.\n'); % print a message to the screen
pause; % wait until the Enter key is pressed
% ================================================================

The result is as follows.

(3) The plotData(X, y) function is run automatically

%% ======================= Part 2: Plotting =======================
fprintf('Plotting Data ...\n')
data = load('ex1data1.txt'); % load the ex1data1.txt file
X = data(:, 1); y = data(:, 2); % create X and y from the two data columns
m = length(y); % number of training examples

% Plot Data
% Note: You have to complete the code in plotData.m
plotData(X, y); % call plotData(X, y)

fprintf('Program paused. Press enter to continue.\n');
pause;
%% ================================================================

(4) Compute the cost function J

%% ========== Part 3: Cost and Gradient descent ===================
X = [ones(m, 1), data(:,1)]; % add the x0 feature, which is always 1
theta = zeros(2, 1); % initialize both components of theta to 0

% gradient descent settings
iterations = 1500; % number of gradient descent iterations
alpha = 0.01; % learning rate

fprintf('\nTesting the cost function ...\n') % announce the cost function test
J = computeCost(X, y, theta); % run computeCost(X, y, theta)
fprintf('With theta = [0 ; 0]\nCost computed = %f\n', J);
fprintf('Expected cost value (approx) 32.07\n');

% further test
J = computeCost(X, y, [-1 ; 2]); % run computeCost(X, y, [-1;2])
fprintf('\nWith theta = [-1 ; 2]\nCost computed = %f\n', J);
fprintf('Expected cost value (approx) 54.24\n');

fprintf('Program paused. Press enter to continue.\n');
pause;
%======================================================================

(5) Run the gradient descent algorithm

%% ========== Part 3: Cost and Gradient descent ===================
fprintf('\nRunning Gradient Descent ...\n')
% run gradientDescent()
theta = gradientDescent(X, y, theta, alpha, iterations);

% print theta to the screen
fprintf('Theta found by gradient descent:\n');
fprintf('%f\n', theta);
fprintf('Expected theta values (approx)\n');
fprintf(' -3.6303\n 1.1664\n\n');

% plot the linear fit
hold on; % keep the existing plot so the fit is drawn on top of it
plot(X(:,2), X*theta, '-') % x axis: the data values; y axis: the hypothesis
legend('Training data', 'Linear regression')
hold off % stop adding further plots to this figure
%======================================================================

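The call plot(X(:,2), X*theta, '-') accepts many options. The line style can be set as follows.

-  solid line
-- dashed line
:  dotted line
-. dash-dot line

The hypothesis can be drawn together with the data. The 'hold on' command keeps the current plot so that later plot calls draw into the same figure window. The 'hold off' command ends this, so the next plot call replaces the existing plot instead of adding to it.

plot(X(:,2), y)
hold on
plot(X(:,2), X*theta)
hold off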
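The effect of hold on and hold off can be observed by counting the line objects in the axes. The sketch below is illustrative only: it plots made-up straight lines in a hidden figure (an assumption, so that it also runs without a display), and `n_lines_held` and `n_lines_after` are hypothetical names.

```octave
x = linspace(0, 10, 50);
fig = figure('visible', 'off');   % hidden figure; drop this option to see the plot

plot(x, x, '-');                  % solid line
hold on;                          % keep the first line when plotting again
plot(x, 0.5 * x, '--');           % dashed line, added to the same axes
n_lines_held = numel(findobj(gca, 'type', 'line'));

hold off;                         % the next plot call replaces the axes contents
plot(x, 2 * x, ':');              % dotted line; the two earlier lines are cleared
n_lines_after = numel(findobj(gca, 'type', 'line'));

fprintf('lines with hold on: %d, after hold off: %d\n', n_lines_held, n_lines_after);
close(fig);
```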

(6) Predict using the hypothesis

%% ========== Part 3: Cost and Gradient descent ===================
% Predict profit for populations of 35,000 and 70,000
predict1 = [1, 3.5] * theta; % predicted profit for a population of 35,000
fprintf('For population = 35,000, we predict a profit of %f\n', ...
    predict1*10000);
predict2 = [1, 7] * theta; % predicted profit for a population of 70,000
fprintf('For population = 70,000, we predict a profit of %f\n', ...
    predict2*10000);

fprintf('Program paused. Press enter to continue.\n');
pause;
%======================================================================

(7) Compute J(θ) for every grid point

%% ============= Part 4: Visualizing J(theta_0, theta_1) =============
fprintf('Visualizing J(theta_0, theta_1) ...\n')

% grid over which the cost function J is computed
theta0_vals = linspace(-10, 10, 100); % 100 points from -10 to 10
theta1_vals = linspace(-1, 4, 100); % 100 points from -1 to 4

% initialize J_vals to zeros (J_vals is a 100 × 100 matrix)
J_vals = zeros(length(theta0_vals), length(theta1_vals));

% J_vals is 100 × 100, so nested for loops fill in every entry
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)]; % build the vector t from the two theta values
        J_vals(i,j) = computeCost(X, y, t); % store the cost in J_vals
    end
end
%================================================================

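To make the linspace() command easier to understand, I ran it on its own as follows.

C = linspace (-10,10,100)

Here the matrix J_vals is 100 × 100, and the value of the cost function J(θ), computed while varying the parameter vector θ, is stored in the element J_vals(i, j).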
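As a small illustration of what that linspace call returns, the sketch below inspects the number of points, the endpoints, and the spacing. The variable `step` is a hypothetical name added for the example.

```octave
C = linspace(-10, 10, 100);   % 100 evenly spaced points, both endpoints included
step = C(2) - C(1);           % spacing is (10 - (-10)) / (100 - 1)

fprintf('%d points, first = %g, last = %g, step = %.4f\n', ...
        numel(C), C(1), C(end), step);
```

Because both endpoints are included, 100 points span 99 intervals, so the spacing is 20/99 rather than 20/100.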

(8) Plot the cost function J(θ) as a surface

%% ============= Part 4: Visualizing J(theta_0, theta_1) =============
% Because of the way meshgrids work in the surf command, J_vals must be
% transposed before calling surf, or else the axes will be flipped.
J_vals = J_vals'; % transpose J_vals

% draw the surface plot
figure; % open a new figure window
surf(theta0_vals, theta1_vals, J_vals) % draw the surface with surf
xlabel('\theta_0'); ylabel('\theta_1'); % label the x and y axes
%================================================================

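First, compare J_vals with and without the transpose. The left plot uses the transposed matrix and the right plot does not; you can see that the two are flipped relative to each other.

surf(theta0_vals, theta1_vals, J_vals)

surf draws the data in 3D, in the form surf(X, Y, Z). The X axis is theta0_vals, the Y axis is theta1_vals, and the Z axis, which gives the height, is J_vals. theta0_vals and theta1_vals are each 1 × 100 matrices and J_vals is a 100 × 100 matrix. The reason for the transpose follows from how the for loop filled J_vals: the loop stored theta0 along the rows (index i) and theta1 along the columns (index j), whereas surf expects the rows of Z to follow Y and the columns to follow X.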
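The need for the transpose can be checked numerically without drawing anything. The sketch below uses grids of deliberately different lengths so that a mismatch shows up in the matrix sizes, and a made-up function `f` as a hypothetical stand-in for computeCost.

```octave
theta0_vals = linspace(-10, 10, 4);    % 4 points for theta0
theta1_vals = linspace(-1, 4, 3);      % 3 points for theta1
f = @(t0, t1) t0 .^ 2 + 10 * t1;       % hypothetical stand-in for computeCost

% Same fill order as ex1.m: row index i follows theta0, column j follows theta1.
J_vals = zeros(length(theta0_vals), length(theta1_vals));
for i = 1:length(theta0_vals)
  for j = 1:length(theta1_vals)
    J_vals(i, j) = f(theta0_vals(i), theta1_vals(j));
  end
end

% meshgrid convention used by surf: rows follow y (theta1), columns follow x (theta0).
[T0, T1] = meshgrid(theta0_vals, theta1_vals);
Z = f(T0, T1);

disp(size(J_vals));                    % 4 rows, 3 columns
disp(size(Z));                         % 3 rows, 4 columns
max_diff = max(max(abs(J_vals' - Z))); % transposed J_vals matches the meshgrid layout
disp(max_diff);
```

With the 100 × 100 grid of ex1.m both layouts have the same size, so surf would not complain about an untransposed J_vals; the axes would simply be swapped.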

(9) Draw the contour plot

%% ============= Part 4: Visualizing J(theta_0, theta_1) =============
% draw the contour plot
figure;
% draw 20 contour levels spaced logarithmically between 0.01 and 1000
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20)) % draw the contours
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);
%================================================================

contour(theta0_vals, theta1_vals, J_vals)

The contour command draws a contour plot. The X axis is theta0_vals, the Y axis is theta1_vals, and the contour height (Z) is J_vals.
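The fourth argument to contour in ex1.m is a vector of contour levels built with logspace. As a brief sketch, the snippet below shows what logspace(-2, 3, 20) contains; `levels` and `ratio` are hypothetical names.

```octave
levels = logspace(-2, 3, 20);    % 20 values from 10^-2 to 10^3
ratio = levels(2) / levels(1);   % neighbouring levels differ by a constant factor

fprintf('%d levels from %g to %g, ratio %.4f\n', ...
        numel(levels), levels(1), levels(end), ratio);
```

Logarithmic spacing places many levels near the small cost values around the minimum, which is why the contours bunch up around the optimal θ.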