by 라인하트 Jan 12. 2021

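Machine Learning Octave Exercise (4-5): Neural Network Handwritten Digit Recognition

Professor Andrew Ng, a co-founder of the online course platform Coursera, is a leading figure in the artificial intelligence field. The introductory machine learning course he taught at Stanford University is available for free on Coursera (Coursera.org). It is an essential course for machine learning beginners, and anyone studying AI and machine learning on their own will naturally come across it.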

Programming Exercise 4: Neural Networks Learning

2. Backpropagation

2.4 Gradient checking

In your neural network, you are minimizing the cost function $J(\Theta)$. To perform gradient checking on your parameters, you can imagine "unrolling" the parameters $\Theta^{(1)}$, $\Theta^{(2)}$ into a long vector $\theta$. By doing so, you can think of the cost function as $J(\theta)$ instead and use the following gradient checking procedure.

Suppose you have a function $f_i(\theta)$ that purportedly computes $\frac{\partial}{\partial \theta_i} J(\theta)$; you would like to check whether $f_i$ is outputting correct derivative values.

Let

$$\theta^{(i+)} = \theta + \epsilon\, e_i, \qquad \theta^{(i-)} = \theta - \epsilon\, e_i,$$

where $e_i$ is the vector with 1 in the $i$-th position and 0 everywhere else. So $\theta^{(i+)}$ is the same as $\theta$, except its $i$-th element has been incremented by $\epsilon$. Similarly, $\theta^{(i-)}$ is the corresponding vector with the $i$-th element decreased by $\epsilon$. You can now numerically verify the correctness of $f_i(\theta)$ by checking, for each $i$, that:

$$f_i(\theta) \approx \frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2\epsilon}$$
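As a minimal sketch of this check, the block below applies the two-sided difference to a toy cost whose derivative is known in closed form. The cost `J`, the vector `theta`, and the helper name `analyticGrad` are illustrative assumptions and are not part of the exercise files.

```octave
% Hypothetical toy cost: J(theta) = sum(theta.^3), so dJ/dtheta_i = 3*theta_i^2
J = @(theta) sum(theta .^ 3);
analyticGrad = @(theta) 3 * theta .^ 2;

theta = [1; -2; 0.5];
epsilon = 1e-4;
numGrad = zeros(size(theta));
for i = 1:numel(theta)
  step = zeros(size(theta));
  step(i) = epsilon;                 % perturb only the i-th element
  numGrad(i) = (J(theta + step) - J(theta - step)) / (2 * epsilon);
end

% Left column: numerical estimate, right column: closed-form derivative
disp([numGrad analyticGrad(theta)]);
```

For this particular cubic cost the two-sided estimate differs from the exact derivative by about $\epsilon^2$, which is why the two columns agree to many digits.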

The degree to which these two values should approximate each other will depend on the details of $J$. But assuming $\epsilon = 10^{-4}$, you will usually find that the left- and right-hand sides of the above agree to at least 4 significant digits (and often many more).

We have implemented the function to compute the numerical gradient for you in computeNumericalGradient.m. While you are not required to modify the file, we highly encourage you to take a look at the code to understand how it works.

In its next step, ex4.m runs the provided function checkNNGradients.m, which creates a small neural network and dataset that are used for checking your gradients. If your backpropagation implementation is correct, you should see a relative difference that is less than 1e-9.

Practical Tip: When performing gradient checking, it is much more efficient to use a small neural network with a relatively small number of input units and hidden units, and therefore a relatively small number of parameters. Each dimension of θ requires two evaluations of the cost function, and this can be expensive. In the function checkNNGradients, our code creates a small random model and dataset, which are used with computeNumericalGradient for gradient checking. Furthermore, after you are confident that your gradient computations are correct, you should turn off gradient checking before running your learning algorithm.
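To make the cost of gradient checking concrete, the sketch below counts the parameters, and therefore the cost-function evaluations, for the small network built by checkNNGradients (3 input, 5 hidden, 3 output units) and for the 400-25-10 digit network. The helper name `countParams` is an assumption introduced here for illustration.

```octave
% Number of parameters in a 3-layer network, including the bias columns
countParams = @(nIn, nHidden, nOut) nHidden * (nIn + 1) + nOut * (nHidden + 1);

smallNet = countParams(3, 5, 3);      % layer sizes used by checkNNGradients
fullNet  = countParams(400, 25, 10);  % layer sizes used for the digit data

% Each parameter needs two evaluations of the cost function
fprintf('small: %d parameters, %d cost evaluations\n', smallNet, 2 * smallNet);
fprintf('full:  %d parameters, %d cost evaluations\n', fullNet, 2 * fullNet);
```

The small network has 38 parameters, which matches the 38 rows that checkNNGradients prints; the full network has 10285, and each of its cost evaluations also runs over the whole training set.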

Practical Tip: Gradient checking works for any function where you are computing the cost and the gradient. Concretely, you can use the same computeNumericalGradient.m function to check whether your gradient implementations for the other exercises are correct too (e.g., logistic regression's cost function).
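As a sketch of that tip, the block below checks an unregularized logistic regression gradient against a two-sided numerical estimate. It inlines the finite-difference loop so that it runs without the course files; the toy data and the names `costFn` and `gradFn` are assumptions made for illustration.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical toy data: 4 examples, a bias column plus 2 features
X = [1 0.5 -1.2; 1 -0.3 0.8; 1 1.5 0.1; 1 -1.0 -0.7];
y = [1; 0; 1; 0];
m = size(X, 1);

% Unregularized logistic regression cost and its analytical gradient
costFn = @(t) -mean(y .* log(sigmoid(X * t)) + (1 - y) .* log(1 - sigmoid(X * t)));
gradFn = @(t) X' * (sigmoid(X * t) - y) / m;

theta = [0.1; -0.2; 0.3];
e = 1e-4;
numgrad = zeros(size(theta));
for p = 1:numel(theta)
  perturb = zeros(size(theta));
  perturb(p) = e;                    % perturb one parameter at a time
  numgrad(p) = (costFn(theta + perturb) - costFn(theta - perturb)) / (2 * e);
end

grad = gradFn(theta);
relDiff = norm(numgrad - grad) / norm(numgrad + grad);
fprintf('Relative difference: %g\n', relDiff);
```

With the course files on the path, the loop could presumably be replaced by a call such as computeNumericalGradient(costFn, theta), since that function only needs a handle that maps a parameter vector to a cost.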

Once your cost function passes the gradient check for the (unregularized) neural network cost function, you should submit the neural network gradient function (backpropagation).

<Explanation>

(1) Analysis of checkNNGradients.m

function checkNNGradients(lambda)
%CHECKNNGRADIENTS Creates a small neural network to check the
%backpropagation gradients
%   CHECKNNGRADIENTS(lambda)
%   lambda : regularization parameter
%   Compares the gradients computed by your backpropagation code with the
%   numerical gradients computed by computeNumericalGradient.m.
%   The two sets of gradients should have very similar values.

% Default to lambda = 0 when the argument is missing or empty
if ~exist('lambda', 'var') || isempty(lambda)
    lambda = 0;
end

input_layer_size = 3;   % 3 units in the input layer
hidden_layer_size = 5;  % 5 units in the hidden layer
num_labels = 3;         % 3 units in the output layer
m = 5;                  % 5 training examples

% Generate some "random" test parameters Theta
Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size);
Theta2 = debugInitializeWeights(num_labels, hidden_layer_size);

% Generate the training set X, y
% (debugInitializeWeights adds one column, so X is m x input_layer_size)
X = debugInitializeWeights(m, input_layer_size - 1);
y = 1 + mod(1:m, num_labels)';

% Unroll the parameters
nn_params = [Theta1(:) ; Theta2(:)];

% Handle that calls nnCostFunction() with everything except p fixed
costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);

[cost, grad] = costFunc(nn_params);
numgrad = computeNumericalGradient(costFunc, nn_params);

% Display the two gradients side by side; the columns should be very similar
disp([numgrad grad]);
fprintf(['The above two columns you get should be very similar.\n' ...
         '(Left-Your Numerical Gradient, Right-Analytical Gradient)\n\n']);

% Evaluate the norm of the difference between the two solutions.
% computeNumericalGradient.m uses EPSILON = 0.0001, so with a correct
% implementation the relative difference should be less than 1e-9.
diff = norm(numgrad-grad)/norm(numgrad+grad);

fprintf(['If your backpropagation implementation is correct, then \n' ...
         'the relative difference will be small (less than 1e-9). \n' ...
         '\nRelative Difference: %g\n'], diff);

end
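The unrolling step in the listing relies on `(:)` stacking a matrix column by column, and on `reshape` undoing it. A small sketch, using made-up matrices of the same sizes as the check network (5 x 4 and 3 x 6), shows the round trip:

```octave
% Hypothetical matrices with the same shapes as Theta1 and Theta2 above
Theta1 = reshape(1:20, 5, 4);
Theta2 = reshape(101:118, 3, 6);

% Unroll: stack both matrices, column by column, into one long vector
nn_params = [Theta1(:); Theta2(:)];

% Roll back: split the vector at numel(Theta1) and restore the shapes
n1 = numel(Theta1);
T1 = reshape(nn_params(1:n1), size(Theta1));
T2 = reshape(nn_params(n1 + 1:end), size(Theta2));

disp(isequal(T1, Theta1) && isequal(T2, Theta2));   % both matrices recovered
```

The same split-and-reshape pattern appears at the top of nnCostFunction, where the sizes are derived from input_layer_size, hidden_layer_size, and num_labels.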

(2) Understanding the exist() function

exist(NAME, TYPE) returns one of the following values, or 0 if NAME does not exist.

1 : NAME is a variable
2 : NAME is a file (for example, NAME.m)
3 : NAME is a .oct or .mex file
5 : NAME is a built-in function
7 : NAME is a directory

The optional TYPE argument restricts what is checked.

"var" : check only for variables
"builtin" : check only for built-in functions
"file" : check only for files and directories
"dir" : check only for directories

>> lambda
lambda = 1
>> exist('lambda') % returns 1 because the variable lambda exists
ans = 1
>> Z
error: 'Z' undefined near line 1 column 1
>> exist('Z') % returns 0 because nothing named Z exists
ans = 0

(3) Understanding the isempty() function

isempty(A) returns 1 (true) if A is an empty matrix, that is, if any of its dimensions is zero, and returns 0 (false) otherwise.

>> C = []
C = [](0x0)
>> isempty(C)
ans = 1
>> A = zeros(3,3)
A =

   0   0   0
   0   0   0
   0   0   0

>> isempty(A)
ans = 0
>> A = 0
A = 0
>> isempty(A)
ans = 0
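A few more cases may help here, since isempty tests whether any dimension is zero rather than whether the values are zero. The variable names in this sketch are arbitrary.

```octave
r1 = isempty([]);            % 0x0 matrix: empty
r2 = isempty(zeros(0, 3));   % 0x3 matrix: still empty, one dimension is zero
r3 = isempty('');            % empty string: empty
r4 = isempty(0);             % 1x1 matrix holding the value 0: not empty
r5 = isempty(zeros(3, 3));   % 3x3 matrix of zeros: not empty

disp([r1 r2 r3 r4 r5]);
```

This is why checkNNGradients(0) keeps lambda = 0 as a real argument, while checkNNGradients([]) falls back to the default.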

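(4) Understanding the if ~exist construct

if ~exist('lambda', 'var') || isempty(lambda)
    lambda = 0;
end

exist('lambda', 'var') returns 1 if the variable lambda exists, which inside the function means it was passed as an input argument, and returns 0 otherwise. isempty(lambda) returns 0 if lambda was passed with a non-empty value. '||' is the logical OR operator, so an expression A || B is false only when both operands are 0.

The ~ operator is logical NOT. The condition ~exist('lambda', 'var') || isempty(lambda) is therefore true, and lambda is set to the default value 0, only when lambda was not passed at all or was passed as an empty matrix.

>> lambda = 10
lambda = 10
>> exist('lambda', 'var') || isempty(lambda) % returns 1 (true)
ans = 1
>> ~exist('lambda', 'var') || isempty(lambda) % returns 0 (false)
ans = 0

>> lambda = 10
lambda = 10
>> if ~exist('lambda', 'var') || isempty(lambda) % false: ~ turns the 1 from exist into 0
lambda = 0;
end
>> lambda
lambda = 10 % the value of lambda is unchanged

>> lambda = 10
lambda = 10
>> if exist('lambda', 'var') || isempty(lambda) % true
lambda = 0;
end
>> lambda
lambda = 0 % the value of lambda has been changed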
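The three situations the default-argument condition has to handle can be sketched at the prompt as follows. Because `||` is a short-circuit OR, isempty(lambda) is never evaluated when lambda does not exist, so the first case does not raise an "undefined" error. The names `missingCase`, `emptyCase`, and `givenCase` are illustrative.

```octave
% Case 1: lambda does not exist at all
clear lambda;
missingCase = ~exist('lambda', 'var') || isempty(lambda);  % true, isempty is skipped

% Case 2: lambda exists but is empty
lambda = [];
emptyCase = ~exist('lambda', 'var') || isempty(lambda);    % true

% Case 3: lambda exists and holds a value
lambda = 3;
givenCase = ~exist('lambda', 'var') || isempty(lambda);    % false, keep the value

disp([missingCase emptyCase givenCase]);
```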

(4) Calling debugInitializeWeights.m to generate the parameters Theta

Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size);
Theta2 = debugInitializeWeights(num_labels, hidden_layer_size);

checkNNGradients calls debugInitializeWeights.m to create Theta1 and Theta2. The file is short, so we look at it here.

function W = debugInitializeWeights(fan_out, fan_in)
%DEBUGINITIALIZEWEIGHTS Initializes the weights of a layer with fan_in
%incoming connections and fan_out outgoing connections using a fixed strategy
%   W = DEBUGINITIALIZEWEIGHTS(fan_out, fan_in)
%   fan_in  : number of units in the incoming layer
%   fan_out : number of units in the outgoing layer
%
%   Note: W has size fan_out x (1 + fan_in); the extra column holds the
%   bias terms.

% Set W to zeros
W = zeros(fan_out, 1 + fan_in);

% Fill W using sin() so that W always has the same values, which is useful for debugging
W = reshape(sin(1:numel(W)), size(W)) / 10;

end

numel() is short for "number of elements" and returns the number of elements in a matrix.

>> A = magic(3)
A =

   8   1   6
   3   5   7
   4   9   2

>> size(A)
ans =

   3   3

>> numel(A)
ans = 9
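To see what debugInitializeWeights produces, the sketch below reproduces its two lines inline for a hypothetical layer with fan_in = 3 and fan_out = 2. The result has one extra column for the bias and is filled column by column with sin(1), sin(2), and so on, divided by 10.

```octave
fan_out = 2;   % hypothetical sizes chosen for illustration
fan_in = 3;

% Same construction as in debugInitializeWeights
W = zeros(fan_out, 1 + fan_in);
W = reshape(sin(1:numel(W)), size(W)) / 10;

disp(size(W));   % fan_out x (1 + fan_in)
disp(W);
```

Because nothing here depends on a random number generator, every run yields identical weights, so the numbers printed by checkNNGradients are reproducible.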

(5) Understanding the mod() function

mod(x, y) is short for modulo and returns the remainder of x divided by y.

>> mod(4,2)
ans = 0
>> mod(4,3)
ans = 1
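checkNNGradients uses mod to build its label vector with `y = 1 + mod(1:m, num_labels)'`. A short sketch with the same values as that file (m = 5, num_labels = 3) shows that the labels cycle through the valid range 1..num_labels:

```octave
m = 5;
num_labels = 3;

% mod(1:5, 3) is [1 2 0 1 2]; adding 1 shifts it into the range 1..3
y = 1 + mod(1:m, num_labels)';

disp(y');   % prints 2 3 1 2 3
```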

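(6) Analysis of computeNumericalGradient.m

function numgrad = computeNumericalGradient(J, theta)
%COMPUTENUMERICALGRADIENT Computes the gradient using "finite differences"
%and gives a numerical estimate of the gradient
%   numgrad = COMPUTENUMERICALGRADIENT(J, theta) computes the numerical
%   gradient of the cost function J around theta.
%
%   Notes: the code below implements numerical gradient checking and
%   returns the numerical gradient. numgrad(i) is a numerical approximation
%   of the partial derivative of J with respect to the i-th element of theta.

numgrad = zeros(size(theta));  % numerical gradient of J to return
perturb = zeros(size(theta));  % perturbation vector, same size as theta
e = 1e-4;                      % epsilon: 1e-4 means 10^(-4), i.e. 0.0001
for p = 1:numel(theta)
    % Set perturbation vector: epsilon in position p, zeros elsewhere
    perturb(p) = e;
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    % Compute Numerical Gradient (two-sided approximation of the derivative)
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;  % reset before moving on to the next element
end

end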
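The listing uses the two-sided difference (loss2 - loss1) / (2*e). As a hedged illustration of why, the sketch below compares it with a one-sided difference on a function with a known derivative. The one-sided variant is not used anywhere in the course files; it is included here only for contrast, and exp was picked as an arbitrary test function.

```octave
f = @(x) exp(x);       % test function; its exact derivative is also exp(x)
x0 = 1;
e = 1e-4;
exact = exp(x0);

% One-sided (forward) difference: error shrinks proportionally to e
forwardErr = abs((f(x0 + e) - f(x0)) / e - exact);

% Two-sided (central) difference: error shrinks proportionally to e^2
centralErr = abs((f(x0 + e) - f(x0 - e)) / (2 * e) - exact);

fprintf('forward error: %g\ncentral error: %g\n', forwardErr, centralErr);
```

The two-sided estimate costs the same two function evaluations per parameter in a loop like the one above, yet is several orders of magnitude more accurate at e = 1e-4, which is what makes a threshold as tight as 1e-9 on the relative difference realistic.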

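<Results>

We create a three-layer neural network and compare the derivatives of the cost function J(Θ) computed by backpropagation with the numerical gradients from gradient checking.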

>> checkNNGradients

-9.2783e-03 -9.2783e-03

8.8991e-03 8.8991e-03

-8.3601e-03 -8.3601e-03

7.6281e-03 7.6281e-03

-6.7480e-03 -6.7480e-03

-3.0498e-06 -3.0498e-06

1.4287e-05 1.4287e-05

-2.5938e-05 -2.5938e-05

3.6988e-05 3.6988e-05

-4.6876e-05 -4.6876e-05

-1.7506e-04 -1.7506e-04

2.3315e-04 2.3315e-04

-2.8747e-04 -2.8747e-04

3.3532e-04 3.3532e-04

-3.7622e-04 -3.7622e-04

-9.6266e-05 -9.6266e-05

1.1798e-04 1.1798e-04

-1.3715e-04 -1.3715e-04

1.5325e-04 1.5325e-04

-1.6656e-04 -1.6656e-04

3.1454e-01 3.1454e-01

1.1106e-01 1.1106e-01

9.7401e-02 9.7401e-02

1.6409e-01 1.6409e-01

5.7574e-02 5.7574e-02

5.0458e-02 5.0458e-02

1.6457e-01 1.6457e-01

5.7787e-02 5.7787e-02

5.0753e-02 5.0753e-02

1.5834e-01 1.5834e-01

5.5924e-02 5.5924e-02

4.9162e-02 4.9162e-02

1.5113e-01 1.5113e-01

5.3697e-02 5.3697e-02

4.7146e-02 4.7146e-02

1.4957e-01 1.4957e-01

5.3154e-02 5.3154e-02

4.6560e-02 4.6560e-02

The above two columns you get should be very similar.

(Left-Your Numerical Gradient, Right-Analytical Gradient)

If your backpropagation implementation is correct, then

the relative difference will be small (less than 1e-9).

Relative Difference: 2.32978e-11
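The relative difference reported above is norm(numgrad - grad) / norm(numgrad + grad). The sketch below evaluates the same expression on made-up vectors to show how it reacts to a typical bug; the gradient values and the "forgot a factor" scenario are hypothetical.

```octave
relDiff = @(a, b) norm(a - b) / norm(a + b);

numgrad = [0.3; -0.1; 0.05];       % hypothetical numerical gradient

goodGrad = numgrad + 1e-11;        % analytical gradient that agrees closely
buggyGrad = 2 * numgrad;           % e.g. a scaling factor applied incorrectly

fprintf('good:  %g\n', relDiff(numgrad, goodGrad));
fprintf('buggy: %g\n', relDiff(numgrad, buggyGrad));
```

A scaling mistake like this one gives a relative difference of 1/3 regardless of the gradient's magnitude, far above the 1e-9 threshold, which is what makes the metric convenient as a pass/fail check.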

2.5 Regularized Neural Networks

After you have successfully implemented the backpropagation algorithm, you will add regularization to the gradient. To account for regularization, it turns out that you can add this as an additional term after computing the gradients using backpropagation.

Specifically, after you have computed $\Delta^{(l)}_{ij}$ using backpropagation, you should add regularization using

$$\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta) = D^{(l)}_{ij} = \frac{1}{m}\Delta^{(l)}_{ij} \quad \text{for } j = 0$$

$$\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta) = D^{(l)}_{ij} = \frac{1}{m}\Delta^{(l)}_{ij} + \frac{\lambda}{m}\Theta^{(l)}_{ij} \quad \text{for } j \geq 1$$

Note that you should not be regularizing the first column of $\Theta^{(l)}$, which is used for the bias term. Furthermore, in the parameters $\Theta^{(l)}_{ij}$, $i$ is indexed starting from 1 and $j$ is indexed starting from 0. Somewhat confusingly, indexing in Octave/MATLAB starts from 1 (for both $i$ and $j$), so Theta1(2, 1) actually corresponds to $\Theta^{(1)}_{2,0}$ (i.e., the entry in the second row, first column of the matrix $\Theta^{(1)}$).

Now modify your code that computes grad in nnCostFunction to account for regularization. After you are done, the ex4.m script will proceed to run gradient checking on your implementation. If your code is correct, you should expect to see a relative difference that is less than 1e-9.

You should now submit your solutions by running submit.

<Explanation>

(1) Loading the data and setting basic variables

clear; close all; clc
load('ex4data1.mat');  % loads X, a 5000 x 400 matrix of grayscale handwritten digit images, and the labels y
[m, n] = size(X);      % X is 5000 x 400, so m = 5000 and n = 400

(2) Setting the neural network variables

input_layer_size = 400;  % 20x20 input images give 400 input units
hidden_layer_size = 25;  % 25 hidden units
num_labels = 10;         % 10 classes; the digit 0 is labeled 10
lambda = 1;              % regularization parameter λ
% Do not add a bias column to X here: nnCostFunction adds it itself

(3) Analysis of nnCostFunction.m

function [J grad] = nnCostFunction(nn_params, input_layer_size, ...
                                   hidden_layer_size, num_labels, X, y, lambda)
%NNCOSTFUNCTION Implements the cost function for a two-layer neural network
%   [J grad] = NNCOSTFUNCTION(nn_params, input_layer_size, hidden_layer_size, ...
%   num_labels, X, y, lambda) computes the cost J and gradient grad of the
%   neural network. The parameters arrive unrolled in the vector nn_params
%   and are converted back into weight matrices. The returned grad is the
%   unrolled vector of the partial derivatives of the neural network.

% Reshape nn_params back into Theta1 and Theta2
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Number of training examples
m = size(X, 1);

% Initialize the variables to return
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================
% Instructions: complete the code in the following parts.
% Part 1: Feed the network forward and return the cost in J. After
%         implementing Part 1, verify that the cost computation is correct.
% Part 2: Implement backpropagation to compute the gradients. Return the
%         partial derivatives of the cost with respect to Theta1 and Theta2
%         in Theta1_grad and Theta2_grad. After implementing Part 2, run
%         checkNNGradients to verify the result.
%         Note: y is a vector of labels from 1 to K; it must be mapped to
%         binary vectors of 1s and 0s for use in the cost function.
%         Hint: implement backpropagation with a for-loop over the
%         training examples.
% Part 3: Implement regularization for the cost function and gradients.
%         Hint: compute the regularization gradients separately and add
%         them to Theta1_grad and Theta2_grad from Part 2.

% Feedforward over all examples
X = [ones(m,1) X];           % add the bias column
a2 = sigmoid(X * Theta1');
a2 = [ones(m,1) a2];         % add the bias unit of the hidden layer
a3 = sigmoid(a2 * Theta2');

% Map the labels 1..K to one-hot rows
temp = eye(num_labels);
yVec = temp(y,:);

% Cost function
J = -1/m * sum(sum(yVec .* log(a3) + (1 - yVec) .* log(1 - a3)));

% Regularized cost function (bias columns excluded)
regularizer = lambda/(2*m) * (sum(sum(Theta1(:, 2:end) .^ 2)) + ...
                              sum(sum(Theta2(:, 2:end) .^ 2)));
J = J + regularizer;

% Backpropagation, one training example at a time
Delta1 = zeros(size(Theta1));
Delta2 = zeros(size(Theta2));
for i = 1:m
    xt = X(i, :);
    z2t = xt * Theta1';
    a2t = sigmoid(z2t);
    a2t = [1 a2t];
    z3t = a2t * Theta2';
    a3t = sigmoid(z3t);
    yt = yVec(i,:);
    delta3t = a3t - yt;
    delta2t = (Theta2(:, 2:end)' * delta3t')' .* sigmoidGradient(z2t);
    Delta1 = Delta1 + delta2t' * xt;
    Delta2 = Delta2 + delta3t' * a2t;
end

% Unregularized gradients
Theta1_grad = 1/m * Delta1;
Theta2_grad = 1/m * Delta2;

% =========================================================================

% Unroll the gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
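The listing converts the label vector into one-hot rows with `temp = eye(num_labels); yVec = temp(y,:);`. A minimal sketch with made-up labels shows what that indexing does: row k of the identity matrix is the one-hot vector for class k, so indexing by y selects one such row per example.

```octave
num_labels = 4;        % hypothetical number of classes
y = [2; 4; 1; 2];      % hypothetical labels in 1..num_labels

temp = eye(num_labels);
yVec = temp(y, :);     % row i is the one-hot encoding of y(i)

disp(yVec);
```

Each row of yVec contains a single 1, in the column given by the label, which is the form the cost expression yVec .* log(a3) expects.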

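(4) Building the regularization term

We add a regularization term to the gradients obtained by differentiating the neural network cost function J(Θ). The term is (λ/m)·Θ for every column of Θ except the first.

The first column of each parameter matrix Θ holds the bias terms and is not regularized, so its contribution is a column of zeros.

(lambda/m)*zeros(size(Theta1,1),1)
(lambda/m)*zeros(size(Theta2,1),1)

The remaining columns contribute the regularization term.

(lambda/m)*Theta1(:, 2:end)
(lambda/m)*Theta2(:, 2:end)

Concatenating the two pieces gives a matrix of the same size as Theta, which is added to the existing derivative term.

Theta1_grad = 1/m * Delta1 + (lambda/m) * [zeros(size(Theta1,1),1), Theta1(:, 2:end)];
Theta2_grad = 1/m * Delta2 + (lambda/m) * [zeros(size(Theta2,1),1), Theta2(:, 2:end)];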
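A small numeric sketch of the regularization term, using a made-up 2 x 3 parameter matrix with lambda = 3 and m = 5, shows the zero column that leaves the bias gradients untouched:

```octave
lambda = 3;
m = 5;
Theta = [0.1  0.2 -0.3;     % hypothetical weights; column 1 is the bias column
         0.4 -0.5  0.6];

% Zero column for the bias terms, (lambda/m) * Theta for the other columns
regTerm = (lambda / m) * [zeros(size(Theta, 1), 1), Theta(:, 2:end)];

disp(regTerm);
```

With lambda/m = 0.6, the result is [0 0.12 -0.18; 0 -0.30 0.36], which has the same size as Theta and can be added directly to 1/m * Delta.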

<Answer>

function [J grad] = nnCostFunction(nn_params, input_layer_size, ...
                                   hidden_layer_size, num_labels, X, y, lambda)
%NNCOSTFUNCTION Implements the cost function for a two-layer neural network
%   [J grad] = NNCOSTFUNCTION(nn_params, input_layer_size, hidden_layer_size, ...
%   num_labels, X, y, lambda) computes the cost J and gradient grad of the
%   neural network. The parameters arrive unrolled in the vector nn_params
%   and are converted back into weight matrices. The returned grad is the
%   unrolled vector of the partial derivatives of the neural network.

% Reshape nn_params back into Theta1 and Theta2
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Number of training examples
m = size(X, 1);

% Initialize the variables to return
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================
% Instructions: complete the code in the following parts.
% Part 1: Feed the network forward and return the cost in J. After
%         implementing Part 1, verify that the cost computation is correct.
% Part 2: Implement backpropagation to compute the gradients. Return the
%         partial derivatives of the cost with respect to Theta1 and Theta2
%         in Theta1_grad and Theta2_grad. After implementing Part 2, run
%         checkNNGradients to verify the result.
%         Note: y is a vector of labels from 1 to K; it must be mapped to
%         binary vectors of 1s and 0s for use in the cost function.
%         Hint: implement backpropagation with a for-loop over the
%         training examples.
% Part 3: Implement regularization for the cost function and gradients.
%         Hint: compute the regularization gradients separately and add
%         them to Theta1_grad and Theta2_grad from Part 2.

% Feedforward over all examples
X = [ones(m,1) X];           % add the bias column
a2 = sigmoid(X * Theta1');
a2 = [ones(m,1) a2];         % add the bias unit of the hidden layer
a3 = sigmoid(a2 * Theta2');

% Map the labels 1..K to one-hot rows
temp = eye(num_labels);
yVec = temp(y,:);

% Cost function
J = -1/m * sum(sum(yVec .* log(a3) + (1 - yVec) .* log(1 - a3)));

% Regularized cost function (bias columns excluded)
regularizer = lambda/(2*m) * (sum(sum(Theta1(:, 2:end) .^ 2)) + ...
                              sum(sum(Theta2(:, 2:end) .^ 2)));
J = J + regularizer;

% Backpropagation, one training example at a time
Delta1 = zeros(size(Theta1));
Delta2 = zeros(size(Theta2));
for i = 1:m
    xt = X(i, :);
    z2t = xt * Theta1';
    a2t = sigmoid(z2t);
    a2t = [1 a2t];
    z3t = a2t * Theta2';
    a3t = sigmoid(z3t);
    yt = yVec(i,:);
    delta3t = a3t - yt;
    delta2t = (Theta2(:, 2:end)' * delta3t')' .* sigmoidGradient(z2t);
    Delta1 = Delta1 + delta2t' * xt;
    Delta2 = Delta2 + delta3t' * a2t;
end

% Regularized gradients: the zero column leaves the bias terms unregularized
Theta1_grad = 1/m * Delta1 + (lambda/m) * [zeros(size(Theta1,1),1), Theta1(:, 2:end)];
Theta2_grad = 1/m * Delta2 + (lambda/m) * [zeros(size(Theta2,1),1), Theta2(:, 2:end)];

% =========================================================================

% Unroll the gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
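The per-example loop in the listing can also be written with matrix products over all examples at once. The sketch below is one possible vectorized form, not the course's reference solution; it is checked against the same loop on a small deterministic problem. The sizes, the data, and the names `eyeK`, `D2`, and `D3` are assumptions made for illustration, and sigmoid and sigmoidGradient are redefined inline so the block runs on its own.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

% Hypothetical small problem: 3 inputs, 5 hidden units, 3 classes, 5 examples
m = 5; lambda = 3;
Theta1 = reshape(sin(1:20), 5, 4) / 10;
Theta2 = reshape(sin(1:18), 3, 6) / 10;
X = [ones(m, 1), reshape(sin(1:15), m, 3) / 10];   % bias column already added
eyeK = eye(3);
Y = eyeK(1 + mod(1:m, 3), :);                      % one-hot labels

% Vectorized forward and backward pass over all examples at once
Z2 = X * Theta1';
A2 = [ones(m, 1), sigmoid(Z2)];
D3 = sigmoid(A2 * Theta2') - Y;                       % m x 3 output errors
D2 = (D3 * Theta2(:, 2:end)) .* sigmoidGradient(Z2);  % m x 5 hidden errors
Theta1_grad = (D2' * X) / m + (lambda / m) * [zeros(5, 1), Theta1(:, 2:end)];
Theta2_grad = (D3' * A2) / m + (lambda / m) * [zeros(3, 1), Theta2(:, 2:end)];

% Reference: the same computation as a per-example loop
Delta1 = zeros(size(Theta1));
Delta2 = zeros(size(Theta2));
for i = 1:m
  z2t = X(i, :) * Theta1';
  a2t = [1, sigmoid(z2t)];
  delta3t = sigmoid(a2t * Theta2') - Y(i, :);
  delta2t = (delta3t * Theta2(:, 2:end)) .* sigmoidGradient(z2t);
  Delta1 = Delta1 + delta2t' * X(i, :);
  Delta2 = Delta2 + delta3t' * a2t;
end
loop1 = Delta1 / m + (lambda / m) * [zeros(5, 1), Theta1(:, 2:end)];
loop2 = Delta2 / m + (lambda / m) * [zeros(3, 1), Theta2(:, 2:end)];

maxDiff = max([max(abs(Theta1_grad(:) - loop1(:))), ...
               max(abs(Theta2_grad(:) - loop2(:)))]);
fprintf('Largest difference between loop and vectorized gradients: %g\n', maxDiff);
```

The loop version follows the exercise's hint and is easier to match against the formulas; the vectorized form mainly matters for speed on the full 5000-example dataset.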

<Results>

To check the regularized gradients, ex4.m runs the gradient check again with lambda = 3.

fprintf('\nChecking Backpropagation (w/ Regularization) ... \n')
lambda = 3;
checkNNGradients(lambda);

Checking Backpropagation (w/ Regularization) ...

-9.2783e-03 -9.2783e-03

8.8991e-03 8.8991e-03

-8.3601e-03 -8.3601e-03

7.6281e-03 7.6281e-03

-6.7480e-03 -6.7480e-03

-1.6768e-02 -1.6768e-02

3.9433e-02 3.9433e-02

5.9336e-02 5.9336e-02

2.4764e-02 2.4764e-02

-3.2688e-02 -3.2688e-02

-6.0174e-02 -6.0174e-02

-3.1961e-02 -3.1961e-02

2.4923e-02 2.4923e-02

5.9772e-02 5.9772e-02

3.8641e-02 3.8641e-02

-1.7370e-02 -1.7370e-02

-5.7566e-02 -5.7566e-02

-4.5196e-02 -4.5196e-02

9.1459e-03 9.1459e-03

5.4610e-02 5.4610e-02

3.1454e-01 3.1454e-01

1.1106e-01 1.1106e-01

9.7401e-02 9.7401e-02

1.1868e-01 1.1868e-01

3.8193e-05 3.8193e-05

3.3693e-02 3.3693e-02

2.0399e-01 2.0399e-01

1.1715e-01 1.1715e-01

7.5480e-02 7.5480e-02

1.2570e-01 1.2570e-01

-4.0759e-03 -4.0759e-03

1.6968e-02 1.6968e-02

1.7634e-01 1.7634e-01

1.1313e-01 1.1313e-01

8.6163e-02 8.6163e-02

1.3229e-01 1.3229e-01

-4.5296e-03 -4.5296e-03

1.5005e-03 1.5005e-03

The above two columns you get should be very similar.

(Left-Your Numerical Gradient, Right-Analytical Gradient)

If your backpropagation implementation is correct, then

the relative difference will be small (less than 1e-9).

Relative Difference: 2.26112e-11

debug_J = nnCostFunction(nn_params, input_layer_size, ...

hidden_layer_size, num_labels, X, y, lambda);

0.576051

(for lambda = 3, this value should be about 0.576051)
