Implemented in Octave.
1.The cost function and its regularization
```octave
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
  % recover each layer's weight matrix from the unrolled parameter vector
  Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                   hidden_layer_size, (input_layer_size + 1));
  Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                   num_labels, (hidden_layer_size + 1));
  m = size(X, 1);
  J = 0;
  % initialize the gradients as zero matrices
  Theta1_grad = zeros(size(Theta1));
  Theta2_grad = zeros(size(Theta2));
  % forward propagation
  h1 = sigmoid([ones(m,1) X] * Theta1');
  h2 = sigmoid([ones(m,1) h1] * Theta2');
  for i = 1:m
    for k = 1:num_labels
      if (k == y(i))   % convert y(i) into one-hot components
        tempy = 1;
      else
        tempy = 0;
      endif
      J += -tempy*log(h2(i,k)) - (1-tempy)*log(1-h2(i,k));
    end
  end
  J /= m;              % unregularized cost
  % regularize the cost; the bias column (k = 1) is skipped
  re = 0;
  for j = 1:rows(Theta1)
    for k = 2:columns(Theta1)
      re += Theta1(j,k)^2;
    end
  end
  for j = 1:rows(Theta2)
    for k = 2:columns(Theta2)
      re += Theta2(j,k)^2;
    end
  end
  re *= lambda/(2*m);
  J += re;             % regularized cost
```
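The per-example double loop above can also be written in vectorized form. A minimal sketch, reusing the same `h2`, `y`, `num_labels`, `m`, `Theta1`, `Theta2`, and `lambda` as in the function above (the `eye(...)(y, :)` indexing of a temporary is Octave-specific):

```octave
Y = eye(num_labels)(y, :);   % one-hot label matrix, m x num_labels
% same unregularized cost as the double loop
J = sum(sum(-Y .* log(h2) - (1 - Y) .* log(1 - h2))) / m;
% same regularization term, skipping each bias column
J += (lambda/(2*m)) * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
```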
2.Derivative of the activation function
```octave
function g = sigmoidGradient(z)
  % derivative of the sigmoid: g'(z) = g(z) .* (1 - g(z)), elementwise
  g = sigmoid(z) .* (1 - sigmoid(z));
end
```
3.Random initialization of the parameters
```octave
function W = randInitializeWeights(L_in, L_out)
  % break symmetry: draw each weight uniformly from [-epsilon_init, epsilon_init];
  % a zero-initialized W would make every hidden unit compute the same function
  epsilon_init = 0.12;
  W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
end

% call the function above to initialize the parameters
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
```
4.Backpropagation to compute the gradient grad, and its regularization
```octave
  % one-hot encode y
  vy = zeros(rows(y), num_labels);
  for i = 1:rows(y)
    vy(i, y(i)) = 1;
  end
  X = [ones(m, 1) X];  % prepend the bias column of ones
  z2 = X * Theta1';
  % errors for the output and hidden layers, then the accumulators
  d3 = h2 - vy;
  d2 = d3 * Theta2(:,2:end) .* sigmoidGradient(z2);
  delta1 = d2' * X;
  delta2 = d3' * [ones(m,1) sigmoid(z2)];
  % unregularized gradients
  Theta1_grad = delta1 / m;
  Theta2_grad = delta2 / m;
  % regularize the gradients; the bias column is not regularized
  Theta1(:,1) = 0;
  Theta2(:,1) = 0;
  Theta1 = (lambda/m) * Theta1;
  Theta2 = (lambda/m) * Theta2;
  Theta1_grad = Theta1_grad + Theta1;
  Theta2_grad = Theta2_grad + Theta2;
  % unroll the gradients into a single vector
  grad = [Theta1_grad(:); Theta2_grad(:)];
```
5.Gradient checking: verify that the computed grad is correct
```octave
lambda = 3;
checkNNGradients(lambda);   % numerically approximated (expected) gradients
% actual cost and gradient from backpropagation
debug_J = nnCostFunction(nn_params, input_layer_size, ...
                         hidden_layer_size, num_labels, X, y, lambda);
```
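`checkNNGradients` compares the backpropagation gradient against a two-sided finite-difference approximation. The idea behind it can be sketched as follows (`costFunc` and `theta` are stand-in names for a cost-function handle and an unrolled parameter vector, not identifiers from the code above):

```octave
epsilon = 1e-4;
numgrad = zeros(size(theta));
for p = 1:numel(theta)
  perturb = zeros(size(theta));
  perturb(p) = epsilon;
  % two-sided difference approximates the p-th partial derivative
  numgrad(p) = (costFunc(theta + perturb) - costFunc(theta - perturb)) / (2*epsilon);
end
% numgrad should agree with the backpropagation grad to very small relative error
```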
6.Obtaining the optimized parameters
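The code for this step was cut off in the original post. A typical sketch, assuming the course-provided `fmincg` optimizer is on the path and `initial_nn_params` is the unrolled vector of the two initial weight matrices:

```octave
options = optimset('MaxIter', 50);
% cost function of the unrolled parameter vector only
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                   num_labels, X, y, lambda);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);
% recover the per-layer weight matrices from the optimized vector
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));
```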