Neural Networks Learning

First of all, I have to say that this is the most difficult exercise I have done since starting this course.

1.3 Feedforward and Cost Function

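For reference, the (unregularized) cost implemented by the code below is the one from the exercise text:

$$ J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_{k}^{(i)}\log\left((h_{\Theta}(x^{(i)}))_{k}\right) - (1-y_{k}^{(i)})\log\left(1-(h_{\Theta}(x^{(i)}))_{k}\right)\right] $$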
% Feedforward: add bias units and propagate through the network
a1 = [ones(m, 1), X];           % input layer activations (m x 401)
z2 = a1 * Theta1';
a2 = [ones(m, 1), sigmoid(z2)]; % hidden layer activations (m x 26)
z3 = a2 * Theta2';
a3 = sigmoid(z3);               % output layer activations (m x 10)

% Recode the labels y (values 1..num_labels) as one-hot row vectors
I = eye(num_labels);
Y = I(y, :);

% Unregularized cost, summed over all examples and all output units
J = sum(sum((-Y.*log(a3) - (1-Y).*log(1-a3)) / m));

At first I was confused by I(y, :): how does it turn y (a 5000x1 vector of labels) into Y (a 5000x10 matrix where each row has a 1 in the column given by its label)?
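The trick is Octave's row indexing: indexing a matrix with a vector of row numbers returns one row per element, in order. So I(y, :) stacks the y(i)-th row of the identity matrix for every example, which is exactly the one-hot encoding. A minimal sketch with a made-up 3-class label vector (not the ex4 data):

y = [3; 1; 2];   % example labels
I = eye(3);
Y = I(y, :)
% Y =
%    0   0   1
%    1   0   0
%    0   1   0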

1.4 Regularized Cost Function

$$ + \frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\left(\Theta_{j,k}^{(1)}\right)^{2} + \sum_{j=1}^{10}\sum_{k=1}^{25}\left(\Theta_{j,k}^{(2)}\right)^{2}\right] $$

% Regularization term: skip the bias column (first column) of each Theta
r = lambda/2/m * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
J = J + r;

2.1 Sigmoid Gradient

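This is just the derivative of the sigmoid function:

$$ g'(z) = \frac{d}{dz}g(z) = g(z)\left(1 - g(z)\right) $$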
g = sigmoid(z) .* (1 - sigmoid(z));   % element-wise, so it works for scalars, vectors, and matrices

2.3 Backpropagation

$$ \delta_{k}^{(3)} = (a_{k}^{(3)} - y_{k}) $$
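For the hidden layer, the error term (used in the code below) follows the usual backpropagation rule, dropping the bias term afterwards, which is what the d2(:,2:end) in the code does:

$$ \delta^{(2)} = (\Theta^{(2)})^{T}\delta^{(3)} .* g'(z^{(2)}) $$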

$$ \frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} $$

d3 = a3 - Y;                                          % output layer error (m x 10)
d2 = d3*Theta2 .* [ones(m, 1), sigmoidGradient(z2)];  % hidden layer error, bias column included (m x 26)

% Accumulate the gradients, dropping the bias column of d2
D1 = d2(:,2:end)' * a1;   % 25 x 401, same size as Theta1
D2 = d3' * a2;            % 10 x 26,  same size as Theta2

Theta1_grad = Theta1_grad + D1/m;
Theta2_grad = Theta2_grad + D2/m;
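Before moving on, it is worth checking the analytical gradients numerically, which is what the gradient-checking script in the exercise does. A minimal sketch of the two-sided difference idea, using a toy cost function instead of the actual ex4 cost function (costFunc and theta here are made up for illustration):

costFunc = @(t) 0.5 * sum(t.^2);   % toy cost whose gradient is simply t
theta = [1; -2; 3];
epsilon = 1e-4;
numgrad = zeros(size(theta));
for i = 1:numel(theta)
    e = zeros(size(theta)); e(i) = epsilon;
    % two-sided finite difference approximation of the i-th partial derivative
    numgrad(i) = (costFunc(theta + e) - costFunc(theta - e)) / (2*epsilon);
end
disp([numgrad, theta])   % the two columns should match closely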

2.5 Regularized Neural Networks

% Add the regularization term to every column except the bias column
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + lambda/m*Theta1(:,2:end);
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + lambda/m*Theta2(:,2:end);
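This corresponds to the regularized gradient from the exercise text (the bias column, j = 0, is left unregularized):

$$ \frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} + \frac{\lambda}{m}\Theta_{ij}^{(l)} \quad \text{for } j \geq 1 $$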

[Figure: the hidden layer]
