Neural Networks Learning

First of all, I have to say that this is the most difficult exercise I have done since starting this course.

1.3 Feedforward and Cost Function

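For reference, the (unregularized) cost implemented by the code below is the one from the exercise text:

$$ J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_{k}^{(i)}\log\left((h_{\Theta}(x^{(i)}))_{k}\right) - (1-y_{k}^{(i)})\log\left(1-(h_{\Theta}(x^{(i)}))_{k}\right)\right] $$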
% Feedforward: add bias units and propagate through the network
a1 = [ones(m, 1), X];           % input layer activations (m x 401)
z2 = a1 * Theta1';
a2 = [ones(m, 1), sigmoid(z2)]; % hidden layer activations (m x 26)
z3 = a2 * Theta2';
a3 = sigmoid(z3);               % output layer activations (m x 10)

% Recode the labels y (values 1..num_labels) as one-hot row vectors
I = eye(num_labels);
Y = I(y, :);

% Unregularized cost, summed over all examples and all output units
J = sum(sum((-Y.*log(a3) - (1-Y).*log(1-a3)) / m));

At first I was confused by I(y, :): how does it turn y (a 5000x1 vector of labels) into Y (a 5000x10 matrix where each row has a 1 in the column given by its label)?
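The trick is Octave's row indexing: indexing a matrix with a vector of row numbers returns one row per element, in order. So I(y, :) stacks the y(i)-th row of the identity matrix for every example, which is exactly the one-hot encoding. A minimal sketch with a made-up 3-class label vector (not the ex4 data):

y = [3; 1; 2];   % example labels
I = eye(3);
Y = I(y, :)
% Y =
%    0   0   1
%    1   0   0
%    0   1   0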

1.4 Regularized Cost Function

$$ + \frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\left(\Theta_{j,k}^{(1)}\right)^{2} + \sum_{j=1}^{10}\sum_{k=1}^{25}\left(\Theta_{j,k}^{(2)}\right)^{2}\right] $$

% Regularization term: skip the bias column (first column) of each Theta
r = lambda/2/m * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
J = J + r;

2.1 Sigmoid Gradient

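This is just the derivative of the sigmoid function:

$$ g'(z) = \frac{d}{dz}g(z) = g(z)\left(1 - g(z)\right) $$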
g = sigmoid(z) .* (1 - sigmoid(z));   % element-wise, so it works for scalars, vectors, and matrices

2.3 Backpropagation

$$ \delta_{k}^{(3)} = (a_{k}^{(3)} - y_{k}) $$
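For the hidden layer, the error term (used in the code below) follows the usual backpropagation rule, dropping the bias term afterwards, which is what the d2(:,2:end) in the code does:

$$ \delta^{(2)} = (\Theta^{(2)})^{T}\delta^{(3)} .* g'(z^{(2)}) $$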

$$ \frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} $$

d3 = a3 - Y;                                          % output layer error (m x 10)
d2 = d3*Theta2 .* [ones(m, 1), sigmoidGradient(z2)];  % hidden layer error, bias column included (m x 26)

% Accumulate the gradients, dropping the bias column of d2
D1 = d2(:,2:end)' * a1;   % 25 x 401, same size as Theta1
D2 = d3' * a2;            % 10 x 26,  same size as Theta2

Theta1_grad = Theta1_grad + D1/m;
Theta2_grad = Theta2_grad + D2/m;
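Before moving on, it is worth checking the analytical gradients numerically, which is what the gradient-checking script in the exercise does. A minimal sketch of the two-sided difference idea, using a toy cost function instead of the actual ex4 cost function (costFunc and theta here are made up for illustration):

costFunc = @(t) 0.5 * sum(t.^2);   % toy cost whose gradient is simply t
theta = [1; -2; 3];
epsilon = 1e-4;
numgrad = zeros(size(theta));
for i = 1:numel(theta)
    e = zeros(size(theta)); e(i) = epsilon;
    % two-sided finite difference approximation of the i-th partial derivative
    numgrad(i) = (costFunc(theta + e) - costFunc(theta - e)) / (2*epsilon);
end
disp([numgrad, theta])   % the two columns should match closely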

2.5 Regularized Neural Networks

% Add the regularization term to every column except the bias column
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + lambda/m*Theta1(:,2:end);
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + lambda/m*Theta2(:,2:end);
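This corresponds to the regularized gradient from the exercise text (the bias column, j = 0, is left unregularized):

$$ \frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)} + \frac{\lambda}{m}\Theta_{ij}^{(l)} \quad \text{for } j \geq 1 $$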

[Figure: the hidden layer]
