One-vs-all logistic regression and neural networks to recognize hand-written digits.

These solutions are for reference only.

> If you get stuck along the way, feel free to refer to the solutions I have provided.





NOTE:

Don't just copy and paste the code for the sake of completion.

Even if you copy the code, make sure you understand the code first.

Click here to check out the week-3 assignment solutions. Scroll down for the week-4 assignment solutions.

In this exercise, you will implement one-vs-all logistic regression and neural networks to recognize hand-written digits. Before starting the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.





It consists of the following files:

ex3.m - Octave/MATLAB script that steps you through part 1

ex3_nn.m - Octave/MATLAB script that steps you through part 2

ex3data1.mat - Training set of hand-written digits

ex3weights.mat - Initial weights for the neural network exercise

submit.m - Submission script that sends your solutions to our servers

displayData.m - Function to help visualize the dataset

fmincg.m - Function minimization routine (similar to fminunc)

sigmoid.m - Sigmoid function

[*] lrCostFunction.m - Logistic regression cost function

[*] oneVsAll.m - Train a one-vs-all multi-class classifier

[*] predictOneVsAll.m - Predict using a one-vs-all multi-class classifier

[*] predict.m - Neural network prediction function

[*] indicates files you will need to complete













lrCostFunction.m :

function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta;
%           temp(1) = 0;   % because we don't add anything for j = 0
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%

%DIMENSIONS:
%   theta = (n+1) x 1
%   X     = m x (n+1)
%   y     = m x 1
%   grad  = (n+1) x 1
%   J     = Scalar

z = X * theta;     % m x 1
h_x = sigmoid(z);  % m x 1

reg_term = (lambda / (2 * m)) * sum(theta(2:end) .^ 2);

J = (1 / m) * sum((-y .* log(h_x)) - ((1 - y) .* log(1 - h_x))) + reg_term; % scalar

grad(1) = (1 / m) * (X(:, 1)' * (h_x - y)); % 1 x 1
grad(2:end) = (1 / m) * (X(:, 2:end)' * (h_x - y)) + (lambda / m) * theta(2:end); % n x 1

% =============================================================

grad = grad(:);

end
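For reference, the cost and gradient that lrCostFunction.m computes can be written out as equations. This is only a restatement of the code above in the usual course notation, assuming h_theta(x) = g(theta' * x) with g the sigmoid function, m training examples, and n features (theta_0 is the bias term and is not regularized):

```latex
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log h_\theta(x^{(i)}) - (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}

\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \qquad (j \geq 1)
```

The first sum in J corresponds to the `sum(...)` term in the code, the second sum to `reg_term`, and the two gradient cases to `grad(1)` and `grad(2:end)`.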









oneVsAll.m :

function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logistic regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds
%   to the classifier for label i

% num_labels = No. of output classifier (Here, it is 10)

% Some useful variables
m = size(X, 1); % No. of Training Samples == No. of Images : (Here, 5000)
n = size(X, 2); % No. of features == No. of pixels in each Image : (Here, 400)

% You need to return the following variables correctly
all_theta = zeros(num_labels, n + 1);
%DIMENSIONS: num_labels x (input_layer_size+1) == num_labels x (no_of_features+1) == 10 x 401

%DIMENSIONS: X = m x input_layer_size
%Here, 1 row in X represents 1 training Image of pixel 20x20

% Add ones to the X data matrix
X = [ones(m, 1) X]; %DIMENSIONS: X = m x (input_layer_size+1) = m x (no_of_features+1)

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
%
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%

initial_theta = zeros(n + 1, 1);
options = optimset('GradObj', 'on', 'MaxIter', 50);

for c = 1:num_labels
    all_theta(c, :) = ...
        fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
               initial_theta, options);
end

% =========================================================================

end
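The key step in the loop above is the expression `y == c`, which turns the multi-class labels into a binary target for classifier c. Here is a minimal sketch of that step on its own, using a small made-up label vector (the data is hypothetical and not taken from ex3data1.mat):

```octave
% Toy labels for 6 examples and 3 classes (hypothetical data)
y = [1; 3; 2; 3; 1; 2];
num_labels = 3;

% Column c of Y is the binary target vector that classifier c is trained on
Y = zeros(numel(y), num_labels);
for c = 1:num_labels
  Y(:, c) = (y == c);   % 1 where the label equals c, 0 elsewhere
end

disp(Y);
```

Each row of Y contains exactly one 1, in the column of that example's label. In oneVsAll.m the same binary vector is passed directly to lrCostFunction without being stored.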









predictOneVsAll.m :

function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels
%are in the range 1..K, where K = size(all_theta, 1).
%   p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
%   for each example in the matrix X. Note that X contains the examples in
%   rows. all_theta is a matrix where the i-th row is a trained logistic
%   regression theta vector for the i-th class. You should set p to a vector
%   of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
%   for 4 examples)

m = size(X, 1); % No. of Input Examples to Predict (Each row = 1 Example)
num_labels = size(all_theta, 1); % No. of Output Classifier

% You need to return the following variables correctly
p = zeros(size(X, 1), 1); % No_of_Input_Examples x 1 == m x 1

% Add ones to the X data matrix
X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters (one-vs-all).
%               You should set p to a vector of predictions (from 1 to
%               num_labels).
%
% Hint: This code can be done all vectorized using the max function.
%       In particular, the max function can also return the index of the
%       max element, for more information see 'help max'. If your examples
%       are in rows, then, you can use max(A, [], 2) to obtain the max
%       for each row.
%

% num_labels = No. of output classifier (Here, it is 10)
% DIMENSIONS:
%   all_theta = 10 x 401 = num_labels x (input_layer_size+1) == num_labels x (no_of_features+1)

prob_mat = X * all_theta';         % 5000 x 10 == no_of_input_image x num_labels
[prob, p] = max(prob_mat, [], 2);  % m x 1
% max returns the maximum element in each row and its index for each input image.
% prob_mat holds the raw scores X * theta (sigmoid is not applied). Because the
% sigmoid is monotonically increasing, the class with the highest score is also
% the class with the highest probability.
% p: predicted output (index)
% prob: score of the predicted output

%%%%%%%% WORKING: Computation per input image %%%%%%%%%
% for i = 1:m                              % To iterate through each input sample
%     one_image = X(i,:);                  % 1 x 401 == 1 x (no_of_features+1)
%     prob_mat = one_image * all_theta';   % 1 x 10 == 1 x num_labels
%     [prob, out] = max(prob_mat);
%     %out: predicted output
%     %prob: score of predicted output
%     p(i) = out;
% end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%% WORKING %%%%%%%%%
% for i = 1:m
%     RX = repmat(X(i,:), num_labels, 1);
%     RX = RX .* all_theta;
%     SX = sum(RX, 2);
%     [val, index] = max(SX);
%     p(i) = index;
% end
%%%%%%%%%%%%%%%%%%%%%%%%%%

% =========================================================================

end
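Note that predictOneVsAll.m takes the row-wise maximum of `X * all_theta'` without applying the sigmoid. The sketch below illustrates, on a small made-up score matrix, why that gives the same prediction as taking the maximum over the probabilities. The numbers are hypothetical, and `sigmoid` is defined inline here as an assumption so the snippet runs on its own without the course's sigmoid.m:

```octave
% Inline sigmoid so the snippet does not depend on sigmoid.m
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical scores for 3 examples (rows) and 4 classes (columns)
scores = [ 0.2 -1.0  2.5  0.0;
          -3.0 -0.5 -2.0 -4.0;
           1.0  1.5  0.5 -1.0];

% The second output of max(A, [], 2) is the column index of each row's maximum
[~, p_scores] = max(scores, [], 2);           % arg-max over raw scores
[~, p_probs]  = max(sigmoid(scores), [], 2);  % arg-max over probabilities

disp([p_scores p_probs]);
```

Both columns of the displayed matrix are identical, because the sigmoid is monotonically increasing and so preserves the ordering within each row.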


















predict.m :

function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);

% You need to return the following variables correctly
p = zeros(size(X, 1), 1); % m x 1

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%

%DIMENSIONS:
%   theta1 = 25 x 401
%   theta2 = 10 x 26

%   layer1 (input)  = 400 nodes + 1 bias
%   layer2 (hidden) = 25 nodes + 1 bias
%   layer3 (output) = 10 nodes
%
%   theta dimensions = S_(j+1) x ((S_j)+1)
%   theta1 = 25 x 401
%   theta2 = 10 x 26

%   theta1:
%     1st row: weights from all nodes of layer1 to the 1st node of layer2
%     2nd row: weights from all nodes of layer1 to the 2nd node of layer2
%   and
%     1st column: weights from node1 of layer1 to all nodes of layer2
%     2nd column: weights from node2 of layer1 to all nodes of layer2
%
%   theta2:
%     1st row: weights from all nodes of layer2 to the 1st node of layer3
%     2nd row: weights from all nodes of layer2 to the 2nd node of layer3
%   and
%     1st column: weights from node1 of layer2 to all nodes of layer3
%     2nd column: weights from node2 of layer2 to all nodes of layer3

a1 = [ones(m, 1) X]; % 5000 x 401 == no_of_input_images x (no_of_features+1)
                     % Adding the bias column of ones to X
% No. of rows = no. of input images
% No. of columns = no. of features in each image + 1 (bias)

z2 = a1 * Theta1';                 % 5000 x 25
a2 = sigmoid(z2);                  % 5000 x 25

a2 = [ones(size(a2, 1), 1) a2];    % 5000 x 26

z3 = a2 * Theta2';                 % 5000 x 10
a3 = sigmoid(z3);                  % 5000 x 10

[prob, p] = max(a3, [], 2);
% max returns the maximum element in each row == max. probability and its
% index for each input image
% p: predicted output (index)
% prob: probability of predicted output

% =========================================================================

end
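To make the dimension bookkeeping above easier to follow, here is the same forward pass on a tiny network with 2 inputs, 2 hidden units and 2 outputs. This is only an illustrative sketch: the weights and inputs are hand-picked, hypothetical values (not the ones in ex3weights.mat), and `sigmoid` is defined inline as an assumption so the snippet is self-contained:

```octave
% Inline sigmoid so the snippet does not depend on sigmoid.m
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical weights: each Theta is S_(j+1) x (S_j + 1), bias in column 1
Theta1 = [-5 10  0;    % hidden unit 1 responds to input 1
          -5  0 10];   % hidden unit 2 responds to input 2
Theta2 = [ 0  5 -5;    % output 1 follows hidden unit 1
           0 -5  5];   % output 2 follows hidden unit 2

X = [1 0;              % example 1: only input 1 is active
     0 1];             % example 2: only input 2 is active
m = size(X, 1);

a1 = [ones(m, 1) X];                          % m x 3 (bias column added)
a2 = [ones(m, 1) sigmoid(a1 * Theta1')];      % m x 3 (bias column added)
a3 = sigmoid(a2 * Theta2');                   % m x 2 (one column per class)

[~, p] = max(a3, [], 2);                      % predicted class per example
disp(p');
```

With these weights, the first example is assigned to class 1 and the second to class 2. The real predict.m follows the same three steps with a 400-25-10 network.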

I have tried to provide optimized solutions, such as vectorized implementations, for each assignment. If you think further optimization is possible, please suggest corrections or improvements.


Feel free to ask questions in the comment section. I will try my best to answer them.

If you find this helpful in any way, please like, comment on, and share the post.

This is the simplest way to encourage me to keep doing such work.





Thanks and Regards,

-Akshay P. Daga





I recently completed the Machine Learning course by Andrew Ng on Coursera. The course involves various quizzes and assignments. Here, I am sharing my solutions for the weekly assignments throughout the course.