
Coursera-Machine-Learning-and-Practice

A study record of Coursera's Machine Learning course by Andrew Ng, with added practice exercises to reinforce the learning.

Table of Contents

  1. Week1
  2. Week2
  3. Week3
  4. Week4
  5. Week5
  6. Week6
  7. Week7
  8. Week8
  9. Week9
  10. Week10
  11. Week11

Week1

  • Introduction

    • Machine Learning definition: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
    • Supervised learning: "Right answer given" e.g. Regression, Classification...
    • Unsupervised learning: "No right answer given" e.g. Clustering, Cocktail party algorithm...
  • Linear Regression with One Variable

    • Model representation
    • Cost function
    • Gradient Descent
  • Linear Algebra Review

  • Python Practice for Simple Linear Regression

    PREDICTING HOUSE PRICES

    We have the following dataset:

    | Entry No. | Square_Feet | Price |
    | --- | --- | --- |
    | 1 | 150 | 6450 |
    | 2 | 200 | 7450 |
    | 3 | 250 | 8450 |
    | 4 | 300 | 9450 |
    | 5 | 350 | 11450 |
    | 6 | 400 | 15450 |
    | 7 | 600 | 18450 |

    With linear regression, we have to find the linear relationship within the data so we can obtain θ0 and θ1. Our hypothesis equation looks like this:

    hθ(x) = θ0 + θ1*x

    Where:

    • hθ(x) is the predicted price for a particular Square_Feet value (i.e., price is a linear function of Square_Feet)
    • θ0 is a constant
    • θ1 is the regression coefficient

    Coding:

    • See the week1 Python code, which uses data-mining packages such as NumPy, SciPy, Pandas, Matplotlib, and Scikit-Learn to implement it.
    • Script Output:

    (figure: script output plot)
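
    Not the repository's week1 script itself, but a minimal sketch of the same fit, assuming the dataset above and scikit-learn:

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Dataset from the table above
    square_feet = np.array([[150], [200], [250], [300], [350], [400], [600]])
    price = np.array([6450, 7450, 8450, 9450, 11450, 15450, 18450])

    # Fit hθ(x) = θ0 + θ1*x
    model = LinearRegression().fit(square_feet, price)
    print("θ0 (intercept):", model.intercept_)
    print("θ1 (coefficient):", model.coef_[0])
    print("Predicted price for 500 square feet:", model.predict([[500]])[0])
    ```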

    PREDICTING WHICH TV SHOW WILL HAVE MORE VIEWERS NEXT WEEK

    The Flash and Arrow are popular American television series. It's interesting to ask which one will ultimately win the ratings war, so let's write a program that predicts which TV show will have more viewers next week.

    We have the following dataset:

    | FLASH_EPISODE | FLASH_US_VIEWERS | ARROW_EPISODE | ARROW_US_VIEWERS |
    | --- | --- | --- | --- |
    | 1 | 4.83 | 1 | 2.84 |
    | 2 | 4.27 | 2 | 2.32 |
    | 3 | 3.59 | 3 | 2.55 |
    | 4 | 3.53 | 4 | 2.49 |
    | 5 | 3.46 | 5 | 2.73 |
    | 6 | 3.73 | 6 | 2.6 |
    | 7 | 3.47 | 7 | 2.64 |
    | 8 | 4.34 | 8 | 3.92 |
    | 9 | 4.66 | 9 | 3.06 |

    Steps to solving this problem:

    • First we have to convert our data into X_parameters and Y_parameters; here we have two sets of each, so let's name them flash_x_parameter, flash_y_parameter, arrow_x_parameter, and arrow_y_parameter.
    • Then we have to fit the data to two different linear regression models: one for Flash and the other for Arrow.
    • Then we have to predict the number of viewers of the next episode for both TV shows.
    • Then we can compare the results and guess which show will have more viewers.

    Coding:

    • See the week1 Python code, which uses Pandas and Scikit-Learn to implement it.
    • Script Output: The Flash TV show will have more viewers next week.
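
    The repository's script uses Pandas and Scikit-Learn; the following is a minimal sketch of the four steps above, with plain NumPy arrays standing in for the Pandas dataframe:

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    episodes = np.arange(1, 10).reshape(-1, 1)  # x_parameters: episodes 1..9
    flash_viewers = np.array([4.83, 4.27, 3.59, 3.53, 3.46, 3.73, 3.47, 4.34, 4.66])
    arrow_viewers = np.array([2.84, 2.32, 2.55, 2.49, 2.73, 2.6, 2.64, 3.92, 3.06])

    # Fit one linear regression model per show
    flash_model = LinearRegression().fit(episodes, flash_viewers)
    arrow_model = LinearRegression().fit(episodes, arrow_viewers)

    # Predict episode 10 for both shows and compare
    flash_next = flash_model.predict([[10]])[0]
    arrow_next = arrow_model.predict([[10]])[0]
    winner = "The Flash" if flash_next > arrow_next else "Arrow"
    print(f"{winner} is predicted to have more viewers next week.")
    ```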

⬆ back to top

Week2

  • Linear Regression with Multiple Variables

    • Multiple features
    • Gradient Descent for multiple variables
    • Feature scaling: Make sure features are on a similar scale.
    • Learning rate: Making sure gradient descent is working correctly.
      • If α is too small: slow convergence.
      • If α is too large: J(θ) may not decrease on every iteration; may not converge.
    • Features and Polynomial Regression
  • Computing Parameters Analytically

    • Normal equation: Method to solve for θ analytically.
    • Normal equation Noninvertibility
  • Octave Tutorial

    • Basic operation
    • Moving Data Around
    • Computing on Data
    • Plotting Data
    • Control statement: for, while, if statement
    • Vectorization
  • Octave Practice for Linear Regression

    In this practice, we will implement linear regression and see it work on data. See the related exercises and scripts, which use the official Coursera exercise. What we did in the exercises is summarized in the sections below:

    Linear regression with one variable

    In this part of the exercise, you will implement linear regression with one variable to predict profits for a food truck. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities and you have data for profits and populations from the cities.

    1. Plotting the Data

    Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, you can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population).

    (figure: scatter plot of the training data — population vs. profit)

    2. Gradient Descent

    In this part, you will fit the linear regression parameters θ to our dataset using gradient descent.

    The objective of linear regression is to minimize the cost function

    J(θ) = (1/(2m)) * Σ_{i=1..m} (hθ(x^(i)) - y^(i))^2

    where the hypothesis hθ(x) is given by the linear model

    hθ(x) = θ^T x = θ0 + θ1*x1

    Recall that the parameters of your model are the θj values. These are the values you will adjust to minimize the cost J(θ). One way to do this is to use the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update

    θj := θj - α * (1/m) * Σ_{i=1..m} (hθ(x^(i)) - y^(i)) * x_j^(i)    (simultaneously update θj for all j)

    With each step of gradient descent, your parameters θj come closer to the optimal values that will achieve the lowest cost J(θ).
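
    The exercise implements this in Octave; as a reference, here is a minimal NumPy sketch of the same batch update:

    ```python
    import numpy as np

    def gradient_descent(X, y, alpha=0.01, num_iters=1500):
        """Batch gradient descent for linear regression.
        X is an (m, n+1) matrix whose first column is all ones (intercept term)."""
        m = len(y)
        theta = np.zeros(X.shape[1])
        for _ in range(num_iters):
            errors = X @ theta - y                 # hθ(x^(i)) - y^(i) for every example
            theta -= alpha * (X.T @ errors) / m    # simultaneous update of every θj
        return theta
    ```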

    After computing the cost function J(θ) and θ, you can plot the linear fit, as in the picture below:

    (figure: training data with the fitted regression line)

    To understand the cost function J(θ) better, you will now plot the cost over a 2-dimensional grid of θ0 and θ1 values.

    Surface:

    (figure: surface plot of J(θ0, θ1))

    Contour:

    (figure: contour plot of J(θ0, θ1) showing the minimum)

    Linear regression with multiple variables

    In this part, you will implement linear regression with multiple variables to predict the prices of houses. Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices.

    1. Feature Normalization

    By looking at the dataset values, note that house sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, first performing feature scaling can make gradient descent converge much more quickly.

    • Subtract the mean value of each feature from the dataset.
    • After subtracting the mean, additionally scale (divide) the feature values by their respective standard deviations.
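
    A minimal NumPy sketch of those two steps (Python here; the exercise itself is in Octave):

    ```python
    import numpy as np

    def feature_normalize(X):
        """Subtract each feature's mean, then divide by its standard deviation.
        mu and sigma are returned so new inputs can be normalized the same way."""
        mu = X.mean(axis=0)
        sigma = X.std(axis=0)
        return (X - mu) / sigma, mu, sigma
    ```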

    2. Gradient Descent

    Previously, you implemented gradient descent on a univariate regression problem. The only difference now is that there is one more feature in the matrix X. The hypothesis function and the batch gradient descent update rule remain unchanged. We can also try out different learning rates for the dataset and find a learning rate that converges quickly. If you picked a learning rate within a good range, your plot should look similar to the figure below:

    (figure: convergence of J(θ) over iterations for a well-chosen learning rate)

    3. Normal Equations

    In the lecture videos, you learned that the closed-form solution to linear regression is

    θ = (X^T X)^(-1) X^T y

    Using this formula does not require any feature scaling, and you will get an exact solution in one calculation: there is no "loop until convergence" as in gradient descent.
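
    In NumPy the closed-form solution is a one-liner; pinv is a common choice because it also covers the non-invertibility case mentioned in the notes above:

    ```python
    import numpy as np

    def normal_equation(X, y):
        """θ = pinv(X'X) X'y — exact solution in one calculation, no iterations.
        pinv also handles a non-invertible X'X (redundant or too many features)."""
        return np.linalg.pinv(X.T @ X) @ X.T @ y
    ```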

⬆ back to top

Week3

  • Classification and Representation

    • Classification: Email: Spam / Not Spam?, Online Transactions: Fraudulent (Yes/No)?
    • Hypothesis Representation
    • Decision Boundary
  • Logistic Regression Model

    • Cost Function
    • Simplified Cost Function and Gradient Descent
    • Advanced Optimization
  • Multiclass Classification

    • Multiclass Classification: One-vs-all
  • Regularization

    • The Problem of Overfitting
    • Cost Function
    • Regularized Linear Regression
    • Regularized Logistic Regression
  • Octave Practice for Logistic Regression

    In this exercise, you will implement logistic regression and apply it to two different datasets. See the related exercises and scripts, which use the official Coursera exercise. What we did in the exercises is summarized in the sections below:

    Logistic Regression

    In this part of the exercise, you will build a logistic regression model to predict whether a student gets admitted into a university. Suppose that you are the administrator of a university department and you want to determine each applicant's chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set for logistic regression. For each training example, you have the applicant's scores on two exams and the admissions decision. Your task is to build a classification model that estimates an applicant's probability of admission based on the scores from those two exams.

    1. Visualizing the data

    Before starting to implement any learning algorithm, it is always good to visualize the data if possible. It can help you get more familiar with the data distribution.

    (figure: scatter plot of the two exam scores, admitted vs. not admitted)

    2. Sigmoid function

    Before you start with the actual cost function, recall that the logistic regression hypothesis is defined as:

    hθ(x) = g(θ^T x)

    where function g is the sigmoid function. The sigmoid function is defined as:

    g(z) = 1 / (1 + e^(-z))

    3. Cost function and gradient

    Recall that the cost function in logistic regression is

    J(θ) = (1/m) * Σ_{i=1..m} [ -y^(i) log(hθ(x^(i))) - (1 - y^(i)) log(1 - hθ(x^(i))) ]

    and the gradient of the cost is a vector of the same length as θ, where the jth element (for j = 0, 1, ..., n) is defined as follows:

    ∂J(θ)/∂θj = (1/m) * Σ_{i=1..m} (hθ(x^(i)) - y^(i)) * x_j^(i)
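
    A Python sketch of both computations (the exercise implements them in Octave):

    ```python
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cost_and_gradient(theta, X, y):
        """Cost J(θ) and its gradient for unregularized logistic regression.
        X is (m, n+1) with a leading column of ones; y holds 0/1 labels."""
        m = len(y)
        h = sigmoid(X @ theta)
        cost = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
        grad = (X.T @ (h - y)) / m
        return cost, grad
    ```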

    4. Learning parameters using fminunc

    In the previous assignment, you found the optimal parameters of a linear regression model by implementing gradient descent. You wrote a cost function and calculated its gradient, then took a gradient descent step accordingly. This time, instead of taking gradient descent steps, you will use an Octave/MATLAB built-in function called fminunc.

    Octave/MATLAB's fminunc is an optimization solver that finds the minimum of an unconstrained function. For logistic regression, you want to optimize the cost function J(θ) with parameters θ.
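
    In Python, scipy.optimize.minimize plays the role of fminunc; a self-contained sketch on a tiny made-up dataset (not the course data):

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cost_and_gradient(theta, X, y):
        m = len(y)
        h = sigmoid(X @ theta)
        return ((-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m,
                (X.T @ (h - y)) / m)

    # Tiny illustrative dataset: intercept column plus one feature
    X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0], [1.0, 3.5]])
    y = np.array([0, 0, 1, 0, 1])

    result = minimize(cost_and_gradient, np.zeros(X.shape[1]), args=(X, y),
                      jac=True,       # the objective returns (cost, gradient) together
                      method="BFGS")
    print("optimal θ:", result.x)
    ```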

    5. Plot the decision boundary

    This final θ value will then be used to plot the decision boundary on the training data, resulting in a figure similar to the picture below:

    (figure: training data with the linear decision boundary)

    Regularized logistic regression

    In this part of the exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly. Suppose you are the product manager of the factory and you have the test results for some microchips on two different tests. From these two tests, you would like to determine whether the microchips should be accepted or rejected. To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model.

    1. Visualizing the data

    Before starting to implement any learning algorithm, it is always good to visualize the data if possible. It can help you get more familiar with the data distribution.

    (figure: scatter plot of the two microchip test scores, accepted vs. rejected)

    It shows that our dataset cannot be separated into positive and negative examples by a straight line through the plot. Therefore, a straightforward application of logistic regression will not perform well on this dataset, since logistic regression will only be able to find a linear decision boundary.

    2. Feature mapping

    One way to fit the data better is to create more features from each data point. We will map the features into all polynomial terms of x1 and x2 up to the sixth power.

    mapFeature(x) = [ 1, x1, x2, x1^2, x1*x2, x2^2, x1^3, ..., x1*x2^5, x2^6 ]^T

    As a result of this mapping, our vector of two features (the scores on two QA tests) has been transformed into a 28-dimensional vector. A logistic regression classifier trained on this higher-dimension feature vector will have a more complex decision boundary and will appear nonlinear when drawn in our 2-dimensional plot.
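
    A Python sketch of that feature mapping (the exercise ships an Octave mapFeature function that does the same thing):

    ```python
    import numpy as np

    def map_feature(x1, x2, degree=6):
        """All polynomial terms of x1 and x2 up to `degree`:
        1, x1, x2, x1^2, x1*x2, x2^2, ..., x1*x2^5, x2^6 — 28 terms for degree 6."""
        terms = [(x1 ** (i - j)) * (x2 ** j)
                 for i in range(degree + 1) for j in range(i + 1)]
        return np.array(terms)
    ```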

    3. Cost function and gradient

    Now you will implement code to compute the cost function and gradient for regularized logistic regression.

    Recall that the cost function in logistic regression is:

    J(θ) = (1/m) * Σ_{i=1..m} [ -y^(i) log(hθ(x^(i))) - (1 - y^(i)) log(1 - hθ(x^(i))) ] + (λ/(2m)) * Σ_{j=1..n} θj^2

    Note that you should not regularize the parameter θ0. In Octave/MATLAB, recall that indexing starts from 1, hence, you should not be regularizing the theta(1) parameter (which corresponds to θ0) in the code. The gradient of the cost function is a vector where the jth element is defined as follows:

    ∂J(θ)/∂θ0 = (1/m) * Σ_{i=1..m} (hθ(x^(i)) - y^(i)) * x_j^(i)                      for j = 0

    ∂J(θ)/∂θj = ( (1/m) * Σ_{i=1..m} (hθ(x^(i)) - y^(i)) * x_j^(i) ) + (λ/m) * θj     for j ≥ 1
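
    A sketch of the regularized version in Python, mainly to highlight that θ0 (theta[0] here) is excluded from the penalty:

    ```python
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cost_and_gradient_reg(theta, X, y, lam):
        """Regularized logistic regression cost and gradient; θ0 is not regularized."""
        m = len(y)
        h = sigmoid(X @ theta)
        cost = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m \
               + (lam / (2 * m)) * np.sum(theta[1:] ** 2)      # penalty skips θ0
        grad = (X.T @ (h - y)) / m
        grad[1:] += (lam / m) * theta[1:]                      # no penalty term for θ0
        return cost, grad
    ```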

    4. Learning parameters using fminunc

    Similar to the previous parts, you will use fminunc to learn the optimal parameters θ.

    5. Plot the decision boundary

    This final θ value will then be used to plot the decision boundary on the training data, resulting in a figure similar to the picture below:

    (figure: microchip data with the nonlinear decision boundary)

    ⬆ back to top

Week4

  • Motivations

    • Non-linear Hypothesis
    • Neurons and the Brain
  • Neural Networks

    • Model Representation
  • Application

    • Examples and Intuitions
    • Multi-class Classification
  • Octave Practice for Multi-class Classification and Neural Networks

    In this exercise, you will implement one-vs-all logistic regression and neural networks to recognize hand-written digits. See the related exercises and scripts, which use the official Coursera exercise. What we did in the exercises is summarized in the sections below:

    Multi-class Classification

    For this exercise, you will use logistic regression and neural networks to recognize handwritten digits (from 0 to 9). Automated handwritten digit recognition is widely used today - from recognizing zip codes (postal codes) on mail envelopes to recognizing amounts written on bank checks.

    1. Dataset and Visualizing the data

    You're given a dataset that contains 5000 training examples of handwritten digits. Each training example is a 20 pixel by 20 pixel grayscale image of the digit. Each pixel is represented by a floating point number indicating the grayscale intensity at that location. The 20 by 20 grid of pixels is "unrolled" into a 400-dimensional vector. Each of these training examples becomes a single row in our data matrix X. This gives us a 5000 by 400 matrix X where every row is a training example for a handwritten digit image.

    We will begin by visualizing a subset of the training set: select 100 rows from X and pass those rows to the displayData function. After you run the related code, you should see an image like the one below:

    (figure: grid of 100 handwritten digit examples from the training set)
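
    A rough Python analogue of displayData, assuming X has already been loaded into a NumPy array of shape 5000×400 (the course data is stored column-major, hence order="F"):

    ```python
    import numpy as np
    import matplotlib.pyplot as plt

    def display_data(X, rows=10, cols=10):
        """Show a grid of digit images; each row of X is one unrolled 20x20 image."""
        fig, axes = plt.subplots(rows, cols, figsize=(6, 6))
        for ax, row in zip(axes.ravel(), X):
            ax.imshow(row.reshape(20, 20, order="F"), cmap="gray")
            ax.axis("off")
        plt.show()

    # e.g. display_data(X[np.random.choice(X.shape[0], 100, replace=False)])
    ```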

    2. Vectorizing Logistic Regression

    You will be using multiple one-vs-all logistic regression models to build a multi-class classifier. Since there are 10 classes, you will need to train 10 separate logistic regression classifiers. To make this training efficient, it is important to ensure that your code is well vectorized.

    Recall that for regularized logistic regression, the cost function is defined as

    J(θ) = (1/m) * Σ_{i=1..m} [ -y^(i) log(hθ(x^(i))) - (1 - y^(i)) log(1 - hθ(x^(i))) ] + (λ/(2m)) * Σ_{j=1..n} θj^2

    Note that you should not be regularizing θ0, which is used for the bias term. Correspondingly, the partial derivative of the regularized logistic regression cost for θj is defined as

    ∂J(θ)/∂θ0 = (1/m) * Σ_{i=1..m} (hθ(x^(i)) - y^(i)) * x_j^(i)                      for j = 0

    ∂J(θ)/∂θj = ( (1/m) * Σ_{i=1..m} (hθ(x^(i)) - y^(i)) * x_j^(i) ) + (λ/m) * θj     for j ≥ 1

    3. One-vs-all Classification and Prediction

    In this part of the exercise, you will implement one-vs-all classification by training multiple regularized logistic regression classifiers. After training your one-vs-all classifier, you can now use it to predict the digit contained in a given image. For each input, you should compute the "probability" that it belongs to each class using the trained logistic regression classifiers.

    You should see that the training set accuracy is about 94.9% (i.e., it classifies 94.9% of the examples in the training set correctly).
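
    A sketch of the one-vs-all idea in Python; the exercise trains each classifier with fmincg in Octave, and here scikit-learn's LogisticRegression stands in for each binary classifier:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def one_vs_all(X, y, num_labels=10):
        """Train one binary classifier per class on the (y == k) labels."""
        return [LogisticRegression(max_iter=1000).fit(X, (y == k).astype(int))
                for k in range(num_labels)]

    def predict_one_vs_all(models, X):
        """For each example, pick the class whose classifier is most confident."""
        probs = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
        return probs.argmax(axis=1)
    ```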

    Neural Networks

    In the previous part of this exercise, you implemented multi-class logistic regression to recognize handwritten digits. However, logistic regression cannot form more complex hypotheses as it is only a linear classifier. In this part of the exercise, you will implement a neural network to recognize handwritten digits using the same training set as before. The neural network will be able to represent complex models that form non-linear hypotheses. For this week, you will be using parameters from a neural network that we have already trained. Your goal is to implement the feedforward propagation algorithm to use our weights for prediction.

    1. Model representation

    Our neural network is shown in the picture below. It has 3 layers: an input layer, a hidden layer and an output layer. Recall that our inputs are pixel values of digit images. Since the images are of size 20×20, this gives us 400 input layer units (excluding the extra bias unit which always outputs +1).

    (figure: neural network model — input layer, hidden layer, output layer)

    2. Feedforward Propagation and Prediction

    Now you will implement feedforward propagation for the neural network.

    You should implement the feedforward computation that computes hθ(x^(i)) for every example i and returns the associated predictions. Similar to the one-vs-all classification strategy, the prediction from the neural network will be the label that has the largest output (hθ(x))_k.

    Once you are done, you should see that the accuracy is about 97.5%.
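
    A NumPy sketch of the feedforward computation for this 3-layer network, assuming the pretrained weight matrices Theta1 and Theta2 provided with the exercise:

    ```python
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict(Theta1, Theta2, X):
        """Input -> hidden -> output, then take the most activated output unit.
        Theta1: (hidden_units, 401), Theta2: (num_labels, hidden_units + 1), X: (m, 400)."""
        m = X.shape[0]
        a1 = np.hstack([np.ones((m, 1)), X])     # add the +1 bias unit to the input
        a2 = sigmoid(a1 @ Theta1.T)              # hidden layer activations
        a2 = np.hstack([np.ones((m, 1)), a2])    # add the bias unit to the hidden layer
        a3 = sigmoid(a2 @ Theta2.T)              # hθ(x): one output unit per digit class
        return a3.argmax(axis=1) + 1             # exercise labels run 1..10 (10 = "0")
    ```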

    ⬆ back to top

Week5

Week6

Week7

Week8

Week9

Week10

Week11
