diff --git a/02_pytorch_classification.ipynb b/02_pytorch_classification.ipynb index 4ab4cb88..b86f7ee0 100644 --- a/02_pytorch_classification.ipynb +++ b/02_pytorch_classification.ipynb @@ -29,8 +29,8 @@ "| Problem type | What is it? | Example |\n", "| ----- | ----- | ----- |\n", "| **Binary classification** | Target can be one of two options, e.g. yes or no | Predict whether or not someone has heart disease based on their health parameters. |\n", - "| **Multi-class classification** | Target can be one of more than two options | Decide whether a photo of is of food, a person or a dog. |\n", - "| **Multi-label classification** | Target can be assigned more than one option | Predict what categories should be assigned to a Wikipedia article (e.g. mathematics, science & philosohpy). |\n", + "| **Multi-class classification** | Target can be one of more than two options | Decide whether a photo is of food, a person or a dog. |\n", + "| **Multi-label classification** | Target can be assigned more than one option | Predict what categories should be assigned to a Wikipedia article (e.g. mathematics, science & philosophy). |\n", "\n", "
\n", "\"various\n", @@ -53,7 +53,7 @@ "source": [ "## What we're going to cover\n", "\n", - "In this notebook we're going to reiterate over the PyTorch workflow we coverd in [01. PyTorch Workflow](https://www.learnpytorch.io/02_pytorch_classification/).\n", + "In this notebook we're going to reiterate over the PyTorch workflow we covered in [01. PyTorch Workflow](https://www.learnpytorch.io/02_pytorch_classification/).\n", "\n", "\"a\n", "\n", @@ -68,7 +68,7 @@ "| **2. Building a PyTorch classification model** | Here we'll create a model to learn patterns in the data, we'll also choose a **loss function**, **optimizer** and build a **training loop** specific to classification. | \n", "| **3. Fitting the model to data (training)** | We've got data and a model, now let's let the model (try to) find patterns in the (**training**) data. |\n", "| **4. Making predictions and evaluating a model (inference)** | Our model's found patterns in the data, let's compare its findings to the actual (**testing**) data. |\n", - "| **5. Improving a model (from a model perspective)** | We've trained an evaluated a model but it's not working, let's try a few things to improve it. |\n", + "| **5. Improving a model (from a model perspective)** | We've trained and evaluated a model but it's not working, let's try a few things to improve it. |\n", "| **6. Non-linearity** | So far our model has only had the ability to model straight lines, what about non-linear (non-straight) lines? |\n", "| **7. Replicating non-linear functions** | We used **non-linear functions** to help model non-linear data, but what do these look like? |\n", "| **8. Putting it all together with multi-class classification** | Let's put everything we've done so far for binary classification together with a multi-class classification problem. |\n" @@ -115,7 +115,7 @@ "\n", "But it's more than enough to get started.\n", "\n", - "We're going to gets hands-on with this setup throughout this notebook." 
+ "We're going to get hands-on with this setup throughout this notebook." ] }, { @@ -345,7 +345,7 @@ "\n", "This tells us that our problem is **binary classification** since there's only two options (0 or 1).\n", "\n", - "How many values of each class is there?" + "How many values of each class are there?" ] }, { @@ -447,11 +447,11 @@ "\n", "One of the most common errors in deep learning is shape errors.\n", "\n", - "Mismatching the shapes of tensors and tensor operations with result in errors in your models.\n", + "Mismatching the shapes of tensors and tensor operations will result in errors in your models.\n", "\n", "We're going to see plenty of these throughout the course.\n", "\n", - "And there's no surefire way to making sure they won't happen, they will.\n", + "And there's no surefire way to make sure they won't happen, they will.\n", "\n", "What you can do instead is continually familiarize yourself with the shape of the data you're working with.\n", "\n", @@ -723,7 +723,7 @@ "\n", "How about we create a model?\n", "\n", - "We'll want a model capable of handling our `X` data as inputs and producing something in the shape of our `y` data as ouputs.\n", + "We'll want a model capable of handling our `X` data as inputs and producing something in the shape of our `y` data as outputs.\n", "\n", "In other words, given `X` (features) we want our model to predict `y` (label).\n", "\n", @@ -819,7 +819,7 @@ "That's why `self.layer_2` has `in_features=5`, it takes the `out_features=5` from `self.layer_1` and performs a linear computation on them, turning them into `out_features=1` (the same shape as `y`).\n", "\n", "![A visual example of what a classification neural network with linear activation looks like on the tensorflow playground](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/02-tensorflow-playground-linear-activation.png)\n", - "*A visual example of what a similar classificiation neural network to the one we've just built looks 
like. Try create one of your own on the [TensorFlow Playground website](https://playground.tensorflow.org/).*\n", + "*A visual example of what a similar classification neural network to the one we've just built looks like. Try creating one of your own on the [TensorFlow Playground website](https://playground.tensorflow.org/).*\n", "\n", "You can also do the same as above using [`nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html).\n", "\n", @@ -871,7 +871,7 @@ "\n", "`nn.Sequential` is fantastic for straight-forward computations, however, as the namespace says, it *always* runs in sequential order.\n", "\n", - "So if you'd something else to happen (rather than just straight-forward sequential computation) you'll want to define your own custom `nn.Module` subclass.\n", + "So if you'd like something else to happen (rather than just straight-forward sequential computation) you'll want to define your own custom `nn.Module` subclass.\n", "\n", "Now we've got a model, let's see what happens when we pass some data through it." ] @@ -926,7 +926,7 @@ "id": "q7v8TVnqGMZh" }, "source": [ - "Hmm, it seems there's the same amount of predictions as there is test labels but the predictions don't look like they're in the same form or shape as the test labels.\n", + "Hmm, it seems there are the same number of predictions as there are test labels but the predictions don't look like they're in the same form or shape as the test labels.\n", "\n", "We've got a couple steps we can do to fix this, we'll see these later on." ] @@ -943,7 +943,7 @@ "\n", "But different problem types require different loss functions. 
\n", "\n", - "For example, for a regression problem (predicting a number) you might used mean absolute error (MAE) loss.\n", + "For example, for a regression problem (predicting a number) you might use mean absolute error (MAE) loss.\n", "\n", "And for a binary classification problem (like ours), you'll often use [binary cross entropy](https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a) as the loss function.\n", "\n", @@ -956,11 +956,11 @@ "| Stochastic Gradient Descent (SGD) optimizer | Classification, regression, many others. | [`torch.optim.SGD()`](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html) |\n", "| Adam Optimizer | Classification, regression, many others. | [`torch.optim.Adam()`](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) |\n", "| Binary cross entropy loss | Binary classification | [`torch.nn.BCELossWithLogits`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) or [`torch.nn.BCELoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html) |\n", - "| Cross entropy loss | Mutli-class classification | [`torch.nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) |\n", + "| Cross entropy loss | Multi-class classification | [`torch.nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) |\n", "| Mean absolute error (MAE) or L1 Loss | Regression | [`torch.nn.L1Loss`](https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html) | \n", "| Mean squared error (MSE) or L2 Loss | Regression | [`torch.nn.MSELoss`](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss) | \n", "\n", - "*Table of various loss functions and optimizers, there are more but these some common ones you'll see.*\n", + "*Table of various loss functions and optimizers, there are more but these are some common ones you'll see.*\n", "\n", "Since we're working 
with a binary classification problem, let's use a binary cross entropy loss function.\n", "\n", @@ -1222,7 +1222,7 @@ "* If `y_pred_probs` >= 0.5, `y=1` (class 1)\n", "* If `y_pred_probs` < 0.5, `y=0` (class 0)\n", "\n", - "To turn our prediction probabilities in prediction labels, we can round the outputs of the sigmoid activation function." + "To turn our prediction probabilities into prediction labels, we can round the outputs of the sigmoid activation function." ] }, { @@ -1309,9 +1309,9 @@ "id": "NXqUulG3maPH" }, "source": [ - "This means we'll be able to compare our models predictions to the test labels to see how well it's going. \n", + "This means we'll be able to compare our model's predictions to the test labels to see how well it's performing. \n", "\n", - "To recap, we converted our model's raw outputs (logits) to predicition probabilities using a sigmoid activation function.\n", + "To recap, we converted our model's raw outputs (logits) to prediction probabilities using a sigmoid activation function.\n", "\n", "And then converted the prediction probabilities to prediction labels by rounding them.\n", "\n", @@ -1566,8 +1566,8 @@ "\n", "| Model improvement technique* | What does it do? |\n", "| ----- | ----- |\n", - "| **Add more layers** | Each layer *potentially* increases the learning capabilities of the model with each layer being able to learn some kind of new pattern in the data, more layers is often referred to as making your neural network *deeper*. |\n", - "| **Add more hidden units** | Similar to the above, more hidden units per layer means a *potential* increase in learning capabilities of the model, more hidden units is often referred to as making your neural network *wider*. |\n", + "| **Add more layers** | Each layer *potentially* increases the learning capabilities of the model with each layer being able to learn some kind of new pattern in the data. More layers are often referred to as making your neural network *deeper*. 
|\n", + "| **Add more hidden units** | Similar to the above, more hidden units per layer means a *potential* increase in learning capabilities of the model. More hidden units are often referred to as making your neural network *wider*. |\n", "| **Fitting for longer (more epochs)** | Your model might learn more if it had more opportunities to look at the data. |\n", "| **Changing the activation functions** | Some data just can't be fit with only straight lines (like what we've seen), using non-linear activation functions can help with this (hint, hint). |\n", "| **Change the learning rate** | Less model specific, but still related, the learning rate of the optimizer decides how much a model should change its parameters each step, too much and the model overcorrects, too little and it doesn't learn enough. |\n", @@ -1704,7 +1704,7 @@ " ### Training\n", " # 1. Forward pass\n", " y_logits = model_1(X_train).squeeze()\n", - " y_pred = torch.round(torch.sigmoid(y_logits)) # logits -> predicition probabilities -> prediction labels\n", + " y_pred = torch.round(torch.sigmoid(y_logits)) # logits -> prediction probabilities -> prediction labels\n", "\n", " # 2. Calculate loss/accuracy\n", " loss = loss_fn(y_logits, y_train)\n", @@ -2170,7 +2170,7 @@ "\n", "> **Note:** A helpful troubleshooting step when building deep learning models is to start as small as possible to see if the model works before scaling it up. 
\n", ">\n", - "> This could mean starting with a simple neural network (not many layers, not many hidden neurons) and a small dataset (like the one we've made) and then **overfitting** (making the model perform too well) on that small example before increasing the amount data or the model size/design to *reduce* overfitting.\n", + "> This could mean starting with a simple neural network (not many layers, not many hidden neurons) and a small dataset (like the one we've made) and then **overfitting** (making the model perform too well) on that small example before increasing the amount of data or the model size/design to *reduce* overfitting.\n", "\n", "So what could it be?\n", "\n", @@ -2322,7 +2322,7 @@ "\n", "Well let's see.\n", "\n", - "PyTorch has a bunch of [ready-made non-linear activation functions](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) that do similiar but different things. \n", + "PyTorch has a bunch of [ready-made non-linear activation functions](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) that do similar but different things. \n", "\n", "One of the most common and best performing is [ReLU](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) (rectified linear-unit, [`torch.nn.ReLU()`](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html)).\n", "\n", @@ -2382,11 +2382,11 @@ }, "source": [ "![a classification neural network on TensorFlow playground with ReLU activation](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/02-tensorflow-playground-relu-activation.png)\n", - "*A visual example of what a similar classificiation neural network to the one we've just built (using ReLU activation) looks like. 
Try create one of your own on the [TensorFlow Playground website](https://playground.tensorflow.org/).*\n", + "*A visual example of what a similar classification neural network to the one we've just built (using ReLU activation) looks like. Try creating one of your own on the [TensorFlow Playground website](https://playground.tensorflow.org/).*\n", "\n", "> **Question:** *Where should I put the non-linear activation functions when constructing a neural network?*\n", ">\n", - "> A rule of thumb is to put them in between hidden layers and just after the output layer, however, there is no set in stone option. As you learn more about neural networks and deep learning you'll find a bunch of different ways of putting things together. In the meantine, best to experiment, experiment, experiment.\n", + "> A rule of thumb is to put them in between hidden layers and just after the output layer, however, there is no set in stone option. As you learn more about neural networks and deep learning you'll find a bunch of different ways of putting things together. In the meantime, best to experiment, experiment, experiment.\n", "\n", "Now we've got a model ready to go, let's create a binary classification loss function as well as an optimizer." 
] @@ -2913,7 +2913,7 @@ "id": "f5Ephtx6f1jB" }, "source": [ - "### 8.1 Creating mutli-class classification data\n", + "### 8.1 Creating multi-class classification data\n", "\n", "To begin a multi-class classification problem, let's create some multi-class data.\n", "\n", @@ -3027,7 +3027,7 @@ "\n", "You might also be starting to get an idea of how flexible neural networks are.\n", "\n", - "How about we build one similar to `model_3` but this still capable of handling multi-class data?\n", + "How about we build one similar to `model_3` but one that is capable of handling multi-class data?\n", "\n", "To do so, let's create a subclass of `nn.Module` that takes in three hyperparameters:\n", "* `input_features` - the number of `X` features coming into the model.\n", @@ -3354,7 +3354,7 @@ "id": "yhwu9ln1sbl7" }, "source": [ - "These prediction probablities are essentially saying how much the model *thinks* the target `X` sample (the input) maps to each class.\n", + "These prediction probabilities are essentially saying how much the model *thinks* the target `X` sample (the input) maps to each class.\n", "\n", "Since there's one value for each class in `y_pred_probs`, the index of the *highest* value is the class the model thinks the specific data sample *most* belongs to.\n", "\n", @@ -3507,7 +3507,7 @@ "source": [ "### 8.6 Making and evaluating predictions with a PyTorch multi-class model\n", "\n", - "It looks like our trained model is performaning pretty well.\n", + "It looks like our trained model is performing pretty well.\n", "\n", "But to make sure of this, let's make some predictions and visualize them." ] @@ -3776,7 +3776,7 @@ "* Write down 3 problems where you think machine classification could be useful (these can be anything, get creative as you like, for example, classifying credit card transactions as fraud or not fraud based on the purchase amount and purchase location features). 
\n", "* Research the concept of \"momentum\" in gradient-based optimizers (like SGD or Adam), what does it mean?\n", "* Spend 10-minutes reading the [Wikipedia page for different activation functions](https://en.wikipedia.org/wiki/Activation_function#Table_of_activation_functions), how many of these can you line up with [PyTorch's activation functions](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity)?\n", - "* Research when accuracy might be a poor metric to use (hint: read [\"Beyond Accuracy\" by by Will Koehrsen](https://willkoehrsen.github.io/statistics/learning/beyond-accuracy-precision-and-recall/) for ideas).\n", + "* Research when accuracy might be a poor metric to use (hint: read [\"Beyond Accuracy\" by Will Koehrsen](https://willkoehrsen.github.io/statistics/learning/beyond-accuracy-precision-and-recall/) for ideas).\n", "* **Watch:** For an idea of what's happening within our neural networks and what they're doing to learn, watch [MIT's Introduction to Deep Learning video](https://youtu.be/7sB052Pz0sQ)." ] }