‘Deep Learning with Pytorch’ Notes

Chapter 1

Chapter 2

Terms to know:

Label / Class: The tag that a given image falls into
- Ex. if you’re given an image of a dog, the label is ‘dog’

Types of deep learning tasks on images

Image classification
- Classifying objects within an image
Object localization
- Identifying objects positions within an image
Object detection
- Identifying and labeling objects within an image
Scene classification
- Classifying the given situation within an image
Scene parsing
- Dividing an image into regions which relate to the contents of the image

Steps to classify an image:

Starting image, ex. Dog
Image gets resized and normalized
- Preprocessed into a torch.Tensor (vector)
Forward passed into model, outputs a vector of scores
Scores map one-to-one to label
- Each score represents the score associated with a given class

Getting a pre-trained model for image recognition

Some good models include:
- AlexNet, ResNet, and Inception v3

Models can be found within torchvision.models, theres a bunch of them

from torchvision import models
list_models = dir(models)
print("Num of models: ", len(list_models))
print(list_models[0:5])
print(list_models[170:175])

NOTE: Notice how theres both uppercase / lowercase models?
- Uppercase represent classes that implement varying amount of models
- Lowercase represent convince functions that return an instance of that model type
  - ex. ‘resnet50’ returns an instance of model ‘resnet’ with 50 layers

Randomized vs pretrained models

Initializing the network is as easy as:

alexnet = models.AlexNet()

However, this creates a randomized AlexNet object, with no pretrained weights

We can use one of those handy-dandy lowercased models, which instantiates a model for us with pretrained models.For example, creating instance of resnet with 101 layers:

resnet = models.resnet101(pretrained=True)

Preprocessing images via transforms

Prior to using any neural network, we have to process and normalize the input, so the network can work When processing images, we can preprocess them with torchvision’s transforms module. Here is an example:

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),          # resizes to 256 x 256
    transforms.CenterCrop(224),      # crops 224 x 224 around center of new image
    transforms.ToTensor(),           # converts image to tensor
    transforms.Normalize(            # normalizes color channels (RGB)
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

NOTE: transforms.Compose() returns a tensor with those given parameters

Feeding an image into the preprocessing segment

Lets say you have a dog image located at PATH=”~/dog.png”. To feed it into the neural network, we would have to:

Preprocess it -> img_tensor = preprocess(img)
Create an inference of the neural network we’re using
- An inference is just putting the NN in an evaluate state where it can produce actual results
- Done so by doing resnet.eval()
Feeding our img_tensor into the NN
- output = resnet(img_tensor)

Cross referencing the model output with ImageNet Data Labels

Read lines from “imagenet_classes.txt”
Get the maximum score from output via _, index = torch.max(output, 1)
Get the models confidence level for the prediction
```
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100

labels[index[0], percentage[index[0]]].item()
    
```
Should output: (‘golden retriever’, 96.293…) Therefore the model predicted with 96.293% certainty that the dog image was a golden retriever

Listing out the labels of the other predictions

_, indices = torch.sort(out, descending=True)
other_predictions_and_percentages = [(labels[idx], percentage[idx].item()) for idx in indices[0][:5]]

GAN - Generative Adversarial Networks

GAN Diagram Two neural networks that compete against each other, where one produces an output, while the other interprets and identifies the mistakes in the output

Terms to know

Generator: produces output from an arbitrary input
- Its end goal is fool the discriminator with its image outputs
Discriminator: analyzes whether or not an input image was fabricated or belongs to a real set of images
- Its end goal is to know when its being fooled

CycleGAN

CycleGAN Diagram Note that the D’s in the image are discriminators

Can turn images from one domain into images from another (and back)
- Ex. turning an image into a Zebra, form a picture of a horse

NLP model that describes scenes

Natural language model for describing scenes within an image

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!