All the pre-trained models provided in the torchvision package in PyTorch are trained on the ImageNet dataset and can be used out of the box on it. But often you want to use these models on other image datasets, or even on your own custom dataset. This usually requires modifying and fine-tuning the model to work with the new data. Changing the output dimension of the last layer in the model is usually among the first changes you need to make, and that’s the focus of this post.

Let’s start with loading a pre-trained model from the torchvision package. We use the VGG16 model, pretrained on the ImageNet dataset with 1000 object categories. Let’s take a look at the modules on this model:

import torch
import torch.nn as nn
import torchvision.models as models

vgg16 = models.vgg16(pretrained=True)
print(vgg16._modules.keys())
odict_keys(['features', 'avgpool', 'classifier'])
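
As a side note, newer torchvision releases (0.13 and later) deprecate the pretrained flag in favor of a weights argument. A minimal sketch of the equivalent call, assuming torchvision >= 0.13:

# Equivalent on torchvision >= 0.13, where pretrained=True is deprecated
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)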

We are only interested in the last layer, so let’s print the layers in the ‘classifier’ module:

print(vgg16._modules['classifier'])
Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=1000, bias=True)
)

As expected, the output dimension of the last layer is 1000. Let’s assume we are going to use this model on the COCO dataset, which has 80 object categories. To change the output dimension of the model to 80, we simply replace the last sub-layer with a new Linear layer. The Linear layer takes two required arguments: in_features and out_features. Here, in_features stays the same as before, and out_features is going to be 80:

in_features = vgg16._modules['classifier'][-1].in_features
out_features = 80
vgg16._modules['classifier'][-1] = nn.Linear(in_features, out_features, bias=True)
print(vgg16._modules['classifier'])
Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=80, bias=True)
)
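
As a quick sanity check, we can pass a dummy batch through the modified model and confirm that the output now has 80 columns. This is a minimal sketch; VGG16 expects 3-channel images of at least 224x224:

# Dummy batch of one 3-channel 224x224 image
dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    output = vgg16(dummy)
print(output.shape)
torch.Size([1, 80])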

That’s it! The output dimension is now 80. Keep in mind that by replacing the last layer we discarded its learned parameters; the new layer is randomly initialized. At this point you need to fine-tune the model on the new dataset to learn those parameters.
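
One common way to start fine-tuning is to freeze the pre-trained convolutional features and train only the classifier. The optimizer and hyperparameters below are illustrative assumptions, not the only reasonable choice:

import torch.optim as optim

# Freeze the convolutional feature extractor
for param in vgg16.features.parameters():
    param.requires_grad = False

# Optimize only the parameters that still require gradients (the classifier)
optimizer = optim.SGD(
    (p for p in vgg16.parameters() if p.requires_grad),
    lr=1e-3, momentum=0.9
)
criterion = nn.CrossEntropyLoss()

From here, a standard training loop over the new dataset will learn the weights of the replacement layer (and, if you later unfreeze them, refine the rest of the network as well).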