Step-by-step procedures to build the Image Classification model on Kaggle | by Rahul Gupta

  • Select a dataset of your choice and upload it to Kaggle
  • Apply augmentation to the original dataset
  • Visualize the augmented dataset
  • Configure GPU
  • Build the model and start training
  • Analyse the model’s accuracy and loss

The motivation behind this story is to encourage readers to start working on the Kaggle platform. A few weeks ago, I faced many challenges on Kaggle related to uploading data, applying augmentation, configuring the GPU for training, etc. This inspired me to build an image classification model that works through those challenges. It is not feasible to discuss every block of code in this story. Therefore, at the end of the tutorial, you will find the link to the notebook hosted on jovian.ml. You can download/fork it for learning purposes.

Dataset

To start working on Kaggle, the dataset first needs to be uploaded to the input directory. The image snippets below show how to do this (follow the shapes marked in red).

Click on ‘Add data’ which opens up a new window to upload the dataset.

Kaggle directory Structure

We can upload a dataset from the local machine or reuse a dataset we created earlier.

There are many sources to collect data for image classification. I have chosen the Images for Weather Recognition dataset from data.mendeley.com/datasets/4drtyfjtfy/1.

This dataset is a collection of 1,125 images divided into four categories: cloudy, rain, shine, and sunrise. The images come in different sizes, which is helpful for dealing with real-life images. However, the dataset is small, which can make our model overfit. One way to enlarge the dataset is to use data augmentation.

Augmentation

Once the dataset is uploaded, it can be seen in the Kaggle input directory structure. It consists of a train and a test folder, each holding the 4 classes in separate folders. Now it’s time to enlarge the dataset by adding augmented images. However, we cannot perform any write operation in the input directory as it is read-only. This is the problem I faced when I tried to add images to that directory. One way around it is to use the ‘/kaggle/working/’ directory to perform augmentation. Thus, the same directory tree needs to be created under ‘/kaggle/working/’.
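As a minimal sketch of that step, the snippet below recreates a train/test tree with one folder per class. The folder names and the helper name make_working_tree are assumptions for illustration; adjust them to match the uploaded dataset.

```python
import os

# Assumed folder names; the uploaded dataset may use different ones.
CLASSES = ["cloudy", "rain", "shine", "sunrise"]
SPLITS = ["train", "test"]


def make_working_tree(base_dir="/kaggle/working", splits=SPLITS, classes=CLASSES):
    """Create <base_dir>/<split>/<class> for every split and class.

    Returns the list of directories so the caller can verify them.
    """
    created = []
    for split in splits:
        for cls in classes:
            path = os.path.join(base_dir, split, cls)
            # exist_ok=True makes the call safe to re-run in a notebook.
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```

Calling make_working_tree() on Kaggle writes into the writable working directory, leaving the read-only input directory untouched.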

Now to perform augmentation one can start with imgaug. This Python library helps in augmenting images for machine learning projects. It converts a set of input images into a new, much larger set of slightly altered images. The augmentation sequence used here applies various transformations like crop, additive Gaussian noise, horizontal flips, etc. Try-and-except blocks are also used to handle exceptions related to dimension mismatches and color maps.

This block of code writes both augmented and original images in the Kaggle working directory.
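To make the idea concrete without depending on a particular library version, here is a NumPy-only stand-in for the three transformations named above. It is not imgaug's API and not the notebook's code; the function names, crop fraction, and noise scale are assumptions.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly flipped, cropped, and noised copy of an HxWx3 uint8 image."""
    if image.ndim != 3 or image.shape[2] != 3:
        # Grayscale or palette images end up here; the caller can skip them.
        raise ValueError(f"expected an HxWx3 image, got shape {image.shape}")
    out = image.astype(np.float32)
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Random crop of up to 10% from each side.
    h, w = out.shape[:2]
    top, bottom = rng.integers(0, h // 10 + 1, size=2)
    left, right = rng.integers(0, w // 10 + 1, size=2)
    out = out[top:h - bottom, left:w - right, :]
    # Additive Gaussian noise, then clip back to the valid pixel range.
    out = out + rng.normal(0.0, 0.05 * 255, size=out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)


def augment_many(image, n_copies, seed=0):
    """Produce n_copies augmented variants of one image."""
    rng = np.random.default_rng(seed)
    return [augment(image, rng) for _ in range(n_copies)]
```

In a loop over files, wrapping the augment_many call in try/except ValueError lets the script skip images with unexpected dimensions instead of stopping.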

Data Visualization

The next task after augmentation is to visualize the images before they are used to train the model. It is important to see the variations in the data and how closely they resemble real-life images. A helper function does the job by displaying 64 images from all categories in a grid.
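One way such a helper could be written is sketched below. It only tiles the images into a single array; the name make_image_grid and the padding are assumptions, and the images are assumed to be resized to a common shape beforehand.

```python
import numpy as np


def make_image_grid(images, ncols=8, pad=2):
    """Tile equally sized HxWxC images into one array, ncols images per row."""
    images = np.asarray(images)
    n, h, w, c = images.shape
    nrows = -(-n // ncols)  # ceiling division
    grid = np.zeros(
        (nrows * (h + pad) + pad, ncols * (w + pad) + pad, c),
        dtype=images.dtype,
    )
    for idx in range(n):
        row, col = divmod(idx, ncols)
        y = pad + row * (h + pad)
        x = pad + col * (w + pad)
        grid[y:y + h, x:x + w] = images[idx]
    return grid
```

With matplotlib available, plt.imshow(make_image_grid(batch)) followed by plt.axis("off") shows a batch of 64 images as an 8 x 8 grid.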

Use GPU

To enable the GPU on Kaggle, go to settings and set the accelerator as GPU.

As the sizes of our models and datasets increase, we need to use GPUs to train our models within a reasonable amount of time. GPUs contain hundreds of cores that are optimized for performing expensive matrix operations on floating-point numbers in a short time, which makes them ideal for training deep neural networks with many layers. We can use GPUs for free on Kaggle kernels (30 hrs/week).

We can check whether a GPU is available and the required NVIDIA CUDA drivers are installed using torch.cuda.is_available. To use a GPU seamlessly, we need two helper functions (get_default_device & to_device) and a helper class, DeviceDataLoader, to move our model & data to the GPU as required.
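The text names these helpers but does not show their bodies, so the following is a sketch of one common way to write them in PyTorch rather than the notebook's exact code.

```python
import torch


def get_default_device():
    """Pick the GPU if one is available, otherwise fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


def to_device(data, device):
    """Move a tensor, a model, or a (nested) list/tuple of tensors to the device."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)


class DeviceDataLoader:
    """Wrap a DataLoader so each batch is moved to the device when it is yielded."""

    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)

    def __len__(self):
        # Number of batches, same as the wrapped loader.
        return len(self.dl)
```

Wrapping the loaders this way moves data one batch at a time, so the whole dataset never has to fit in GPU memory at once.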

Build the model and start training

Now it’s time to build the model and implement the main class in PyTorch that contains methods to deal with training and validation.

Here, I have used a customized ResNet architecture to solve this classification problem. It consists of 3 residual blocks placed between several convolutional layers.

Various regularization and optimization techniques are used to cut down the training time. Together they help the model reach high accuracy quickly.
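The exact layer sizes are in the linked notebook; the sketch below only illustrates the general shape of such a network, with three residual blocks between convolutional layers. The class name SmallResNet, the channel widths, and the pooling choices are assumptions.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch, pool=False):
    """Conv -> BatchNorm -> ReLU, optionally followed by 2x2 max pooling."""
    layers = [
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    ]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)


class SmallResNet(nn.Module):
    """Illustrative ResNet-style classifier with three residual blocks."""

    def __init__(self, in_channels=3, num_classes=4):
        super().__init__()
        self.conv1 = conv_block(in_channels, 32)
        self.conv2 = conv_block(32, 64, pool=True)
        self.res1 = nn.Sequential(conv_block(64, 64), conv_block(64, 64))
        self.conv3 = conv_block(64, 128, pool=True)
        self.res2 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))
        self.conv4 = conv_block(128, 256, pool=True)
        self.res3 = nn.Sequential(conv_block(256, 256), conv_block(256, 256))
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # works for any input resolution
            nn.Flatten(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        x = self.conv2(self.conv1(x))
        x = self.res1(x) + x  # residual connection: add the block input back
        x = self.conv3(x)
        x = self.res2(x) + x
        x = self.conv4(x)
        x = self.res3(x) + x
        return self.classifier(x)
```

Each residual block keeps the channel count and spatial size unchanged, which is what allows its input to be added directly to its output.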

Data normalization: The image tensors are normalized by subtracting the mean and dividing by the standard deviation of pixels across each channel. This prevents the pixel values from any one channel from disproportionately affecting the losses and gradients.

Residual connections: One of the key changes to the plain CNN model is the addition of the residual block, which adds the original input back to the output feature map obtained by passing the input through one or more convolutional layers.

Batch normalization: After each convolutional layer, a batch normalization layer is added to normalize the outputs of the previous layer. This is somewhat similar to data normalization, except it’s applied to the outputs of a layer, using the mean and standard deviation of each batch together with a learned scale and shift.

Learning Rate Scheduling: Instead of using a fixed learning rate, I have used a learning rate scheduler, which changes the learning rate after every batch of training. There are many strategies for varying the learning rate during training, but I used the “One Cycle Learning Rate Policy”.

Weight Decay: I have added weight decay to the optimizer, yet another regularization technique that prevents the weights from becoming too large by adding a new term to the loss function.

Gradient clipping: I have also added gradient clipping, which limits the values of gradients to a small range to prevent undesirable changes in model parameters due to large gradient values during training.

Adam optimizer: I have used the Adam optimizer, which uses techniques like momentum and adaptive learning rates for faster training.

All the above-discussed tricks are used in our fit function to train the model.
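A fit function that combines the scheduler, weight decay, gradient clipping, and Adam might look roughly like the sketch below. The name fit_one_cycle and its parameters are assumptions, and the validation pass is left out for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fit_one_cycle(epochs, max_lr, model, train_loader,
                  weight_decay=0.0, grad_clip=None, opt_func=torch.optim.Adam):
    """Train with a one-cycle learning rate schedule and return per-epoch stats."""
    # Weight decay is passed straight to the optimizer.
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # The scheduler is stepped once per batch, not once per epoch.
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs, steps_per_epoch=len(train_loader)
    )
    history = []
    for _ in range(epochs):
        model.train()
        losses, lrs = [], []
        for images, labels in train_loader:
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            if grad_clip is not None:
                # Clamp every gradient value to [-grad_clip, grad_clip].
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
            lrs.append(optimizer.param_groups[0]["lr"])
            sched.step()
            losses.append(loss.item())
        history.append({"train_loss": sum(losses) / len(losses), "lrs": lrs})
    return history
```

Recording the learning rate for every batch makes it easy to plot the schedule afterwards.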

Initially, it is trained for 8 epochs at a higher learning rate, then for the next 8 epochs at a lower learning rate. Finally, 91% accuracy is achieved in less than 9 minutes. That’s incredible!

Analyze the result

It’s time to analyze our trained model and see how accuracy and loss vary over epochs. Before that, let’s look at our learning rate scheduler (LRS) and its variation over different iterations. Two cycles of LRS are used to reduce the loss. The maximum learning rate of the first cycle is set to 0.01 so that the model can explore various plateaus and decide which one to follow to reach the global minimum. The second cycle’s maximum learning rate is set to 0.001, one tenth of the first. It helps in getting close to the global minimum.
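The shape of the two cycles can be reproduced without training anything, as in the sketch below. The helper name one_cycle_lrs and the steps_per_epoch value in the usage note are assumptions; the real number of batches depends on the dataset and batch size.

```python
import torch


def one_cycle_lrs(max_lr, epochs, steps_per_epoch):
    """Return the learning rate used at every batch of one one-cycle run."""
    # A dummy parameter is enough: only the schedule is of interest here.
    params = [torch.nn.Parameter(torch.zeros(1))]
    optimizer = torch.optim.Adam(params, lr=max_lr)
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs, steps_per_epoch=steps_per_epoch
    )
    lrs = []
    for _ in range(epochs * steps_per_epoch):
        lrs.append(optimizer.param_groups[0]["lr"])
        optimizer.step()  # no gradients, so the parameter is left unchanged
        sched.step()
    return lrs
```

For example, one_cycle_lrs(0.01, 8, 30) + one_cycle_lrs(0.001, 8, 30) gives the values for two consecutive 8-epoch cycles, which can be passed to plt.plot to draw the learning rate against the batch number.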

learning rate v/s batches

The impact of LRS can be seen in the accuracy on the validation set. In the first few epochs, accuracy decreases as the model tends to explore different regions of the loss surface. But once it finds the right path, accuracy tends to increase every epoch.

Validation accuracy curve

Initially, there’s a huge difference between validation and training loss. After a few epochs, this difference disappears as the validation loss overlaps with the training loss.

Training v/s Validation loss

You can explore more about this model on jovian.ml/rahulgupta291093/zero-to-gans-course-project.

It is recommended to use this notebook as a template to start building your own deep learning model. This can be done by trying different hyperparameters and CNN architectures on a different dataset.

Thanks a lot for reading!
