Download datasets into Google Drive using Google Colab

Before learning how to download datasets into Google Drive using Google Colab, let's see what Google Colaboratory is.

Google Colaboratory:

Google Colaboratory, commonly known as Google Colab, is an online environment provided by Google, aimed especially at AI and machine learning enthusiasts. This environment provides users with Jupyter notebooks along with extra RAM and disk space.
[Figure: Google Colaboratory disk space]
To access Google Colab, sign in with a Google account. The Python notebooks that you create or open are stored in your Google Drive.

Before downloading a dataset into Drive using Google Colab, you must mount your Google Drive in Colab. The Colab runtime's own disk is temporary, while files written to the mounted Drive persist.

Mounting Google Drive to Google Colab:

To download datasets into Drive, we have to establish a connection between Drive and the Colab notebook.

This can be done using a module named drive that is provided by Google Colab.

from google.colab import drive
drive.mount('/content/gdrive')

The cell will return the following.

[Figure: Colab mounting prompt]
Open the link that appears after you run the code to retrieve the authorization code, then enter that code in the cell's prompt.

With this, the mounting process is over. Now it's time to set the root path to the specific folder in your Drive that you want to access (i.e. the folder where you have stored your project).

root_path = "/content/gdrive/My Drive/your_project_folder/"
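
As a minimal sketch of the path setup above, the helper below builds the project path from the mount point and creates the project folder if it is missing. The function name project_path and its default arguments are illustrative assumptions, not part of Colab. The Drive folder name is a parameter because some Colab versions expose it as "MyDrive" rather than "My Drive".

```python
import os

def project_path(folder, mount_point="/content/gdrive", drive_name="My Drive"):
    """Return the absolute path of a project folder inside the mounted Drive.

    Hypothetical helper: the names and defaults are assumptions for illustration.
    """
    drive_root = os.path.join(mount_point, drive_name)
    if not os.path.isdir(drive_root):
        # Usually means drive.mount(...) has not been run yet,
        # or the Drive folder has a different name (e.g. "MyDrive").
        raise FileNotFoundError(f"{drive_root} not found; mount Google Drive first")
    path = os.path.join(drive_root, folder)
    os.makedirs(path, exist_ok=True)  # create the project folder if it is missing
    return path
```

After mounting, this could be used as root_path = project_path("your_project_folder").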

Now that mounting is complete, you can download the required dataset into Drive.

Downloading dataset into Drive:

Downloading a dataset involves five steps:

  1. Get the API key from your account.
  2. Upload the JSON file.
  3. Create the necessary folder path. (optional)
  4. Download the required dataset.
  5. Unzip it.

Step 1: Get the API key from your account:

Visit Kaggle, log in to your account, go to My Account, and then click Create New API Token.

I have used Kaggle because it is one of the most popular websites for datasets.

After completing the above process, a file named “kaggle.json” will be downloaded automatically.

Step 2: Upload the JSON file:

Colab provides a module named files for uploading files. Run the cell below and select the kaggle.json file downloaded in Step 1.

from google.colab import files
files.upload()
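
In Colab, files.upload() returns a dictionary that maps each uploaded file name to its contents as bytes. The sketch below uses that to check that the upload really contains a usable token. The helper name read_kaggle_token is a hypothetical one introduced here, and the check assumes the token file holds "username" and "key" fields, as Kaggle's API tokens normally do.

```python
import json

def read_kaggle_token(uploaded):
    """Parse kaggle.json from an upload dictionary and check its fields.

    `uploaded` is expected to look like the return value of files.upload():
    a dict of {file name: file contents as bytes}.
    """
    if "kaggle.json" not in uploaded:
        raise KeyError("kaggle.json was not uploaded")
    token = json.loads(uploaded["kaggle.json"].decode("utf-8"))
    missing = {"username", "key"} - set(token)
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")
    return token

# Inside a Colab notebook this could be called as:
#   from google.colab import files
#   token = read_kaggle_token(files.upload())
```

Failing early here gives a clearer message than a later authentication error from the Kaggle command.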

Step 3: Create the necessary folder path:

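This step is optional only if you supply your credentials another way (for example, through the KAGGLE_USERNAME and KAGGLE_KEY environment variables). Otherwise, the Kaggle command-line tool expects to find kaggle.json in the ~/.kaggle folder.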

!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!ls ~/.kaggle
!chmod 600 ~/.kaggle/kaggle.json

Commands to understand:

  1. ‘!’ tells the notebook to run the rest of the line as a shell command rather than as Python code.
  2. pip install installs Python packages from the command line; -q keeps its output quiet.
  3. mkdir creates a directory; -p also creates missing parent directories and does not fail if the directory already exists.
  4. chmod 600 sets the permissions so that only the owner can read and write the file; nobody can execute it, and other users have no access.
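
The same folder setup can also be sketched in plain Python, which may be handy when you want it inside a function rather than as shell lines. The helper name install_kaggle_token and its home parameter are assumptions made for illustration; the comments map each call to the shell command it stands in for.

```python
import os
import shutil
import stat

def install_kaggle_token(src="kaggle.json", home=None):
    """Copy the token into <home>/.kaggle and restrict it to the owner."""
    home = home or os.path.expanduser("~")
    config_dir = os.path.join(home, ".kaggle")
    os.makedirs(config_dir, exist_ok=True)        # mkdir -p ~/.kaggle
    dest = os.path.join(config_dir, "kaggle.json")
    shutil.copy(src, dest)                        # cp kaggle.json ~/.kaggle/
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)   # chmod 600: owner read/write only
    return dest
```

Calling install_kaggle_token() with no arguments assumes kaggle.json sits in the current working directory, which is where Colab places uploaded files by default.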

Step 4: Download the required dataset:

!kaggle competitions download -c 'name_of_competition' -p "target_colab_dir"
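
The command above targets a competition. As a hedged sketch, the helper below builds the same command as an argument list and also covers a regular Kaggle dataset, which uses the datasets subcommand with -d and an "owner/dataset-name" identifier. The function names are hypothetical, and actually running the command assumes the kaggle package is installed and the token is in place.

```python
import subprocess

def kaggle_download_command(name, target_dir, kind="competitions"):
    """Build the argument list for a Kaggle download command."""
    if kind == "competitions":
        # Same as: kaggle competitions download -c NAME -p TARGET_DIR
        return ["kaggle", "competitions", "download", "-c", name, "-p", target_dir]
    if kind == "datasets":
        # Same as: kaggle datasets download -d OWNER/DATASET -p TARGET_DIR
        return ["kaggle", "datasets", "download", "-d", name, "-p", target_dir]
    raise ValueError(f"unknown kind: {kind!r}")

def kaggle_download(name, target_dir, kind="competitions"):
    """Run the download; raises CalledProcessError if the command fails."""
    subprocess.run(kaggle_download_command(name, target_dir, kind), check=True)
```

Passing the folder inside your mounted Drive as target_dir places the archive directly in Drive.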

Step 5: Unzip it:

!unzip -q file[.zip] -d [exdir]

Syntax to understand:

  1. ‘-q’ runs unzip quietly, so the names of the extracted files are not printed.
  2. ‘-d [exdir]’ sets the directory to extract the files into (optional; the current directory is used by default).
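
If you prefer to stay in Python, the standard zipfile module can do the same extraction. This is a minimal sketch; the function name unzip_to and its default target are illustrative choices, not something the original command requires.

```python
import os
import zipfile

def unzip_to(archive, exdir="."):
    """Extract every file in `archive` into `exdir` and return the member names."""
    os.makedirs(exdir, exist_ok=True)  # create the target directory if needed
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(exdir)           # like: unzip -q archive -d exdir
        return zf.namelist()
```

The returned list of names is a convenient way to see what was extracted without printing every file during extraction.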
