A simple, unofficial implementation of MAE using pytorch-lightning

A simple, unofficial implementation of MAE (Masked Autoencoders Are Scalable Vision Learners) using pytorch-lightning.

The repository currently implements training on CUB and StanfordCars, but it is easily extensible to other image datasets, as sketched below.
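
For example, a new dataset can typically be added as a small LightningDataModule that yields images (MAE pretraining needs no labels). The sketch below is hypothetical; the class name and transform choices are illustrative, not code from this repository.

import pytorch_lightning as pl
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical datamodule for a new image dataset (illustrative only).
class MyImageDataModule(pl.LightningDataModule):
    def __init__(self, root: str, batch_size: int = 64):
        super().__init__()
        self.root = root
        self.batch_size = batch_size
        # MAE pretraining only needs images; ImageFolder labels are ignored.
        self.transform = transforms.Compose([
            transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
        ])

    def setup(self, stage=None):
        self.train_set = datasets.ImageFolder(self.root, transform=self.transform)

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size,
                          shuffle=True, num_workers=4)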

Setup

# Clone the repository
git clone https://github.com/catalys1/mae-pytorch.git
cd mae-pytorch

# Install required libraries (preferably inside a virtual environment)
pip install -r requirements.txt

# Set up .env for path to data
echo "DATADIR=/path/to/data" > .env

Usage

MAE training

Training options are provided through configuration files, handled by LightningCLI. See configs/ for examples.
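
Multiple --config files are merged, with later files overriding earlier ones. Under the hood, train.py presumably builds a LightningCLI; a minimal sketch of such an entry point might look like the following (the exact import path varies by pytorch-lightning version, and the repo's actual setup may differ):

from pytorch_lightning.cli import LightningCLI

def main():
    # Model and datamodule classes are selected via the YAML configs;
    # subclass mode lets the config name the concrete classes.
    LightningCLI(subclass_mode_model=True, subclass_mode_data=True)

if __name__ == "__main__":
    main()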

Train an MAE model on the CUB dataset:

python train.py fit --config=configs/mae.yaml --config=configs/data/cub_mae.yaml

Using multiple GPUs:

python train.py fit --config=configs/mae.yaml --config=configs/data/cub_mae.yaml --config=configs/multigpu.yaml

Fine-tuning

Not yet implemented.

Implementation

The default model uses ViT-Base for the encoder, and a small ViT (depth=4, width=192) for the decoder; this decoder is smaller than the one used in the paper (depth=8, width=512).
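
The sizes above can be sketched with timm's VisionTransformer. This is a shape illustration only, not the repository's construction; a real MAE decoder consumes encoded tokens plus mask tokens rather than raw images.

import timm
from timm.models.vision_transformer import VisionTransformer

# ViT-Base encoder (depth=12, width=768), classification head removed.
encoder = timm.create_model('vit_base_patch16_224', num_classes=0)

# Small ViT decoder, smaller than the paper's default decoder.
decoder = VisionTransformer(
    img_size=224,
    patch_size=16,
    embed_dim=192,
    depth=4,
    num_heads=3,
    num_classes=0,
)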

Dependencies

  • Configuration and training are handled entirely by pytorch-lightning.
  • The MAE model uses the VisionTransformer from timm.
  • Interface to FGVC datasets through fgvcdata.
  • Configurable environment variables through python-dotenv (see the sketch below).
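
The DATADIR variable written to .env during setup is typically read like this (a minimal sketch of python-dotenv usage):

import os
from dotenv import load_dotenv

load_dotenv()  # loads key=value pairs from .env into the process environment
datadir = os.environ['DATADIR']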

Results

Reconstructions of CUB validation set images after training with the following command:

python train.py fit --config=configs/mae.yaml --config=configs/data/cub_mae.yaml --config=configs/multigpu.yaml
[Figure: bird reconstructions on the CUB validation set]
