python – PyTorch convolutional autoencoder

In the encoder, you’re repeating:

nn.Conv2d(128, 256, kernel_size=5, stride=1),
nn.ReLU(),
nn.Conv2d(128, 256, kernel_size=5, stride=1),
nn.ReLU()

Just delete the duplicated block and the shapes will match.

Note: the output of your encoder will have shape batch_size * 256 * h' * w'. 256 is the number of output channels of the last convolution in the encoder, and h', w' depend on the size h, w of the input image after it has passed through the convolutional layers.
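For example, with kernel_size=5, stride=1 and no padding, every convolution shrinks h and w by 4, so four such layers turn a 28x28 image into a 12x12 feature map. If you want to check it, you can push a dummy tensor through the encoder (a minimal sketch, assuming greyscale 28x28 inputs and the same four Conv2d layers as the code further down):

import torch
from torch import nn

# four 5x5 convolutions, stride 1, no padding: 28 -> 24 -> 20 -> 16 -> 12
encoder = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=5, stride=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=5, stride=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=5, stride=1), nn.ReLU(),
    nn.Conv2d(128, 256, kernel_size=5, stride=1), nn.ReLU(),
)

dummy = torch.zeros(1, 1, 28, 28)  # one fake greyscale image
print(encoder(dummy).shape)        # torch.Size([1, 256, 12, 12])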

You're not using nb_channels or embedding_dim anywhere. And I can't see what you mean by embedding_dim, since you're only using convolutions and no fully connected layers.
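If you actually want an embedding of size embedding_dim, you would have to flatten the encoder output and project it with fully connected layers, for example like this (a hypothetical sketch, not your code; embedding_dim = 64 and the 256 * 12 * 12 flattened size are assumptions for 28x28 inputs):

from torch import nn

embedding_dim = 64            # assumed value, pick whatever you need
flat_size = 256 * 12 * 12     # 256 channels, 12x12 spatial size for 28x28 inputs

to_embedding = nn.Sequential(
    nn.Flatten(),                         # (N, 256, 12, 12) -> (N, 256*12*12)
    nn.Linear(flat_size, embedding_dim),  # -> (N, embedding_dim)
)
from_embedding = nn.Sequential(
    nn.Linear(embedding_dim, flat_size),
    nn.Unflatten(1, (256, 12, 12)),       # back to (N, 256, 12, 12) for the decoder
)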

===========EDIT===========

After the discussion in the comments below, I'll leave this code here to inspire you, I hope (and tell me whether it works):

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

data = datasets.MNIST(root="data", train=True, download=True, transform=ToTensor())

class AutoEncoderCNN(nn.Module):
  def __init__(self):
    super(AutoEncoderCNN, self).__init__()
    # Each 5x5 convolution with stride 1 and no padding shrinks h and w by 4,
    # so a (1, 28, 28) MNIST image becomes (256, 12, 12) at the bottleneck.
    self.encoder = nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.Conv2d(64, 128, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.Conv2d(128, 256, kernel_size=5, stride=1),
        nn.ReLU(),
    )
    # The transposed convolutions mirror the encoder and grow h and w by 4 each,
    # ending back at (1, 28, 28); Sigmoid keeps the output pixels in [0, 1].
    self.decoder = nn.Sequential(
        nn.ConvTranspose2d(256, 128, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.ConvTranspose2d(128, 64, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.ConvTranspose2d(64, 32, kernel_size=5, stride=1),
        nn.ReLU(),
        nn.ConvTranspose2d(32, 1, kernel_size=5, stride=1),
        nn.Sigmoid()      
    )
          
  def forward(self, x):
    x = self.encoder(x)
    x = self.decoder(x)
    return x
  
model = AutoEncoderCNN()
mnistTrainLoader = DataLoader(data,
                              batch_size=32, shuffle=True, num_workers=0)

loss_function = nn.MSELoss()  # reduction='mean' is already the default; size_average/reduce are deprecated
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
losses = []
i = 0
running_loss = 0.0
for epoch in range(100):
  for features, _ in mnistTrainLoader:
    y = model(features)                # reconstruction of the input batch
    loss = loss_function(y, features)  # compare the reconstruction to the original images
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    running_loss += loss.item()
    if i % 10 == 9:    # report the running loss every 10 mini-batches
        print('[Epoch: %d, iteration: %5d] loss: %.3f' %
              (epoch + 1, i + 1, running_loss / 10))
        running_loss = 0.0
    i += 1
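Once it has trained for a bit, you can eyeball the reconstructions, for example like this (a small sketch, assuming matplotlib is installed):

import matplotlib.pyplot as plt

model.eval()
with torch.no_grad():
    batch, _ = next(iter(mnistTrainLoader))
    reconstruction = model(batch)

# first image of the batch: original on the left, reconstruction on the right
fig, axes = plt.subplots(1, 2)
axes[0].imshow(batch[0, 0], cmap="gray")
axes[0].set_title("original")
axes[1].imshow(reconstruction[0, 0], cmap="gray")
axes[1].set_title("reconstruction")
plt.show()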

=======Adding a channel dimension=======

The problem was actually in how the dataset was created: since the dataset contains greyscale images, the PyTorch MNIST dataset helper returns each image without a channel dimension. Convolutions need this dimension, so we have to add it.

Instead of loading the dataset this way:

import torchvision
from torchvision import transforms
X_train = torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=transforms.ToTensor()).data
print(X_train.shape) # torch.Size([60000, 28, 28]) -- no channel dimension; .data also bypasses the transform

We load it this way:

X_train = torchvision.datasets.MNIST(root="./data", train=True, download=True).data[:, None, :, :] / 255.
# [:, None, :, :] inserts the channel axis; /255. gives floats between 0 and 1 instead of unsigned ints
print(X_train.shape) # torch.Size([60000, 1, 28, 28])
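If you load the raw tensor this way, you no longer have a Dataset object, so one option is to wrap X_train in a TensorDataset to keep using a DataLoader (a short sketch; duplicating the tensor simply keeps the "for features, _ in ..." loop above working):

from torch.utils.data import DataLoader, TensorDataset

train_set = TensorDataset(X_train, X_train)  # input and target are the same image
mnistTrainLoader = DataLoader(train_set, batch_size=32, shuffle=True)

for features, _ in mnistTrainLoader:
    print(features.shape)  # torch.Size([32, 1, 28, 28]) -- ready for the convolutions
    break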

Another way to handle this problem is in the model class, by adding the channel dimension to the input x.
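For example, forward could insert the missing channel axis itself whenever it receives a 3-dimensional batch (a small sketch of that alternative):

  def forward(self, x):
      # if x is (batch, 28, 28), add the channel axis to get (batch, 1, 28, 28)
      if x.dim() == 3:
          x = x.unsqueeze(1)
      x = self.encoder(x)
      x = self.decoder(x)
      return x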
