PyTorch – 1D Sequence Classification with Self-Supervised Learning

I am working on a multi-class classification task on long one-dimensional sequences. The sequence length may vary in the range $[512, 30720]$, and there is one feature associated with each time-step. This means that the input to the model has shape $(N, 1, L)$, where $N$ and $L$ denote the batch size and sequence length respectively. The singleton feature dimension contains integer values in the range $[0, 40]$. A small slice of a sequence might look like this: $[0, 3, 2, 1, 2, 7, 3]$.

The standard way of using deep learning (i.e. supervised learning) to solve this task is to choose some neural network architecture (our model) with an appropriate inductive bias towards the data, then encode the features, propagate them through the network, optionally pool, decode the features, and finally compute the loss w.r.t. the ground truth, in that order. In PyTorch this process would look something like this (the model is a basic ResNet block):

import torch

input_shape = (5, 1, 1024) # batch size, features, timesteps
X = torch.randint(0, 41, input_shape).type(torch.float) # sample input with values in [0, 40]
y_true = torch.tensor([0, 0, 1, 2, 1]) # sample ground-truth (3 classes)

d_model = 64 # 64 is our hidden dimension
encoder = torch.nn.Conv1d(in_channels=1, out_channels=d_model, kernel_size=1)

class Residual(torch.nn.Module):
  def __init__(self, m):
    super().__init__()
    self.m = m

  def forward(self, x):
    return self.m(x) + x

model = torch.nn.Sequential(
  Residual(
    torch.nn.Sequential(
      torch.nn.Conv1d(d_model, d_model, 3, padding='same'),
      torch.nn.BatchNorm1d(d_model),
      torch.nn.ReLU(),
      torch.nn.Conv1d(d_model, d_model, 3, padding='same'),
      torch.nn.BatchNorm1d(d_model),
    ),
  ),
  torch.nn.ReLU(),
)

pooler = lambda x: x.mean(2) # average pooling over the length dimension
decoder = torch.nn.Linear(d_model, 3) # 3 output logits for each class

criterion = torch.nn.CrossEntropyLoss()

# training pipeline
X = encoder(X) # (N, 1, L) -> (N, H, L)
X = model(X) # (N, H, L) -> (N, H, L)
X = pooler(X) # (N, H, L) -> (N, H)
y_pred = decoder(X) # (N, H) -> (N, 3)
loss = criterion(y_pred, y_true)
loss.backward()

Assuming that the model architecture is appropriate, I have considered using self-supervised learning (SSL) to boost the final classification accuracy on the held-out test set. I have seen the method gain attention in language modelling with Transformers (e.g. BERT), but how does one apply SSL in the general case, outside the language task domain? Is this approach successful without using Transformers?

As I understand it, SSL is applied to unlabeled data to learn the underlying structure of the data, and is followed by a fine-tuning stage, i.e. supervised learning initialized with the pretrained weights. Two SSL tasks that I can think of are 1) masking out certain timesteps and letting the model predict the values of the missing timesteps, and 2) training the model to predict the value of the next timestep given the previous timesteps. I'll use the following image to illustrate:
[Image: illustration of the two SSL tasks, masked-timestep prediction and next-timestep (red bar) prediction]
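
To make task 1) concrete, here is a rough sketch of how I imagine the masked-timestep pretraining could look in PyTorch, reusing the encoder, model and d_model from the snippet above. I am treating the 41 possible values as discrete classes and replacing masked timesteps with a sentinel value; names like mask_prob and ssl_head are just my own placeholders, not from any library:

import torch
import torch.nn.functional as F

mask_prob = 0.15  # fraction of timesteps to mask (value borrowed from BERT)
n_values = 41     # discrete values 0..40, treated as classes
ssl_head = torch.nn.Conv1d(d_model, n_values, kernel_size=1) # per-timestep logits

X_raw = torch.randint(0, n_values, (5, 1, 1024)).type(torch.float) # unlabeled batch
targets = X_raw.squeeze(1).long() # (N, L) ground-truth value per timestep

mask = torch.rand(targets.shape) < mask_prob          # (N, L) boolean mask
X_masked = X_raw.masked_fill(mask.unsqueeze(1), -1.0) # sentinel for masked steps

H = model(encoder(X_masked)) # (N, H, L); no pooling, we need per-timestep features
logits = ssl_head(H)         # (N, n_values, L)

# cross-entropy per timestep, averaged over the masked positions only
per_step = F.cross_entropy(logits, targets, reduction='none') # (N, L)
ssl_loss = per_step[mask].mean()
ssl_loss.backward()

Since the ResNet block uses non-causal ('same') padding, masked prediction seems like a better fit here than task 2); next-step prediction would require causal convolutions so the model cannot peek at the timestep it is asked to predict.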

Intuitively, I think this should work, but I am not sure how to implement it in a practical setting with PyTorch. For instance, when predicting the next timestep (red bar), can I treat it as a classification task with 41 classes (one per value in $[0, 40]$)? Would it work for varying sequence lengths? What proportion of the training data should be used for pretraining?
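
For reference, this is how I imagine the fine-tuning stage would look: keep the pretrained encoder and model, attach a freshly initialized decoder, and continue with plain supervised training as in the first snippet. Again, this is just a sketch of my understanding (the learning rate and optimizer are arbitrary choices on my part):

# fine-tuning: keep the pretrained encoder/model, add a fresh classification head
X_sup = torch.randint(0, 41, (5, 1, 1024)).type(torch.float) # labeled batch
y_sup = torch.tensor([0, 0, 1, 2, 1])                        # labels (3 classes)

decoder = torch.nn.Linear(d_model, 3) # newly initialized classification head
params = (
  list(encoder.parameters())
  + list(model.parameters())
  + list(decoder.parameters())
)
optimizer = torch.optim.Adam(params, lr=1e-4) # lr is just a guess

optimizer.zero_grad()
y_pred = decoder(pooler(model(encoder(X_sup)))) # (N, 1, L) -> (N, 3)
loss = criterion(y_pred, y_sup)
loss.backward()
optimizer.step()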
