pytorch – Understanding batch size, sequence, sequence length and batch length of an RNN

My Problem

I’m struggling with the different definitions of batch size, sequence, sequence length and batch length for an RNN, and with how to use them correctly.

First things first – let’s clarify the definitions

Consider the following data with two features and two labels.

timestamp | feature 1 | feature 2 | label 1 | label 2
t1        | f1.1      | f2.1      | l1.1    | l2.1
t2        | f1.2      | f2.2      | l1.2    | l2.2
t3        | f1.3      | f2.3      | l1.3    | l2.3
t4        | f1.4      | f2.4      | l1.4    | l2.4
t5        | f1.5      | f2.5      | l1.5    | l2.5
t6        | f1.6      | f2.6      | l1.6    | l2.6

Let us assume that the system I want to train with the RNN always processes the current time stamp and the two previous ones. The following definitions and examples refer to this setup.

Definition Training Example:
A training example is the set of feature values for one time stamp, which the neural network processes in a single step.
Example: [f1.1, f2.1]

Definition Sequence:
A sequence is a set of several training examples that are fed to the network one after another.

[[f1.1, f2.1], 
 [f1.2, f2.2],
 [f1.3, f2.3]]

Definition Sequence number:
The sequence number (the sequence length) is the number of training examples that the RNN has to process as one sequence.
Example: 3

Definition Batch Size:
The batch size is the number of sequences that are forwarded through the RNN before the gradients are calculated.

[[[f1.1, f2.1], 
  [f1.2, f2.2],
  [f1.3, f2.3]],
 [[f1.2, f2.2], 
  [f1.3, f2.3],
  [f1.4, f2.4]], 
 [[f1.3, f2.3], 
  [f1.4, f2.4],
  [f1.5, f2.5]],
 [[f1.4, f2.4],
  [f1.5, f2.5],
  [f1.6, f2.6]]]
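
The batch above can be built mechanically from the table. The following is a minimal sketch in plain Python, assuming every window of three consecutive time stamps becomes one sequence; the helper name `sliding_windows` is hypothetical and the strings stand in for the real feature values.

```python
def sliding_windows(rows, sequence_length):
    """Return every run of `sequence_length` consecutive rows."""
    last_start = len(rows) - sequence_length
    return [rows[i:i + sequence_length] for i in range(last_start + 1)]

# Placeholder feature rows [f1.t, f2.t] for t1 .. t6, as in the table.
rows = [[f"f1.{t}", f"f2.{t}"] for t in range(1, 7)]

batch = sliding_windows(rows, 3)
print(len(batch))  # 4 sequences in the batch
print(batch[0])    # [['f1.1', 'f2.1'], ['f1.2', 'f2.2'], ['f1.3', 'f2.3']]
```

With six rows and a sequence number of 3 this gives 6 - 3 + 1 = 4 sequences, which matches the batch size of the example.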

Definition Batch Length:
The batch length is the total number of batches.
Example: 1, since the previous example is a single batch.

Definition Data Length:
The total number of training examples is the batch length times the batch size times the sequence number.
Example: 1 * 4 * 3 = 12 for the previous examples.
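
The arithmetic behind this definition can be checked with a short sketch. It assumes the overlapping windows of the batch example, so the product counts the time steps presented to the network, and a time stamp that appears in several windows is counted more than once. The variable names are illustrative only.

```python
num_rows = 6         # time stamps t1 .. t6 in the table
sequence_number = 3  # time stamps per sequence
batch_length = 1     # number of batches

# Overlapping windows: one sequence starts at each of t1 .. t4.
batch_size = num_rows - sequence_number + 1

steps_presented = batch_length * batch_size * sequence_number
print(batch_size)       # 4
print(steps_presented)  # 12
print(num_rows)         # 6 distinct time stamps in the table
```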

Second – implementation with PyTorch

For the implementation of my RNN I use PyTorch, with the following code as an example. However, even if my definitions above are right, I am unable to transfer them to the code: I always get errors with the tensor dimensions.

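import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, sequence_length):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.sequence_length = sequence_length
        self.output_size = output_size
        # batch_first=True: x and out are (batch_size, sequence_length, features)
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        # the outputs of all time steps are flattened and fed to the linear layer
        self.fc = nn.Linear(hidden_size * sequence_length, output_size)

    def forward(self, x):
        # nn.RNN expects h_0 as (num_layers, batch_size, hidden_size);
        # batch_first only affects x and out, not the hidden state
        hidden_state = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        out, _ = self.rnn(x, hidden_state)
        out = out.reshape(out.shape[0], -1)
        out = self.fc(out)
        return out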


  1. Are the definitions correct?
  2. How should the hidden_state be initialized correctly, considering the batch size, batch number, sequence, sequence number, hidden size and hidden layer size?
  3. What shape should x have in the forward method, assuming x represents all or part of the data example above?
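
To make questions 2 and 3 concrete, here is a minimal shape check. It is only a sketch under PyTorch's documented conventions for nn.RNN with batch_first=True. The hidden size of 8, the single layer and the numeric stand-in for the feature table are assumptions chosen for illustration; only the two features, the sequence number of 3 and the batch size of 4 come from the example above.

```python
import torch
import torch.nn as nn

# From the example: 4 sequences of 3 time stamps with 2 features each.
batch_size, sequence_length, input_size = 4, 3, 2
# Assumed for illustration only.
hidden_size, num_layers = 8, 1

rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)

# Numeric stand-in for the six rows [feature 1, feature 2] of the table.
features = torch.arange(12, dtype=torch.float32).reshape(6, 2)

# Sliding windows of three consecutive time stamps:
# x has shape (batch_size, sequence_length, input_size).
x = torch.stack([features[i:i + sequence_length] for i in range(batch_size)])

# h_0 has shape (num_layers, batch_size, hidden_size), even with batch_first=True.
h0 = torch.zeros(num_layers, batch_size, hidden_size)

out, hn = rnn(x, h0)
print(x.shape)    # torch.Size([4, 3, 2])
print(h0.shape)   # torch.Size([1, 4, 8])
print(out.shape)  # torch.Size([4, 3, 8])
print(hn.shape)   # torch.Size([1, 4, 8])
```

In this sketch the batch dimension comes first only for x and out; the hidden state keeps the layer dimension first.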

Please help me solve the puzzle, ideally with an example for x and the hidden_state based on my example.

Many thanks.
