I’m struggling with the different definitions of batch size, sequence, sequence length and batch length of an RNN, and with how to use them correctly.
First things first – let’s clarify the definitions
Consider the following data with two features and two labels:

| timestamp | feature 1 | feature 2 | label 1 | label 2 |
Let us assume the system I want to train with the RNN always processes the current and the two previous time stamps. The following definitions refer to this framework.
Definition Training Example:
A training example is a set of training data which is processed by the neural network at once.
A sequence is a set of several training examples which are fed to the network in a row, e.g.:
[[f1.1, f2.1], [f1.2, f2.2], [f1.3, f2.3]]
Definition Sequence number:
The number of training examples which need to be processed as one sequence by the RNN is called the sequence number.
Definition Batch Size:
The batch size is the number of sequences which are forwarded to the RNN before the gradients are calculated, e.g. a batch of four sequences:
[[[f1.1, f2.1], [f1.2, f2.2], [f1.3, f2.3]],
 [[f1.2, f2.2], [f1.3, f2.3], [f1.4, f2.4]],
 [[f1.3, f2.3], [f1.4, f2.4], [f1.5, f2.5]],
 [[f1.4, f2.4], [f1.5, f2.5], [f1.6, f2.6]]]
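To make the batch above concrete, here is a minimal sketch that builds the same sliding-window sequences with NumPy (the feature values are just placeholders, not real data):

```python
import numpy as np

# 6 time stamps, 2 features per time stamp (placeholder values)
data = np.arange(12, dtype=np.float32).reshape(6, 2)

# Each sequence covers the current and the two previous time stamps
seq_len = 3
sequences = np.stack([data[i:i + seq_len]
                      for i in range(len(data) - seq_len + 1)])

print(sequences.shape)  # (4, 3, 2) -> (batch size, sequence number, features)
```

So with 6 time stamps and a window of 3, one full pass over the data yields exactly the 4 sequences written out above.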
Definition Batch Length:
The total number of batches is called the batch length.
1 in the previous example.
Definition Data Length:
The total number of training examples is calculated as batch length times batch size times sequence number.
1 * 4 * 3 = 12 in the previous example.
Second – implementation with PyTorch
For the implementation of my RNN I use PyTorch, with the following code as an example. However, if my previous definitions are right, I’m unable to map them onto the code – I keep getting errors with the tensor dimensions.
```python
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, sequence_length):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.batch_size = sequence_length
        self.output_size = output_size
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size * sequence_length, output_size)

    def forward(self, x):
        hidden_state = torch.zeros(x.size(0), self.num_layers, self.hidden_size).to(device)
        out, _ = self.rnn(x, hidden_state)
        out = out.reshape(out.shape, -1)
        out = self.fc(out)
        return out
```
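For reference, this is my understanding of the shapes `nn.RNN` expects according to the PyTorch docs – a minimal standalone sketch using the numbers from my example (batch size 4, sequence number 3, 2 features; the hidden size of 8 is an arbitrary choice):

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_size = 4, 3, 2
hidden_size, num_layers = 8, 1

rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)

# With batch_first=True the input is (batch_size, seq_len, input_size) ...
x = torch.randn(batch_size, seq_len, input_size)

# ... but the initial hidden state is ALWAYS
# (num_layers, batch_size, hidden_size), regardless of batch_first.
h0 = torch.zeros(num_layers, batch_size, hidden_size)

out, hn = rnn(x, h0)
print(out.shape)  # torch.Size([4, 3, 8]) -> (batch_size, seq_len, hidden_size)
print(hn.shape)   # torch.Size([1, 4, 8]) -> (num_layers, batch_size, hidden_size)
```

If that is right, it seems the hidden state in my class has its first two dimensions swapped, but I’m not sure this is the only problem.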
- Are the definitions correct?
- How should the hidden_state be initialized correctly, considering the batch size, batch number, sequence, sequence number, hidden size and number of hidden layers?
- What shape should x have in the forward method, assuming x represents all or part of the previous data example?
Please help me solve this puzzle, ideally with an example for x and the hidden_state based on my data above.