I’m struggling with the different definitions of batch size, sequence, sequence length and batch length of an RNN, and with how to use them correctly.
First things first – let’s clarify the definitions
Consider the following data with two features and two labels:

| timestamp | feature 1 | feature 2 | label 1 | label 2 |
Let us assume the system I want to train with the RNN always processes the current and the two previous time stamps. The following definitions refer to this framework.
Definition Training Example:
A training example is a set of training data which is processed by the neural network at once.
A sequence is a set of several training examples which are fed to the network in a row, e.g.:
[[f1.1, f2.1], [f1.2, f2.2], [f1.3, f2.3]]
Definition Sequence number:
The number of training examples which need to be processed as one sequence by the RNN is called the sequence number.
Definition Batch Size:
The batch size is the number of sequences which are forwarded to the RNN before the gradients are calculated, e.g. a batch of four sequences:
[[[f1.1, f2.1], [f1.2, f2.2], [f1.3, f2.3]],
 [[f1.2, f2.2], [f1.3, f2.3], [f1.4, f2.4]],
 [[f1.3, f2.3], [f1.4, f2.4], [f1.5, f2.5]],
 [[f1.4, f2.4], [f1.5, f2.5], [f1.6, f2.6]]]
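To make the batch above concrete, here is a minimal sketch that builds the same sliding-window sequences with NumPy (the feature values are just placeholders, not real data):

```python
import numpy as np

# 6 time stamps, 2 features per time stamp (placeholder values)
data = np.arange(12, dtype=np.float32).reshape(6, 2)

# Each sequence covers the current and the two previous time stamps
seq_len = 3
sequences = np.stack([data[i:i + seq_len]
                      for i in range(len(data) - seq_len + 1)])

print(sequences.shape)  # (4, 3, 2) -> (batch size, sequence number, features)
```

So with 6 time stamps and a window of 3, one full pass over the data yields exactly the 4 sequences written out above.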
Definition Batch Length:
The total number of batches is called the batch length.
1 in the previous example.
Definition Data Length:
The total number of training examples is calculated as batch length times batch size times sequence number.
1 * 4 * 3 = 12 in the previous example.
Second – implementation with PyTorch
For the implementation of my RNN I use PyTorch, with the following code as an example. However, if my previous definitions are right, I’m unable to map them onto the code – I keep getting errors with the tensor dimensions.
```python
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, sequence_length):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.batch_size = sequence_length
        self.output_size = output_size
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size * sequence_length, output_size)

    def forward(self, x):
        hidden_state = torch.zeros(x.size(0), self.num_layers, self.hidden_size).to(device)
        out, _ = self.rnn(x, hidden_state)
        out = out.reshape(out.shape, -1)
        out = self.fc(out)
        return out
```
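For reference, this is my understanding of the shapes `nn.RNN` expects according to the PyTorch docs – a minimal standalone sketch using the numbers from my example (batch size 4, sequence number 3, 2 features; the hidden size of 8 is an arbitrary choice):

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_size = 4, 3, 2
hidden_size, num_layers = 8, 1

rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)

# With batch_first=True the input is (batch_size, seq_len, input_size) ...
x = torch.randn(batch_size, seq_len, input_size)

# ... but the initial hidden state is ALWAYS
# (num_layers, batch_size, hidden_size), regardless of batch_first.
h0 = torch.zeros(num_layers, batch_size, hidden_size)

out, hn = rnn(x, h0)
print(out.shape)  # torch.Size([4, 3, 8]) -> (batch_size, seq_len, hidden_size)
print(hn.shape)   # torch.Size([1, 4, 8]) -> (num_layers, batch_size, hidden_size)
```

If that is right, it seems the hidden state in my class has its first two dimensions swapped, but I’m not sure this is the only problem.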
- Are the definitions correct?
- How should the hidden_state be initialized correctly, considering the batch size, batch number, sequence, sequence number, hidden size and number of hidden layers?
- What shape should x have in the forward method, assuming x represents all or part of the previous data example?
Please help me solve this puzzle, ideally with an example for x and the hidden_state based on my data above.