Initialize Hidden and Cell States
All LSTM layers start with zero hidden states: h1[0] = h2[0] = h3[0] = h4[0] = 0
All cell states start as zeros: C1[0] = C2[0] = C3[0] = C4[0] = 0
(Index [0] denotes the initial state, before the first input X[t-2] is processed.)
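A minimal NumPy sketch of this initialization. The hidden size of 8 is an assumption chosen for illustration; the text does not specify one.

```python
import numpy as np

NUM_LAYERS = 4     # LSTM1 .. LSTM4, as in the walkthrough
HIDDEN_SIZE = 8    # assumed for illustration; not given in the text

# One zero vector per layer for the hidden state h and the cell state C.
h = [np.zeros(HIDDEN_SIZE) for _ in range(NUM_LAYERS)]
C = [np.zeros(HIDDEN_SIZE) for _ in range(NUM_LAYERS)]
```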
Process Time Step t-2
Input: X[t-2] enters the network
Layer 1: LSTM1 processes X[t-2] with zero initial states → produces h1[t-2], C1[t-2]
Layer 2: LSTM2 takes h1[t-2] as input → produces h2[t-2], C2[t-2]
Layer 3: LSTM3 takes h2[t-2] as input → produces h3[t-2], C3[t-2]
Layer 4: LSTM4 takes h3[t-2] as input → produces h4[t-2], C4[t-2]
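The text does not say which LSTM cell variant each layer uses. Assuming the standard formulation, one update of layer k at step s can be written as below. The symbols W_k and b_k (per-gate weights and biases of layer k), the gate names i, f, o, g, and the stacked input z_k are notation introduced here for illustration.

```latex
\begin{aligned}
z_k[s] &= \begin{bmatrix} x_k[s] \\ h_k[s-1] \end{bmatrix},
\qquad x_1[s] = X[s], \quad x_k[s] = h_{k-1}[s] \ \text{for } k > 1 \\
i_k[s] &= \sigma\!\left(W_k^{(i)} z_k[s] + b_k^{(i)}\right) \\
f_k[s] &= \sigma\!\left(W_k^{(f)} z_k[s] + b_k^{(f)}\right) \\
o_k[s] &= \sigma\!\left(W_k^{(o)} z_k[s] + b_k^{(o)}\right) \\
g_k[s] &= \tanh\!\left(W_k^{(g)} z_k[s] + b_k^{(g)}\right) \\
C_k[s] &= f_k[s] \odot C_k[s-1] + i_k[s] \odot g_k[s] \\
h_k[s] &= o_k[s] \odot \tanh\!\left(C_k[s]\right)
\end{aligned}
```

At step t-2 the previous states are all zero, so the forget-gate term drops out and C_k[t-2] = i_k[t-2] ⊙ g_k[t-2] for every layer. From step t-1 onward, the term f_k[s] ⊙ C_k[s-1] is what carries memory forward.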
Process Time Step t-1
Input: X[t-1] enters the network (new independent input)
Layer 1: LSTM1 processes X[t-1] WITH memory from h1[t-2], C1[t-2] → produces h1[t-1], C1[t-1]
Layer 2: LSTM2 takes h1[t-1] WITH memory from h2[t-2], C2[t-2] → produces h2[t-1], C2[t-1]
Layer 3: LSTM3 takes h2[t-1] WITH memory from h3[t-2], C3[t-2] → produces h3[t-1], C3[t-1]
Layer 4: LSTM4 takes h3[t-1] WITH memory from h4[t-2], C4[t-2] → produces h4[t-1], C4[t-1]
Process Time Step t (Final)
Input: X[t] enters the network (most recent data)
Layer 1: LSTM1 processes X[t] WITH memory from h1[t-1], C1[t-1] → produces h1[t], C1[t]
Layer 2: LSTM2 takes h1[t] WITH memory from h2[t-1], C2[t-1] → produces h2[t], C2[t]
Layer 3: LSTM3 takes h2[t] WITH memory from h3[t-1], C3[t-1] → produces h3[t], C3[t]
Layer 4: LSTM4 takes h3[t] WITH memory from h4[t-1], C4[t-1] → produces h4[t], C4[t]
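The three time steps above follow one pattern: an outer loop over time and an inner loop over layers, with each layer's (h, C) pair carried from one step to the next. The sketch below is one way to write that in NumPy. It assumes the standard LSTM cell, and the sizes, the random weights, and the function names lstm_step and stacked_forward are all hypothetical, chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """Standard LSTM cell update (assumed variant); gates stacked as i, f, o, g."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
    g = np.tanh(z[3 * H:])
    c = f * c_prev + i * g          # keep part of the old cell state, add new content
    return o * np.tanh(c), c

def stacked_forward(X_seq, params, hidden_size):
    """Run every layer over X_seq in time order; return the top layer's final h."""
    h = [np.zeros(hidden_size) for _ in params]
    C = [np.zeros(hidden_size) for _ in params]
    for x in X_seq:                                  # X[t-2], X[t-1], X[t]
        layer_input = x
        for k, (W, b) in enumerate(params):
            # h[k] and C[k] still hold the previous time step's values here.
            h[k], C[k] = lstm_step(layer_input, h[k], C[k], W, b)
            layer_input = h[k]                       # passed up to layer k+1
    return h[-1]

rng = np.random.default_rng(0)
INPUT_SIZE, HIDDEN_SIZE, NUM_LAYERS = 3, 8, 4        # assumed sizes

# Layer 1 sees the raw input; layers 2-4 see the hidden state of the layer below.
in_sizes = [INPUT_SIZE] + [HIDDEN_SIZE] * (NUM_LAYERS - 1)
params = [(rng.normal(scale=0.5, size=(4 * HIDDEN_SIZE, n + HIDDEN_SIZE)),
           np.zeros(4 * HIDDEN_SIZE)) for n in in_sizes]

X_seq = rng.normal(size=(3, INPUT_SIZE))             # stand-ins for X[t-2], X[t-1], X[t]
h4_t = stacked_forward(X_seq, params, HIDDEN_SIZE)

# Same final input, but without the two earlier steps: the result differs,
# which is the "memory" the walkthrough refers to.
h4_no_memory = stacked_forward(X_seq[-1:], params, HIDDEN_SIZE)
```

Comparing h4_t with h4_no_memory is a quick way to check that state is actually being carried across time steps rather than reset.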
Generate Final Prediction
Dense Layer: Takes h4[t] (which contains information from all 3 time steps and 4 layers)
Output: ŷ[t+1] = W_output × h4[t] + b_output
Result: Prediction for the next time step
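As a small illustration of the read-out formula, the sketch below applies a linear dense layer to a stand-in for h4[t]. The hidden size, the single output unit, and the random values are assumptions; in a trained model W_output and b_output would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN_SIZE, OUTPUT_SIZE = 8, 1      # assumed sizes

# Stand-in for the top layer's final hidden state h4[t]; LSTM outputs lie in (-1, 1).
h4_t = rng.uniform(-1.0, 1.0, size=HIDDEN_SIZE)

W_output = rng.normal(scale=0.1, size=(OUTPUT_SIZE, HIDDEN_SIZE))
b_output = np.zeros(OUTPUT_SIZE)

# Linear read-out with no activation, matching the formula in the text.
y_hat_next = W_output @ h4_t + b_output
```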