Complete 4-Layer RNN Architecture

Detailed Mathematical and Computational Flow Analysis

๐Ÿ—๏ธ Architecture Overview

t-2 (Time Step 1)
Input Layer
Input: X[t-2]
Shape: (batch, features)
Example: [temp, humidity, day_of_week, ...]
↓
LSTM Layer 1
h1[t-2] = LSTM1(X[t-2], h1[t-3], C1[t-3])
Output: h1[t-2], C1[t-2]
Hidden dim: 64
Gates: forget, input, output
→ h1[t-2], C1[t-2] carried to time step t-1
↓
LSTM Layer 2
h2[t-2] = LSTM2(h1[t-2], h2[t-3], C2[t-3])
Input: h1[t-2] (64 dim)
Output: h2[t-2], C2[t-2]
Hidden dim: 64
→ h2[t-2], C2[t-2] carried to time step t-1
↓
LSTM Layer 3
h3[t-2] = LSTM3(h2[t-2], h3[t-3], C3[t-3])
Input: h2[t-2] (64 dim)
Output: h3[t-2], C3[t-2]
Hidden dim: 64
→ h3[t-2], C3[t-2] carried to time step t-1
↓
LSTM Layer 4
h4[t-2] = LSTM4(h3[t-2], h4[t-3], C4[t-3])
Input: h3[t-2] (64 dim)
Output: h4[t-2], C4[t-2]
Hidden dim: 64
→ h4[t-2], C4[t-2] carried to time step t-1
t-1 (Time Step 2)
Input Layer
Input: X[t-1]
Shape: (batch, features)
Note: Fresh input data
Independent: Not connected to previous inputs
↓
LSTM Layer 1
h1[t-1] = LSTM1(X[t-1], h1[t-2], C1[t-2])
Uses memory: h1[t-2], C1[t-2]
Current input: X[t-1]
Output: h1[t-1], C1[t-1]
→ h1[t-1], C1[t-1] carried to time step t
↓
LSTM Layer 2
h2[t-1] = LSTM2(h1[t-1], h2[t-2], C2[t-2])
Uses memory: h2[t-2], C2[t-2]
Current input: h1[t-1]
Output: h2[t-1], C2[t-1]
→ h2[t-1], C2[t-1] carried to time step t
↓
LSTM Layer 3
h3[t-1] = LSTM3(h2[t-1], h3[t-2], C3[t-2])
Uses memory: h3[t-2], C3[t-2]
Current input: h2[t-1]
Output: h3[t-1], C3[t-1]
→ h3[t-1], C3[t-1] carried to time step t
↓
LSTM Layer 4
h4[t-1] = LSTM4(h3[t-1], h4[t-2], C4[t-2])
Uses memory: h4[t-2], C4[t-2]
Current input: h3[t-1]
Output: h4[t-1], C4[t-1]
→ h4[t-1], C4[t-1] carried to time step t
t (Time Step 3)
Input Layer
Input: X[t]
Shape: (batch, features)
Note: Most recent data
Final input: No horizontal connection
↓
LSTM Layer 1
h1[t] = LSTM1(X[t], h1[t-1], C1[t-1])
Uses memory: h1[t-1], C1[t-1]
Current input: X[t]
Final state: h1[t], C1[t]
↓
LSTM Layer 2
h2[t] = LSTM2(h1[t], h2[t-1], C2[t-1])
Uses memory: h2[t-1], C2[t-1]
Current input: h1[t]
Final state: h2[t], C2[t]
↓
LSTM Layer 3
h3[t] = LSTM3(h2[t], h3[t-1], C3[t-1])
Uses memory: h3[t-1], C3[t-1]
Current input: h2[t]
Final state: h3[t], C3[t]
↓
LSTM Layer 4
h4[t] = LSTM4(h3[t], h4[t-1], C4[t-1])
Uses memory: h4[t-1], C4[t-1]
Current input: h3[t]
Contains: All sequence information
↓
Output Layer (Dense)
ŷ[t+1] = W_output · h4[t] + b_output
Input: h4[t] (64 dim)
Output: Prediction
Activation: Linear (or ReLU)

🔬 Detailed Mathematical Flow

LSTM Cell Computation (Each Layer)
Forget Gate: f[t] = σ(W_f · [h[t-1], x[t]] + b_f)
Input Gate: i[t] = σ(W_i · [h[t-1], x[t]] + b_i)
Candidate Values: C̃[t] = tanh(W_C · [h[t-1], x[t]] + b_C)
Cell State: C[t] = f[t] * C[t-1] + i[t] * C̃[t]
Output Gate: o[t] = σ(W_o · [h[t-1], x[t]] + b_o)
Hidden State: h[t] = o[t] * tanh(C[t])
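The six equations above can be sketched in NumPy. This is a minimal illustration, not a production cell: fusing the four gate weight matrices into one matrix W (with row blocks ordered forget, input, candidate, output) is an implementation choice, not part of the equations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM cell step for a batch.

    W has shape (4*hidden, hidden + features); its four row blocks hold the
    forget, input, candidate, and output weights from the equations above.
    """
    hidden = h_prev.shape[1]
    z = np.concatenate([h_prev, x], axis=1) @ W.T + b   # all four pre-activations at once
    f = sigmoid(z[:, 0 * hidden:1 * hidden])            # forget gate f[t]
    i = sigmoid(z[:, 1 * hidden:2 * hidden])            # input gate i[t]
    c_tilde = np.tanh(z[:, 2 * hidden:3 * hidden])      # candidate values C~[t]
    o = sigmoid(z[:, 3 * hidden:4 * hidden])            # output gate o[t]
    c = f * c_prev + i * c_tilde                        # cell state C[t]
    h = o * np.tanh(c)                                  # hidden state h[t]
    return h, c

# Shape check with the hidden size used in this article (64); batch and
# feature counts here are hypothetical.
rng = np.random.default_rng(0)
batch, features, hidden = 2, 8, 64
x = rng.normal(size=(batch, features))
h0 = np.zeros((batch, hidden))
c0 = np.zeros((batch, hidden))
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + features))
b = np.zeros(4 * hidden)
h1, c1 = lstm_step(x, h0, c0, W, b)
print(h1.shape, c1.shape)  # (2, 64) (2, 64)
```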

📊 Tensor Dimensions Throughout Network

| Component    | Time t-2          | Time t-1          | Time t                | Notes                |
| ------------ | ----------------- | ----------------- | --------------------- | -------------------- |
| Input X      | [batch, features] | [batch, features] | [batch, features]     | Raw input features   |
| LSTM Layer 1 | [batch, 64]       | [batch, 64]       | [batch, 64]           | Hidden state h1[t]   |
| LSTM Layer 2 | [batch, 64]       | [batch, 64]       | [batch, 64]           | Hidden state h2[t]   |
| LSTM Layer 3 | [batch, 64]       | [batch, 64]       | [batch, 64]           | Hidden state h3[t]   |
| LSTM Layer 4 | [batch, 64]       | [batch, 64]       | [batch, 64]           | Hidden state h4[t]   |
| Cell States  | [batch, 64]       | [batch, 64]       | [batch, 64]           | Long-term memory C[t] |
| Output       | -                 | -                 | [batch, output_dim]   | Final prediction     |
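The dimension bookkeeping in the table can be checked mechanically. A minimal NumPy sketch, assuming hypothetical values of 10 input features, batch size 32, and a scalar prediction:

```python
import numpy as np

batch, features, hidden, output_dim = 32, 10, 64, 1

# Layer 1 consumes the raw features; layers 2-4 consume the 64-dim hidden
# state produced by the layer below.
input_widths = [features, hidden, hidden, hidden]

layer_input = np.zeros((batch, features))
for width in input_widths:
    h_prev = np.zeros((batch, hidden))
    # Each gate is a linear map of [h_prev, input]: (hidden + width) -> hidden.
    gate = np.concatenate([h_prev, layer_input], axis=1) @ np.zeros((hidden + width, hidden))
    assert gate.shape == (batch, hidden)
    layer_input = gate            # this layer's hidden state feeds the next layer

# The Dense output layer maps the top hidden state to the prediction.
y = layer_input @ np.zeros((hidden, output_dim))
print(y.shape)  # (32, 1)
```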

⚡ Step-by-Step Execution Process

Initialize Hidden and Cell States

All LSTM layers start with zero hidden states: h1[0], h2[0], h3[0], h4[0] = 0
All cell states start as zeros: C1[0], C2[0], C3[0], C4[0] = 0
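In NumPy, that initialization is just four pairs of zero arrays, one per layer (batch size 32 is a hypothetical value):

```python
import numpy as np

batch, hidden, num_layers = 32, 64, 4

# h[k] and c[k] hold the initial states of LSTM layer k+1 (h1..h4, C1..C4).
h = [np.zeros((batch, hidden)) for _ in range(num_layers)]
c = [np.zeros((batch, hidden)) for _ in range(num_layers)]

print(len(h), h[0].shape)  # 4 (32, 64)
```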

Process Time Step t-2

Input: X[t-2] enters the network
Layer 1: LSTM1 processes X[t-2] with zero initial states → produces h1[t-2], C1[t-2]
Layer 2: LSTM2 takes h1[t-2] as input → produces h2[t-2], C2[t-2]
Layer 3: LSTM3 takes h2[t-2] as input → produces h3[t-2], C3[t-2]
Layer 4: LSTM4 takes h3[t-2] as input → produces h4[t-2], C4[t-2]

Process Time Step t-1

Input: X[t-1] enters the network (new independent input)
Layer 1: LSTM1 processes X[t-1] WITH memory from h1[t-2], C1[t-2] → produces h1[t-1], C1[t-1]
Layer 2: LSTM2 takes h1[t-1] WITH memory from h2[t-2], C2[t-2] → produces h2[t-1], C2[t-1]
Layer 3: LSTM3 takes h2[t-1] WITH memory from h3[t-2], C3[t-2] → produces h3[t-1], C3[t-1]
Layer 4: LSTM4 takes h3[t-1] WITH memory from h4[t-2], C4[t-2] → produces h4[t-1], C4[t-1]

Process Time Step t (Final)

Input: X[t] enters the network (most recent data)
Layer 1: LSTM1 processes X[t] WITH memory from h1[t-1], C1[t-1] → produces h1[t], C1[t]
Layer 2: LSTM2 takes h1[t] WITH memory from h2[t-1], C2[t-1] → produces h2[t], C2[t]
Layer 3: LSTM3 takes h2[t] WITH memory from h3[t-1], C3[t-1] → produces h3[t], C3[t]
Layer 4: LSTM4 takes h3[t] WITH memory from h4[t-1], C4[t-1] → produces h4[t], C4[t]
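The three time steps above reduce to two nested loops: an outer loop over time and an inner loop over layers. A self-contained NumPy sketch with random weights and hypothetical sizes (fusing the four gates into one matrix per layer is an implementation choice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM cell step; W row blocks are [forget, input, candidate, output]."""
    hidden = h_prev.shape[1]
    z = np.concatenate([h_prev, x], axis=1) @ W.T + b
    f, i, o = (sigmoid(z[:, k * hidden:(k + 1) * hidden]) for k in (0, 1, 3))
    c_tilde = np.tanh(z[:, 2 * hidden:3 * hidden])
    c = f * c_prev + i * c_tilde
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
batch, features, hidden, steps, n_layers = 32, 10, 64, 3, 4

# One fused weight matrix and bias per layer; layer 1 sees the raw features,
# layers 2-4 see the 64-dim hidden state from the layer below.
widths = [features] + [hidden] * (n_layers - 1)
Ws = [rng.normal(scale=0.1, size=(4 * hidden, hidden + w)) for w in widths]
bs = [np.zeros(4 * hidden) for _ in range(n_layers)]

X = rng.normal(size=(steps, batch, features))             # X[t-2], X[t-1], X[t]
h = [np.zeros((batch, hidden)) for _ in range(n_layers)]  # zero initial states
c = [np.zeros((batch, hidden)) for _ in range(n_layers)]

for t in range(steps):                # outer loop: time steps t-2, t-1, t
    layer_input = X[t]
    for k in range(n_layers):         # inner loop: layers 1..4, bottom to top
        h[k], c[k] = lstm_step(layer_input, h[k], c[k], Ws[k], bs[k])
        layer_input = h[k]            # hidden state feeds the layer above

print(h[3].shape)  # (32, 64) -- h4[t], fed to the Dense output layer
```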

Generate Final Prediction

Dense Layer: Takes h4[t] (which contains information from all 3 time steps and 4 layers)
Output: ŷ[t+1] = W_output · h4[t] + b_output
Result: Prediction for the next time step
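The final projection is a single affine map. A sketch with random placeholder values, assuming a hypothetical output_dim of 1 for a single-value forecast:

```python
import numpy as np

rng = np.random.default_rng(1)
batch, hidden, output_dim = 32, 64, 1

h4_t = rng.normal(size=(batch, hidden))                   # top-layer state at time t
W_output = rng.normal(scale=0.1, size=(hidden, output_dim))
b_output = np.zeros(output_dim)

y_hat = h4_t @ W_output + b_output                        # linear activation
print(y_hat.shape)  # (32, 1)
```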

🧠 Information Integration Across Layers