Complete 4-Layer RNN Architecture

Detailed Mathematical and Computational Flow Analysis

๐Ÿ—๏ธ Architecture Overview

t-2 (Time Step 1)
Input Layer
Input: X[t-2]
Shape: (batch, features)
Example: [temp, humidity, day_of_week, ...]
↓
LSTM Layer 1
h1[t-2] = LSTM1(X[t-2], h1[t-3], C1[t-3])
Output: h1[t-2], C1[t-2]
Hidden dim: 64
Gates: forget, input, output
→ h1[t-2], C1[t-2] carried to time step t-1
↓
LSTM Layer 2
h2[t-2] = LSTM2(h1[t-2], h2[t-3], C2[t-3])
Input: h1[t-2] (64 dim)
Output: h2[t-2], C2[t-2]
Hidden dim: 64
→ h2[t-2], C2[t-2] carried to time step t-1
↓
LSTM Layer 3
h3[t-2] = LSTM3(h2[t-2], h3[t-3], C3[t-3])
Input: h2[t-2] (64 dim)
Output: h3[t-2], C3[t-2]
Hidden dim: 64
→ h3[t-2], C3[t-2] carried to time step t-1
↓
LSTM Layer 4
h4[t-2] = LSTM4(h3[t-2], h4[t-3], C4[t-3])
Input: h3[t-2] (64 dim)
Output: h4[t-2], C4[t-2]
Hidden dim: 64
→ h4[t-2], C4[t-2] carried to time step t-1
t-1 (Time Step 2)
Input Layer
Input: X[t-1]
Shape: (batch, features)
Note: Fresh input data
Independent: Not connected to previous inputs
↓
LSTM Layer 1
h1[t-1] = LSTM1(X[t-1], h1[t-2], C1[t-2])
Uses memory: h1[t-2], C1[t-2]
Current input: X[t-1]
Output: h1[t-1], C1[t-1]
→ h1[t-1], C1[t-1] carried to time step t
↓
LSTM Layer 2
h2[t-1] = LSTM2(h1[t-1], h2[t-2], C2[t-2])
Uses memory: h2[t-2], C2[t-2]
Current input: h1[t-1]
Output: h2[t-1], C2[t-1]
→ h2[t-1], C2[t-1] carried to time step t
↓
LSTM Layer 3
h3[t-1] = LSTM3(h2[t-1], h3[t-2], C3[t-2])
Uses memory: h3[t-2], C3[t-2]
Current input: h2[t-1]
Output: h3[t-1], C3[t-1]
→ h3[t-1], C3[t-1] carried to time step t
↓
LSTM Layer 4
h4[t-1] = LSTM4(h3[t-1], h4[t-2], C4[t-2])
Uses memory: h4[t-2], C4[t-2]
Current input: h3[t-1]
Output: h4[t-1], C4[t-1]
→ h4[t-1], C4[t-1] carried to time step t
t (Time Step 3)
Input Layer
Input: X[t]
Shape: (batch, features)
Note: Most recent data
Final input: No horizontal connection
↓
LSTM Layer 1
h1[t] = LSTM1(X[t], h1[t-1], C1[t-1])
Uses memory: h1[t-1], C1[t-1]
Current input: X[t]
Final state: h1[t], C1[t]
↓
LSTM Layer 2
h2[t] = LSTM2(h1[t], h2[t-1], C2[t-1])
Uses memory: h2[t-1], C2[t-1]
Current input: h1[t]
Final state: h2[t], C2[t]
↓
LSTM Layer 3
h3[t] = LSTM3(h2[t], h3[t-1], C3[t-1])
Uses memory: h3[t-1], C3[t-1]
Current input: h2[t]
Final state: h3[t], C3[t]
↓
LSTM Layer 4
h4[t] = LSTM4(h3[t], h4[t-1], C4[t-1])
Uses memory: h4[t-1], C4[t-1]
Current input: h3[t]
Contains: All sequence information
↓
Output Layer (Dense)
ŷ[t+1] = W_output · h4[t] + b_output
Input: h4[t] (64 dim)
Output: Prediction
Activation: Linear (or ReLU)

🔬 Detailed Mathematical Flow

LSTM Cell Computation (Each Layer)
Forget Gate: f[t] = σ(W_f · [h[t-1], x[t]] + b_f)
Input Gate: i[t] = σ(W_i · [h[t-1], x[t]] + b_i)
Candidate Values: C̃[t] = tanh(W_C · [h[t-1], x[t]] + b_C)
Cell State: C[t] = f[t] * C[t-1] + i[t] * C̃[t]
Output Gate: o[t] = σ(W_o · [h[t-1], x[t]] + b_o)
Hidden State: h[t] = o[t] * tanh(C[t])
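The six equations above can be sketched in NumPy. This is a minimal illustration, not a production cell: fusing the four gate weight matrices into one matrix W (with row blocks ordered forget, input, candidate, output) is an implementation choice, not part of the equations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM cell step for a batch.

    W has shape (4*hidden, hidden + features); its four row blocks hold the
    forget, input, candidate, and output weights from the equations above.
    """
    hidden = h_prev.shape[1]
    z = np.concatenate([h_prev, x], axis=1) @ W.T + b   # all four pre-activations at once
    f = sigmoid(z[:, 0 * hidden:1 * hidden])            # forget gate f[t]
    i = sigmoid(z[:, 1 * hidden:2 * hidden])            # input gate i[t]
    c_tilde = np.tanh(z[:, 2 * hidden:3 * hidden])      # candidate values C~[t]
    o = sigmoid(z[:, 3 * hidden:4 * hidden])            # output gate o[t]
    c = f * c_prev + i * c_tilde                        # cell state C[t]
    h = o * np.tanh(c)                                  # hidden state h[t]
    return h, c

# Shape check with the hidden size used in this article (64); batch and
# feature counts here are hypothetical.
rng = np.random.default_rng(0)
batch, features, hidden = 2, 8, 64
x = rng.normal(size=(batch, features))
h0 = np.zeros((batch, hidden))
c0 = np.zeros((batch, hidden))
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + features))
b = np.zeros(4 * hidden)
h1, c1 = lstm_step(x, h0, c0, W, b)
print(h1.shape, c1.shape)  # (2, 64) (2, 64)
```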

📊 Tensor Dimensions Throughout Network

| Component    | Time t-2          | Time t-1          | Time t                | Notes                |
| ------------ | ----------------- | ----------------- | --------------------- | -------------------- |
| Input X      | [batch, features] | [batch, features] | [batch, features]     | Raw input features   |
| LSTM Layer 1 | [batch, 64]       | [batch, 64]       | [batch, 64]           | Hidden state h1[t]   |
| LSTM Layer 2 | [batch, 64]       | [batch, 64]       | [batch, 64]           | Hidden state h2[t]   |
| LSTM Layer 3 | [batch, 64]       | [batch, 64]       | [batch, 64]           | Hidden state h3[t]   |
| LSTM Layer 4 | [batch, 64]       | [batch, 64]       | [batch, 64]           | Hidden state h4[t]   |
| Cell States  | [batch, 64]       | [batch, 64]       | [batch, 64]           | Long-term memory C[t] |
| Output       | -                 | -                 | [batch, output_dim]   | Final prediction     |
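The dimension bookkeeping in the table can be checked mechanically. A minimal NumPy sketch, assuming hypothetical values of 10 input features, batch size 32, and a scalar prediction:

```python
import numpy as np

batch, features, hidden, output_dim = 32, 10, 64, 1

# Layer 1 consumes the raw features; layers 2-4 consume the 64-dim hidden
# state produced by the layer below.
input_widths = [features, hidden, hidden, hidden]

layer_input = np.zeros((batch, features))
for width in input_widths:
    h_prev = np.zeros((batch, hidden))
    # Each gate is a linear map of [h_prev, input]: (hidden + width) -> hidden.
    gate = np.concatenate([h_prev, layer_input], axis=1) @ np.zeros((hidden + width, hidden))
    assert gate.shape == (batch, hidden)
    layer_input = gate            # this layer's hidden state feeds the next layer

# The Dense output layer maps the top hidden state to the prediction.
y = layer_input @ np.zeros((hidden, output_dim))
print(y.shape)  # (32, 1)
```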

⚡ Step-by-Step Execution Process

Initialize Hidden and Cell States

All LSTM layers start with zero hidden states: h1[0], h2[0], h3[0], h4[0] = 0
All cell states start as zeros: C1[0], C2[0], C3[0], C4[0] = 0
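In NumPy, that initialization is just four pairs of zero arrays, one per layer (batch size 32 is a hypothetical value):

```python
import numpy as np

batch, hidden, num_layers = 32, 64, 4

# h[k] and c[k] hold the initial states of LSTM layer k+1 (h1..h4, C1..C4).
h = [np.zeros((batch, hidden)) for _ in range(num_layers)]
c = [np.zeros((batch, hidden)) for _ in range(num_layers)]

print(len(h), h[0].shape)  # 4 (32, 64)
```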

Process Time Step t-2

Input: X[t-2] enters the network
Layer 1: LSTM1 processes X[t-2] with zero initial states → produces h1[t-2], C1[t-2]
Layer 2: LSTM2 takes h1[t-2] as input → produces h2[t-2], C2[t-2]
Layer 3: LSTM3 takes h2[t-2] as input → produces h3[t-2], C3[t-2]
Layer 4: LSTM4 takes h3[t-2] as input → produces h4[t-2], C4[t-2]

Process Time Step t-1

Input: X[t-1] enters the network (new independent input)
Layer 1: LSTM1 processes X[t-1] WITH memory from h1[t-2], C1[t-2] → produces h1[t-1], C1[t-1]
Layer 2: LSTM2 takes h1[t-1] WITH memory from h2[t-2], C2[t-2] → produces h2[t-1], C2[t-1]
Layer 3: LSTM3 takes h2[t-1] WITH memory from h3[t-2], C3[t-2] → produces h3[t-1], C3[t-1]
Layer 4: LSTM4 takes h3[t-1] WITH memory from h4[t-2], C4[t-2] → produces h4[t-1], C4[t-1]

Process Time Step t (Final)

Input: X[t] enters the network (most recent data)
Layer 1: LSTM1 processes X[t] WITH memory from h1[t-1], C1[t-1] → produces h1[t], C1[t]
Layer 2: LSTM2 takes h1[t] WITH memory from h2[t-1], C2[t-1] → produces h2[t], C2[t]
Layer 3: LSTM3 takes h2[t] WITH memory from h3[t-1], C3[t-1] → produces h3[t], C3[t]
Layer 4: LSTM4 takes h3[t] WITH memory from h4[t-1], C4[t-1] → produces h4[t], C4[t]
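The three time steps above reduce to two nested loops: an outer loop over time and an inner loop over layers. A self-contained NumPy sketch with random weights and hypothetical sizes (fusing the four gates into one matrix per layer is an implementation choice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM cell step; W row blocks are [forget, input, candidate, output]."""
    hidden = h_prev.shape[1]
    z = np.concatenate([h_prev, x], axis=1) @ W.T + b
    f, i, o = (sigmoid(z[:, k * hidden:(k + 1) * hidden]) for k in (0, 1, 3))
    c_tilde = np.tanh(z[:, 2 * hidden:3 * hidden])
    c = f * c_prev + i * c_tilde
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
batch, features, hidden, steps, n_layers = 32, 10, 64, 3, 4

# One fused weight matrix and bias per layer; layer 1 sees the raw features,
# layers 2-4 see the 64-dim hidden state from the layer below.
widths = [features] + [hidden] * (n_layers - 1)
Ws = [rng.normal(scale=0.1, size=(4 * hidden, hidden + w)) for w in widths]
bs = [np.zeros(4 * hidden) for _ in range(n_layers)]

X = rng.normal(size=(steps, batch, features))             # X[t-2], X[t-1], X[t]
h = [np.zeros((batch, hidden)) for _ in range(n_layers)]  # zero initial states
c = [np.zeros((batch, hidden)) for _ in range(n_layers)]

for t in range(steps):                # outer loop: time steps t-2, t-1, t
    layer_input = X[t]
    for k in range(n_layers):         # inner loop: layers 1..4, bottom to top
        h[k], c[k] = lstm_step(layer_input, h[k], c[k], Ws[k], bs[k])
        layer_input = h[k]            # hidden state feeds the layer above

print(h[3].shape)  # (32, 64) -- h4[t], fed to the Dense output layer
```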

Generate Final Prediction

Dense Layer: Takes h4[t] (which contains information from all 3 time steps and 4 layers)
Output: ŷ[t+1] = W_output · h4[t] + b_output
Result: Prediction for the next time step
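The final projection is a single affine map. A sketch with random placeholder values, assuming a hypothetical output_dim of 1 for a single-value forecast:

```python
import numpy as np

rng = np.random.default_rng(1)
batch, hidden, output_dim = 32, 64, 1

h4_t = rng.normal(size=(batch, hidden))                   # top-layer state at time t
W_output = rng.normal(scale=0.1, size=(hidden, output_dim))
b_output = np.zeros(output_dim)

y_hat = h4_t @ W_output + b_output                        # linear activation
print(y_hat.shape)  # (32, 1)
```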

🧠 Information Integration Across Layers