blog - page 7 | Ziming Liu

Sparse attention 7 -- Stack of causal attention creates implicit positional embedding, and explaining "Loss in the middle"

5 min read · January 31, 2026

2026 · Physics-of-AI Sparse-attention · AI
Sparse attention 6 -- In-context Associative recall

5 min read · January 30, 2026

2026 · Physics-of-AI Sparse-attention · AI
MLP 2 -- Effective linearity, Generalized SiLU

4 min read · January 29, 2026

2026 · Physics-of-AI MLP Non-linearity Activation-function · AI
MLP 1 -- Gating is good for polynomials

3 min read · January 28, 2026

2026 · Physics-of-AI MLP Non-linearity · AI
Optimization 4 -- Loss Spikes

6 min read · January 27, 2026

2026 · Physics-of-AI Optimization · AI