Jan 31, 2026  Sparse attention 7 -- Stack of causal attention creates implicit positional embedding, and explaining "Lost in the middle"
Jan 30, 2026  Sparse attention 6 -- In-context associative recall
Jan 22, 2026  Sparse attention 5 -- Attention sink
Jan 13, 2026  Sparse attention 4 -- Previous token head
Jan 12, 2026  Sparse attention 3 -- Inefficiency of extracting similar content
Jan 10, 2026  Sparse attention 2 -- Unattention head, branching dynamics
Jan 09, 2026  Sparse attention 1 -- Sticky plateau and rank collapse