AI

an archive of posts in this category

Jan 31, 2026 Sparse attention 7 -- Stack of causal attention creates implicit positional embedding, and explaining "Loss in the middle"
Jan 30, 2026 Sparse attention 6 -- In-context Associative recall
Jan 29, 2026 MLP 2 -- Effective linearity, Generalized SiLU
Jan 28, 2026 MLP 1 -- Gating is good for polynomials
Jan 27, 2026 Optimization 4 -- Loss Spikes
Jan 25, 2026 Optimization 3 / Depth 2 -- Adding Bias After ReLU
Jan 24, 2026 Optimization 2 -- Elementwise Scale Reparametrization
Jan 23, 2026 Optimization 1 -- Norm reparametrization
Jan 22, 2026 Sparse attention 5 -- Attention sink
Jan 21, 2026 Bigram 4 -- On the difficulty of spatial map emergence
Jan 20, 2026 Depth 1 -- Understanding Pre-LN and Post-LN
Jan 19, 2026 Bigram 3 -- Low Rank Structure
Jan 16, 2026 Bigram 2 -- Emergence of Hyperbolic Spaces
Jan 15, 2026 Bigram 1 -- Walk on a Circle
Jan 14, 2026 Diffusion 1 -- Sparse and Dense Neurons
Jan 13, 2026 Sparse attention 4 -- previous token head
Jan 12, 2026 Sparse attention 3 -- inefficiency of extracting similar content
Jan 11, 2026 Emergence of Induction Head Depends on Learning Rate Schedule
Jan 10, 2026 Sparse attention 2 -- Unattention head, branching dynamics
Jan 09, 2026 Sparse attention 1 -- sticky plateau and rank collapse
Jan 08, 2026 Unigram toy model is surprisingly rich -- representation collapse, scaling laws, learning rate schedule
Jan 07, 2026 Fine-tuning with sparse updates? A toy teacher-student Setup
Jan 06, 2026 Multi-Head Cross Entropy Loss
Jan 05, 2026 What's the difference -- (physics of) AI, physics, math and interpretability
Jan 04, 2026 Representation anisotropy from nonlinear functions
Jan 03, 2026 Training dynamics of A Single ReLU Neuron
Jan 02, 2026 Physics of AI – How to Begin
Jan 01, 2026 Physics of Feature Learning 1 – A Perspective from Nonlinearity
Dec 31, 2025 Physics of AI Requires Mindset Shifts
Dec 25, 2025 Achieving AGI Intelligently – Structure, Not Scale
May 27, 2024 Philosophical thoughts on Kolmogorov-Arnold Networks
Jun 16, 2023 A Good ML Theory is Like Physics -- A Physicist's Analysis of Grokking