-
Memory 2 -- How many bits does each parameter store? An analysis of MLP
-
Sparse attention 8 -- Numeric randomness speeds up emergence of symbolic structure (induction head)
-
Drifting VQ-VAE -- How "drifting models" fix failure modes of VQ-VAE
-
Loss landscape visualization 1 -- Seeing sticky plateau
-
Research agent 1 -- Reproducing 2026-01-01 blog (physics of feature learning)