- Physics 1 -- Attention can't exactly simulate uniform linear motion
- Depth 4 -- Flat directions (in weight space) are high-frequency modes (in function space)
- Depth 3 -- Fun facts about loss Hessian eigenvalues
- Diffusion 2 -- Visualizing flow matching and temporal dynamics
- Sparse attention 7 -- A stack of causal attention layers creates an implicit positional embedding, explaining "Lost in the Middle"