Depth
an archive of posts with this tag
| Date | Post |
|---|---|
| Mar 18, 2026 | Attention residual 2 |
| Mar 16, 2026 | When does Kimi's "Attention Residuals" work? |
| Feb 03, 2026 | Depth 4 -- Flat directions (in weight space) are high frequency modes (in function space) |
| Feb 02, 2026 | Depth 3 -- Fun facts about loss hessian eigenvalues |
| Jan 25, 2026 | Optimization 3 / Depth 2 -- Adding Bias After ReLU |
| Jan 20, 2026 | Depth 1 -- Understanding Pre-LN and Post-LN |