Optimization
An archive of posts with this tag
| Date | Post |
|---|---|
| Mar 18, 2026 | Attention residual 2 |
| Mar 16, 2026 | When does Kimi's "Attention Residuals" work? |
| Mar 15, 2026 | When does RandOpt work? |
| Mar 12, 2026 | A toy model of distillation |
| Mar 11, 2026 | When does Muon work? Model depth is a key factor |
| Jan 27, 2026 | Optimization 4 -- Loss Spikes |
| Jan 25, 2026 | Optimization 3 / Depth 2 -- Adding Bias After ReLU |
| Jan 24, 2026 | Optimization 2 -- Elementwise Scale Reparametrization |
| Jan 23, 2026 | Optimization 1 -- Norm reparametrization |