Optimization
an archive of posts with this tag
| Date | Post |
|---|---|
| Mar 12, 2026 | A toy model of distillation |
| Mar 11, 2026 | When does Muon work? Model depth is a key factor |
| Jan 27, 2026 | Optimization 4 -- Loss Spikes |
| Jan 25, 2026 | Optimization 3 / Depth 2 -- Adding Bias After ReLU |
| Jan 24, 2026 | Optimization 2 -- Elementwise Scale Reparametrization |
| Jan 23, 2026 | Optimization 1 -- Norm reparametrization |