- When does Muon work? Model depth is a key factor
- MoE 1 -- Expressive power of MoEs through the lens of spectral bias and memorization capacity
- A toy model of video generative models -- bottleneck dimension controls "classical"/"quantum" strategies
- Memory 2 -- How many bits does each parameter store? An analysis of MLPs
- Sparse attention 8 -- Numeric randomness speeds up emergence of symbolic structure (induction heads)