Ziming Liu
  • When does Muon work? Model depth is a key factor

    3 min read   ·   March 11, 2026

    2026   ·   Physics-of-AI   Optimization     ·   AI  

  • MOE 1 -- Expressive power of MOEs through the lens of spectral bias and memorization capacity

    4 min read   ·   March 10, 2026

    2026   ·   Physics-of-AI   MLP   MOE   Memorization   Spectral-bias     ·   AI  

  • A toy model of video generative models -- bottleneck dimension controls "classical"/"quantum" strategies

    7 min read   ·   March 09, 2026

    2026   ·   Physics-of-AI   Representation   Diffusion   Flow-matching   Autoencoder   Video-generative-model     ·   AI  

  • Memory 2 -- How many bits does each parameter store? An analysis of MLP

    8 min read   ·   March 06, 2026

    2026   ·   Physics-of-AI   Memory   Activation-function     ·   AI  

  • Sparse attention 8 -- Numeric randomness speeds up emergence of symbolic structure (induction head)

    4 min read   ·   March 05, 2026

    2026   ·   Physics-of-AI   Sparse-attention   Symbolic     ·   AI  

© Copyright 2026 Ziming Liu.