Ziming Liu
  • Sparse attention 7 -- Stack of causal attention creates implicit positional embedding, and explaining "Loss in the middle"

    5 min read   ·   January 31, 2026

    2026   ·   Physics-of-AI   Sparse-attention     ·   AI  

  • Sparse attention 6 -- In-context Associative recall

    5 min read   ·   January 30, 2026

    2026   ·   Physics-of-AI   Sparse-attention     ·   AI  

  • MLP 2 -- Effective linearity, Generalized SiLU

    4 min read   ·   January 29, 2026

    2026   ·   Physics-of-AI   MLP   Non-linearity   Activation-function     ·   AI  

  • MLP 1 -- Gating is good for polynomials

    3 min read   ·   January 28, 2026

    2026   ·   Physics-of-AI   MLP   Non-linearity     ·   AI  

  • Optimization 4 -- Loss Spikes

    6 min read   ·   January 27, 2026

    2026   ·   Physics-of-AI   Optimization     ·   AI  

© Copyright 2026 Ziming Liu.