Ziming Liu
  • Sparse attention 2 -- Unattention head, branching dynamics

    6 min read   ·   January 10, 2026

    2026   ·   Physics-of-AI     ·   AI  

  • Sparse attention 1 -- sticky plateau and rank collapse

    6 min read   ·   January 09, 2026

    2026   ·   Physics-of-AI     ·   AI  

  • Unigram toy model is surprisingly rich -- representation collapse, scaling laws, learning rate schedule

    10 min read   ·   January 08, 2026

    2026   ·   Physics-of-AI     ·   AI  

  • Fine-tuning with sparse updates? A toy teacher-student setup

    6 min read   ·   January 07, 2026

    2026   ·   Physics-of-AI,   New-Model     ·   AI  

  • Multi-Head Cross Entropy Loss

    5 min read   ·   January 06, 2026

    2026   ·   Physics-of-AI,   New-Model     ·   AI  

© Copyright 2026 Ziming Liu.