blog - page 6 | Ziming Liu

181-parameter transformer-like models for 10-digit addition

5 min read · February 24, 2026

2026 · Physics-of-AI Toy-language Sparse-attention · AI

When should I use physics of AI?

9 min read · February 15, 2026

2026 · Physics-of-AI Methodology · AI

Memory 1 -- How much do linear layers memorize?

5 min read · February 09, 2026

2026 · Physics-of-AI Memory · AI

Transformers don't learn Newton's laws? They learn Kepler's laws!

5 min read · February 08, 2026

2026 · Physics-of-AI World-model · AI

When I say "toy models", what do I mean?

10 min read · February 07, 2026

2026 · Physics-of-AI Methodology · AI