-
How to ground your ideas?
-
A toy model of distillation
-
When does Muon work? Model depth is a key factor
-
MOE 1 -- Expressive power of MOEs through the lens of spectral bias and memorization capacity
-
A toy model of video generative models -- bottleneck dimension controls "classical"/"quantum" strategies