2024-12-17 at 16:09
why is everyone being weird about scaling test-time compute vs. training compute?
https://x.com/ClementDelangue/status/1868740932251844806
or maybe that's just tweakers on AI Twitter and not people in actual research communities
===
also I'm curious how the phenomenon of "for every additional 10x of training compute, ~15x of inference compute can be eliminated" maps onto humans
re:
https://yellow-apartment-148.notion.site/AI-Search-The-Bitter-er-Lesson-44c11acd27294f4495c3de778cd09c8d
"Moreover, the brilliant Scaling Scaling Laws with Board Games show that “for each additional 10× of train-time compute, about 15× of test-time compute can be eliminated” even down to single-neuron models. Recall that Stockfish beat Leela with a model 3 orders of magnitude smaller."