· Quick Take · 2 min read
LLMs for Research: A Faster Path to AGI
Building AI that excels at research with AlphaGo-style self-improvement might reach AGI faster than solving general intelligence directly.

AI labs are racing toward AGI. The term is fuzzy, but most people agree it means AI that can do any intellectual task a human can. That may be overshooting. I think there is a faster path, and maybe one with a higher chance of success: build an LLM that can do advanced science better than many people. That is it; not AGI, just better than humans at academic research. If a lab builds that and points it at LLM research, it could speed up its own progress, stack breakthrough on breakthrough, and create a virtuous cycle. That may beat a one-shot push to broad AGI.
However, I do not think the current direction of LLMs, most-probable next-token prediction, is enough to reach that level. It is very likely to run down the usual, conventional paths of thinking. Novel results need novel ideas and novel ways of thinking, while next-token prediction pulls toward the familiar. So a possible direction is a system that mixes LLMs with reinforcement learning, in the spirit of AlphaGo Zero.
AlphaGo Zero was nothing short of magical. All it had were the rules of Go and a goal. It learned by itself with reinforcement learning. It played against itself. It reached a level of Go that no human had ever reached. That feels like the right direction. However, real life rarely offers the clear reward signal that reinforcement learning needs to work well. That is why we cannot yet apply the AlphaGo Zero recipe to real-life problems.
Academic research is a lot more difficult than the game of Go, but parts of it admit a more deterministic reward signal than everyday life. For example, in some areas we can verify a theorem, or run tests to check whether a piece of code is correct. That means training an LLM with reinforcement learning, AlphaGo Zero style, to do academic research may be possible.
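To make the "deterministic reward" idea concrete, here is a minimal sketch of one such signal: scoring a model-generated piece of code by whether it passes a test suite. This is my own illustration, not a description of any lab's setup; the function name, the toy `add` task, and the pass/fail reward scale are all assumptions for the example.

```python
# Minimal sketch of a verifiable reward: run a candidate solution against
# tests and return 1.0 only if every assertion passes. Names and the toy
# task are illustrative assumptions, not taken from the article.
import subprocess
import tempfile
import textwrap


def code_reward(candidate_source: str, test_source: str, timeout_s: float = 10.0) -> float:
    """Return 1.0 if the candidate code passes all tests, else 0.0."""
    program = textwrap.dedent(candidate_source) + "\n\n" + textwrap.dedent(test_source)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        # Run the combined program in a subprocess; a non-zero exit code
        # (failed assertion, exception) means zero reward.
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0


# Example: a correct candidate earns 1.0, a broken one earns 0.0.
tests = """
assert add(2, 3) == 5
assert add(-1, 1) == 0
"""
print(code_reward("def add(a, b):\n    return a + b", tests))  # 1.0
print(code_reward("def add(a, b):\n    return a - b", tests))  # 0.0
```

A reinforcement learning loop could, in principle, use a signal like this in place of the win/loss outcome AlphaGo Zero got from a finished game; theorem proving with a proof checker would play a similar role in mathematics.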
That seems to me a safer bet for reaching AGI right now than throwing more data at training, scaling up the model, or simply letting the LLM think longer.