· Quick Take · 2 min read
LLMs for Research: A Faster Path to AGI
Building AI that excels at research with AlphaGo-style self-improvement might reach AGI faster than solving general intelligence directly.

AI labs are racing toward AGI. The term is fuzzy, but most people agree it means AI that can do any intellectual task a human can. That may be overshooting. I think there is a faster path, and maybe one with a higher chance of success: build an LLM that can do advanced science better than many people. That is it; not AGI, just better than humans at academic research. If a lab builds that and points it at LLM research, it could speed up its own progress, stack breakthrough on breakthrough, and create a virtuous cycle. That may beat a one-shot push to broad AGI.
However, I do not think the current direction of LLMs, most-probable next-token prediction, is enough to reach that level. It is very likely to run down the usual, conventional paths of thinking. Novel results need novel ideas and novel ways of thinking, while next-token prediction pulls toward the familiar. So a possible direction is a system that mixes LLMs with reinforcement learning, in the spirit of AlphaGo Zero.
AlphaGo Zero was nothing short of magical. All it had were the rules of Go and a goal. It learned by itself with reinforcement learning. It played against itself. It reached a level of Go that no human had ever reached. That feels like the right direction. However, real life rarely offers the clear reward signal that reinforcement learning needs to work well. That is why we cannot yet apply the AlphaGo Zero recipe to real-life problems.
Academic research is a lot more difficult than the game of Go, but parts of it admit a more deterministic reward signal than everyday life. For example, in some areas we can verify a theorem, or run tests to check whether a piece of code is correct. That means training an LLM with reinforcement learning, AlphaGo Zero style, to do academic research may be possible.
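To make the "deterministic reward" idea concrete, here is a minimal sketch of one such signal: scoring a model-generated piece of code by whether it passes a test suite. This is my own illustration, not a description of any lab's setup; the function name, the toy `add` task, and the pass/fail reward scale are all assumptions for the example.

```python
# Minimal sketch of a verifiable reward: run a candidate solution against
# tests and return 1.0 only if every assertion passes. Names and the toy
# task are illustrative assumptions, not taken from the article.
import subprocess
import tempfile
import textwrap


def code_reward(candidate_source: str, test_source: str, timeout_s: float = 10.0) -> float:
    """Return 1.0 if the candidate code passes all tests, else 0.0."""
    program = textwrap.dedent(candidate_source) + "\n\n" + textwrap.dedent(test_source)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        # Run the combined program in a subprocess; a non-zero exit code
        # (failed assertion, exception) means zero reward.
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0


# Example: a correct candidate earns 1.0, a broken one earns 0.0.
tests = """
assert add(2, 3) == 5
assert add(-1, 1) == 0
"""
print(code_reward("def add(a, b):\n    return a + b", tests))  # 1.0
print(code_reward("def add(a, b):\n    return a - b", tests))  # 0.0
```

A reinforcement learning loop could, in principle, use a signal like this in place of the win/loss outcome AlphaGo Zero got from a finished game; theorem proving with a proof checker would play a similar role in mathematics.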
That seems to me a safer bet for reaching AGI right now than throwing more data at training, scaling up the model, or simply letting the LLM think longer.