AI Engineering · 5 min read

Reaching AGI by Using the Human Feedback Loop

Human-style iteration gives LLMs a path to reliable AGI by pairing clear goals with self-measured loss functions.


Recently I have started to believe that AGI is closer than we think. We already have all the ingredients for it: current frontier LLM capabilities are nearly sufficient. We just need some incremental improvements and better coordination of processes, and then we will have AGI.

My definition of AGI is that it can do the things a normal person with an IQ of 100 can do, with similar quality. We set aside works of genius like inventing the theory of relativity.

Let’s start with the simple loop humans use, then project it onto LLMs.

How humans work

How does a normal human do things, like playing a piano piece or writing a PowerPoint presentation? We do it iteratively. We try it a first time. We may be happy with the result, but often we are not. Then we iterate, tweak, and improve until we are happy with our performance or our presentation.

We can do that thanks to our general mechanism for solving problems. Given a problem, we are able to:

  1. Define what a desired or good-enough outcome is,
  2. Produce a solution, although it may be quite bad,
  3. Evaluate a solution: how far it is from the outcome, and in which direction to improve toward it.

You see this loop in everyday jobs. For example, a sales representative with a quarterly sales target would first form a strategy to hit the target. After a month she does a retrospective, evaluates what is good and what is bad, and modifies her strategy the next month, and the next. At the end of the quarter she may hit or miss the target, but that is how humans solve things. Even highly intelligent feats like inventing the light bulb work this way: we try multiple times, each time doing it a bit differently, and see what sticks and what doesn’t.

What LLMs can learn from humans

Step 1 and Step 2 are mostly fine today for frontier LLMs; Step 3 is the tricky one. Defining what a desired or good-enough outcome is (Step 1) is normally done by a human stakeholder, or they can leave it to the LLM to decide. LLMs are also capable of proposing an initial or incremental solution (Step 2).

For Step 3, to evaluate effectively and know what to do next, an LLM needs to be able to build for itself two things:

  • (a) a clear, measurable “loss function” based on the desired outcome.
  • (b) a mechanism to measure the loss of a solution.

Lower loss means closer to the goal, much like gradient descent in machine learning.
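The analogy can be made precise, with the caveat that it is only an analogy: there is no literal gradient when an LLM revises a plan. In gradient descent, parameters are repeatedly stepped against the gradient of the loss:

$$\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t)$$

Here the role of $\theta$ is played by the current solution, and the evaluation in Step 3 substitutes for $\nabla L$ by pointing out which direction to improve.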

With (a) and (b) in place, an LLM can confidently iterate toward the desired outcome just as humans do. Like humans, it may never fully reach that outcome. But executing this loop to reach an outcome comparable to a normal human’s is already sufficient to put AI on par with human intelligence, and so to be considered AGI.
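To make the loop concrete, here is a minimal sketch in Python. The helpers llm_propose and measure_loss are hypothetical stand-ins, stubbed so the example runs; in a real system they would call an LLM and a domain-specific evaluator:

```python
# Hypothetical helpers, stubbed so the sketch runs end to end.
def llm_propose(goal, feedback=None):
    # Step 2: produce a first or revised solution.
    return f"draft for {goal!r}" + (f" (revised: {feedback})" if feedback else "")

def measure_loss(solution, goal):
    # Step 3, using (a) and (b): score the solution; lower is better.
    loss = 0.0 if "revised" in solution else 1.0
    return loss, "tighten the weakest part"

def iterate_to_outcome(goal, good_enough=0.1, max_iters=20):
    solution = llm_propose(goal)                       # first attempt
    for _ in range(max_iters):
        loss, feedback = measure_loss(solution, goal)  # evaluate against the goal
        if loss <= good_enough:                        # close enough: stop
            return solution
        solution = llm_propose(goal, feedback)         # refine and try again
    return solution  # like humans, it may stop short of the goal

print(iterate_to_outcome("a quarterly sales plan"))
```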

Following that process, we have broken a very hard problem that an LLM can’t solve directly today (hit a quarterly sales target) into two much easier problems: (a) a loss function, and (b) a measurement mechanism. Ideally, both should be defined following common sense, or what humans usually do.

Back to the sales example:

  • Loss function: the number of sales, leads, and inquiries generated from each channel and each type of action.
  • Measurement mechanism: implement a tracking system to attribute which action leads to which result (sketched below).
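A toy version of (a) and (b) for this example might look like the following. The channels, figures, and target are invented for illustration:

```python
# Illustration only: channels, figures, and the target are made up.
QUARTER_TARGET = 120  # desired outcome: 120 closed sales this quarter

def sales_loss(attributed_results):
    # (a): loss = how far attributed sales fall short of the target (0 = target hit).
    total_sales = sum(r["sales"] for r in attributed_results)
    return max(0, QUARTER_TARGET - total_sales) / QUARTER_TARGET

# (b): the tracking system attributes each result back to a channel and action.
results = [
    {"channel": "email",    "action": "cold outreach",  "leads": 40, "sales": 25},
    {"channel": "webinar",  "action": "product demo",   "leads": 60, "sales": 35},
    {"channel": "linkedin", "action": "direct message", "leads": 15, "sales": 10},
]

print(f"loss = {sales_loss(results):.2f}")  # 0.42, so iterate on the weakest channels
```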

Challenges

Some challenges remain:

Defining (a) and (b) in messy jobs. Although defining (a) and (b) is far easier than solving the problem directly, it is still not always clear how an LLM could define them, e.g., for the jobs of mechanical engineers or retail workers. Big AI labs are already working on this. For example, on 25 Sep 2025, OpenAI released GDPval, a dataset to measure the performance of LLM models on real-world tasks across 44 occupations.

Context window limits. Another technical constraint is the context window: how much of its history the model can see and draw insight from. When iterating for too long, LLMs accumulate too large a context, their performance starts to degrade, and they may repeat earlier mistakes and get stuck without further improvement. But I would not worry much about this, as more research in the LLM space will solve it one way or another; one plausible mitigation is sketched below.
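As an example of such a mitigation (my own sketch, not any specific product’s feature), the iteration history can be compacted: keep the last few attempts verbatim and fold everything older into a short summary that fits the window:

```python
# Sketch of context compaction: keep the last K attempts verbatim and fold
# older ones into a running summary so the history fits the context window.
KEEP_VERBATIM = 3

def compact_history(summary, attempts):
    old, recent = attempts[:-KEEP_VERBATIM], attempts[-KEEP_VERBATIM:]
    if old:
        # A real system would ask an LLM to write this summary; a join stands in.
        summary = (summary + " | " if summary else "") + "; ".join(old)
    return summary, recent

summary, recent = compact_history("", [f"attempt {i}: lesson learned" for i in range(1, 7)])
print(summary)  # compressed lessons from attempts 1-3
print(recent)   # attempts 4-6 kept in full
```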

Physical work. Here I mostly constrain AGI to digital work. For physical work, putting the AI into a capable humanoid robot body could reach human level as well, if we allow a couple more years for robotics to accumulate further advancements.

Delayed outcome. In cases where we can’t immediately measure how far a solution is from the desired outcome (like trading stocks), humans face the same problem anyway. So AI is at no disadvantage here.

Conclusion

With a less ambitious definition of AGI, I can see a clear path toward it. We already have all the ingredients; no big breakthrough is needed, only some time and human ingenuity. When this loop works end to end on a few common jobs, most people will call it AGI.
