
You Can't Eliminate LLM Hallucinations

Hallucinations are inevitable for LLMs; it's a trade-off between false positives and false negatives. Improving models can make that trade-off less severe.

People often complain about LLM hallucinations, i.e. the model makes things up and presents them as fact. OpenAI just released a paper, “Why Language Models Hallucinate”, which argues that hallucinations need not be mysterious: they originate simply as errors in binary classification. The paper also proposes some methods to control them.

Some people think this finding finally puts an end to hallucinations, so we never have to worry about LLMs making things up again. The truth is far from that. You can never truly eliminate hallucinations; it’s a trade-off between false positives and false negatives.

Under the hood, an LLM is a giant math function with billions of parameters. It takes an input text and returns an output text. It can’t be 100% certain that what it says is correct; it can only assign a confidence score, e.g. “I’m 99.3% confident this is correct.” It then all boils down to the threshold you choose to cut off at: 98.0%, 99.5%, or 99.9%. That’s where the trade-off comes in. If you set the threshold too high, the LLM will mostly say “I don’t know.” If you set it too low, more hallucinations creep in.
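
Here is a minimal sketch of that answer-or-abstain decision. The candidate answers, confidence numbers, and threshold values below are assumptions chosen for illustration, not outputs of any real model:

```python
# Toy sketch of the answer/abstain trade-off. All numbers are made up
# for illustration; they do not come from a real model.

def respond(answer: str, confidence: float, threshold: float) -> str:
    """Answer only if the model's confidence clears the threshold."""
    return answer if confidence >= threshold else "I don't know"

# Two hypothetical candidate answers: one the model is very sure of,
# one it is much less sure of (and which happens to be wrong).
candidates = [("The sky is blue.", 0.997), ("The capital is Lyon.", 0.62)]

for threshold in (0.98, 0.995, 0.999):
    replies = [respond(a, c, threshold) for a, c in candidates]
    print(f"threshold={threshold}: {replies}")

# Raising the threshold turns confident wrong answers into "I don't know"
# (fewer false positives), but it also suppresses correct answers
# (more false negatives).
```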

Humans behave the same way. We can’t know for sure whether what’s in our minds is right or wrong. Some people always say what they think, right or wrong. Others only want to say things they’re sure are right, and end up never saying anything. Neither extreme is good.

How does an LLM calculate a confidence score? For the next token, it might put “blue” at 99.7%, “red” at 0.2%, and “yellow” at 0.1%. It picks “blue” with confidence 99.7%. That confidence number can itself be wrong, because token probabilities reflect the model’s belief over text, not ground truth. So it’s uncertainty on top of uncertainty.
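
As a rough sketch, that per-token confidence comes from a softmax over the model’s raw scores (logits). The vocabulary and logit values below are invented so the probabilities roughly match the numbers above; they are not from a real model:

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["blue", "red", "yellow"]
logits = [9.2, 3.0, 2.3]  # hypothetical raw scores from the model

probs = softmax(logits)
for token, p in zip(vocab, probs):
    print(f"{token}: {p:.1%}")        # roughly 99.7%, 0.2%, 0.1%

best = max(range(len(vocab)), key=lambda i: probs[i])
print(f"pick '{vocab[best]}' with confidence {probs[best]:.1%}")

# The softmax output is the model's belief over text, not ground truth,
# so even a 99.7% pick can be factually wrong.
```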

How can we reduce hallucinations? Last month OpenAI released GPT-5 and lowered hallucinations roughly sixfold compared to previous models. The new paper also outlines several methods to further control hallucinations, including reinforcement-learning-based post-training, fine-tuning on new information, and retrieval and search augmentation. Evaluations should also reward saying “I don’t know” when the model is unsure. In the long run, making the model more knowledgeable and capable will make it more confident in each of its answers. Then the “blue” token above will have 99.97% confidence instead of 99.7%, and we can naturally raise the threshold.
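
To make the evaluation point concrete, here is a minimal sketch of a grading rule that rewards abstention when unsure. The scoring values and example answers are assumptions for illustration, not the paper’s exact scheme:

```python
# Toy evaluation rule: correct answers earn points, abstentions are neutral,
# and wrong answers are penalized. Values here are illustrative assumptions.

def score(prediction: str, truth: str, wrong_penalty: float = 1.0) -> float:
    if prediction == "I don't know":
        return 0.0                                          # abstaining is neutral
    return 1.0 if prediction == truth else -wrong_penalty   # wrong answers cost points

answers = [
    ("Paris", "Paris"),          # correct: +1
    ("I don't know", "Lyon"),    # honest abstention: 0
    ("Marseille", "Lyon"),       # confident hallucination: -1
]

total = sum(score(pred, truth) for pred, truth in answers)
print(f"total score: {total}")   # 0.0

# Under accuracy-only grading, guessing always looks at least as good as
# abstaining; penalizing wrong answers makes "I don't know" the better
# choice when the model is unsure.
```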

Someday we’ll reduce hallucinations to, say, 0.0001%, and we’ll be happy with that; we’ll accept a tiny risk, as we do with other things in life. In the meantime, balancing the trade-off is necessary.
