AI Engineering · 2 min read

Finetuning Small Models Will Matter in 2026

After years of relying on prompting and RAG, finetuning is back for certain use cases like new vocabularies or narrow tasks.

In previous years, finetuning an LLM was a luxury: too much work for too uncertain a gain. People relied on simpler techniques, such as prompting, RAG, and agentic search, and achieved good outcomes. Now that those techniques have been studied and applied extensively, 2026 is the right time to venture into the mysterious land of finetuning.

There are two scenarios where finetuning holds great promise, and where there have already been great success stories.

  1. When you need a different vocabulary. Every LLM ships with a standard vocabulary (the set of tokens the model uses to generate output), which works well for normal conversations and common tasks. For some specific tasks, that vocabulary just doesn’t work. One prime use case is science.

A couple of months ago, Google and Yale had a breakthrough in cancer treatment by finetuning Gemma 27B on “> 50 million human+mouse single-cell transcriptomes curated from Human Cell Atlas and CellxGene.” Finetuning is the only way to introduce new domain-specific tokens into the vocabulary. I expect this technique can be applied very well to other scientific projects too.
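To make the vocabulary point concrete, here is a minimal, hypothetical sketch (plain Python + NumPy, not the actual Google/Yale pipeline) of what extending a model’s vocabulary involves: each new domain token needs both a new token ID and a new row in the embedding matrix, and those new rows then have to be trained on domain data.

```python
import numpy as np

# Toy vocabulary: token string -> token id (stand-in for a real tokenizer).
vocab = {"<pad>": 0, "the": 1, "cell": 2, "gene": 3}

# Toy embedding matrix: one row per token, d_model columns.
d_model = 8
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), d_model))

def add_domain_tokens(vocab, embeddings, new_tokens):
    """Append domain-specific tokens and freshly initialized embedding rows.

    In a real finetuning run, the new rows start out untrained (random, or
    warm-started from related tokens) and are learned from domain data;
    the existing rows can be frozen or finetuned alongside them.
    """
    for tok in new_tokens:
        if tok in vocab:
            continue  # token already known, nothing to add
        vocab[tok] = len(vocab)
        # Warm-start the new row as the mean of existing embeddings.
        new_row = embeddings.mean(axis=0, keepdims=True)
        embeddings = np.concatenate([embeddings, new_row], axis=0)
    return vocab, embeddings

# Hypothetical gene-symbol tokens, in the spirit of single-cell data.
vocab, embeddings = add_domain_tokens(vocab, embeddings, ["TP53", "BRCA1"])
print(len(vocab), embeddings.shape)
```

The key takeaway: prompting and RAG can only rearrange existing tokens, while finetuning can grow the vocabulary and embedding table themselves.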

  2. When you need to do a single narrow, specific task well. An LLM is a general-purpose tool. When we only need it for one specific task, finetuning a smaller model can improve quality, reduce cost, or both.

In late 2025, several AI labs released a series of small but state-of-the-art OCR (Optical Character Recognition) models: Deepseek, Paddle, olm. Because there is strong demand for large-scale OCR, finetuning hit three targets at once for a one-off training cost: more accurate, faster, and cheaper.

I expect there will be more and more finetuning success stories like these two this year.
