I’ve spent my career swimming in data. As the former Chief Data Officer at Kaiser Permanente, UnitedHealthcare, and Optum, I at one point had oversight of almost 70% of all of America’s healthcare claims. So when I tell you the problem with enterprise AI isn’t the model architecture but the data the models are being fed, believe me: I’ve seen it firsthand.
LLMs are already peaking
The cracks are already showing in LLMs. Take GPT-5. Its launch was plagued with complaints: it failed basic math, missed context that earlier versions handled with ease, and left paying customers calling it “bland” and “generic.” OpenAI even had to restore an older model after users rejected its colder, checklist-driven tone. After two years of delays, many started asking whether OpenAI had lost its edge, or whether the entire LLM approach was simply hitting a wall.
Meta’s LLaMA 4 tells a similar story. In long-context tests, the kind of work enterprises actually need, Maverick showed no improvement over LLaMA 3, and Scout performed “downright atrociously.” Meta claimed these models could handle millions of tokens; in reality, they struggled with just 128,000. Meanwhile, Google’s Gemini sailed past 90% accuracy at the same scale.
The data problem nobody wants to admit
Instead of confronting the limits we’re already seeing with LLMs, the industry keeps scaling up, pouring more compute and electricity into these models. And yet, despite all that power, the results aren’t getting any smarter.
The reason is simple: the internet data these models are built on has already been scraped, cleaned, and retrained over and over to death. That’s why new releases feel flat; there’s little new to learn. Each cycle simply recycles the same patterns back into the model. They’ve already eaten the internet. Now they’re starving on themselves.
Meanwhile, the real gold mine of intelligence, private enterprise data, sits locked away. LLMs aren’t failing for lack of data; they’re failing because they don’t use the right data. Think about what’s needed in healthcare: claims, medical records, clinical notes, billing, invoices, prior authorization requests, call center transcripts. That is the information that actually reflects how businesses and industries are run.
Until models can train on that kind of data, they’ll always run out of fuel. You can stack parameters, add GPUs, and pour electricity into bigger and bigger models, but it won’t make them smarter.
Small language models are the future
The way forward isn’t bigger models. It’s smaller, smarter ones. Small language models (SLMs) are designed to do what LLMs can’t: learn from enterprise data and focus on specific problems.
Here’s why they work.
First, they’re efficient. SLMs have fewer parameters, which means lower compute costs and faster response times. You don’t need a data center full of GPUs just to get them running.
Second, they’re domain-specific. Instead of trying to answer every question on the internet, they’re trained to do one thing well, like HCC risk coding, prior authorizations, or medical coding. That’s why they deliver accuracy in places where generic LLMs stumble.
Third, they fit enterprise workflows. They don’t sit on the outside as a shiny demo. They integrate with the data that actually drives your business, such as billing records, invoices, claims, and clinical notes, and they do it with governance and compliance in mind.
The future isn’t bigger, it’s smaller
I’ve seen this movie before: huge investments, endless hype, and then the realization that scale alone doesn’t solve the problem.
The way forward is to fix the data problem and build smaller, smarter models that learn from the information enterprises already own. That’s how you make AI useful, not by chasing size for its own sake. And I’m not the only one saying it. Even NVIDIA’s own researchers now say the future of agentic AI belongs to small language models.
The industry can keep throwing GPUs at ever-larger models, or it can build better ones that actually work. The choice is obvious.
Photo: J Studios, Getty Images

Fawad Butt is the co-founder and CEO of Penguin Ai. He previously served as the Chief Data Officer at Kaiser Permanente, UnitedHealthcare Group, and Optum, leading the industry’s largest team of data and analytics experts and managing a multi-hundred-million dollar P&L.
This post appears through the MedCity Influencers program. Anyone can publish their perspective on business and innovation in healthcare on MedCity News through MedCity Influencers. Click here to find out how.

