LLM pricing has never made sense

4/23/2026 at 2:33:09 PM

"But there’s another challenge: local LLMs. It’s already possible to run LLMs on local hardware, and that’s only going to get easier in the future. Apple’s M-series chips are extremely good at doing this today. Open weight (read: free) models are widely available and good enough that most people probably couldn’t tell the difference. They also have the benefits of running on hardware that’s sipping power most of the time, rather than slurping it down in massive data centres."

This is such an odd and illogical conclusion. If a smaller model can be sufficient (which is not something I would have said), that smaller model can be ran in a datacenter. The idea that a small model running at home is 'sipping' while that same small model in a datacenter is 'slurping' is absurd. The datacenter will have much greater overall efficiency in both power usage and total cost to implement. Of course if you compare a small home model to a DC frontier model the power usage is different, but so is the output.

by sponaugle

4/23/2026 at 2:58:04 PM

I’m beginning to challenge the assumption that datacenters are more efficient. I can get the same computing power out of a single Mac Mini 32 GB that I get from from an AWS virtual machine that costs hundreds of dollars per month. Even compared to cheap baremetal providers like Hetzner, the Mac Mini pays for itself in a few months of cloud costs. How exactly are datacenters more efficient? I don’t see it in the price. It may be the costs of centralizing large amounts of compute actually make it more expensive, not less, when accounting for profit margins, and considering the fact that base infrastructure (power, internet) is a given in every home anyway.

There are huge hidden costs in datacenter prices that are simply unnecessary for most casual users of compute. Salaries of staff to maintain datacenters, redundancy and high availability of nine 9s that are simply not required by most customers, as well as real estate costs are all non-existent costs in a homelab setup because those are living costs you pay for anyway, with or without a home server.

by znnajdla

4/23/2026 at 3:43:57 PM

>I can get the same computing power out of a single Mac Mini 32 GB that I get from from an AWS virtual machine that costs hundreds of dollars per month.

This quickly breaks down when you're talking about large models that needs terabytes of memory to run[1]. There's no way that you're going to be able to amortize that for a single person.

[1] https://apxml.com/models/glm-51

by gruez

4/23/2026 at 4:56:38 PM

The comment is about smaller models

by ipaddr

4/23/2026 at 5:10:27 PM

Right, but what are you going to do with small models? If your time is worth anything at all you'd pay for the $100 claude code/codex pro subscription, rather than fumbling around with the models quantized enough to fit on your mac.

by gruez

4/23/2026 at 6:00:33 PM

If you're building agentic processes (harnesses) for business processes local models are a great way to do that, while keeping your data, and any personal data, private.

If you're vibe coding a codex/claude subscription makes more sense as a more polished experience.

I don't vibe code, but I use self hosted models with codex for code review and snippet generation.

by mhitza

4/23/2026 at 5:19:03 PM

If small models keep improving for specific purposes and larger models have diminishing returns, then what?

E.g. I can see a world where you have a local model that is specialised just for producing code.

by ret32f

4/23/2026 at 2:52:02 PM

Author here. The reason I wrote that local hardware is "sipping power most of the time" is because most of the time it's not doing LLM-related work. If you're just using your local machine (or eventually maybe even your phone) to do local LLM tasks, you're not doing that all day.

I agree that data centres will be set up to be more efficient, but we're also going to need fewer of them if local LLMs take off. If that's true, overbuilding data centres is more revenue pressure for AI companies.

by GavinAnderegg

4/23/2026 at 4:04:12 PM

Electricity is more expensive at home than where data centers are built, batch inference is more efficient at GPU/TPU inference per watt, power supplies in data centers are more efficient than in average consumer devices, entire racks can be fully powered off when not in use vs. standby power consumption, and of course the investment in hardware is amortized across many users in data centers. It allows more people to have access to larger models than everyone buying an M3 Ultra.

The economy of scale that data centers have is actually a good thing economically and environmentally for many kinds of demand.

I think that the most capable models will continue to be in high demand across the market until at least "a datacenter of PhDs" level of capability. At that point I can see a transition to more local model use if affordable consumer hardware is available (for the median human on Earth). If that turns out to be true then the hyperscaling will plateau at the level allowing sustained commercial/industrial "PhD"-level demand which we aren't at yet (all providers are still struggling to meet current demands).

by benlivengood

4/24/2026 at 12:58:10 AM

What I was commenting on was the concept that a small model at home is somehow more efficient. To make a reasonable and fair comparison you would compare many people running a small model at home vs those same people using what would likely be a shared resource in a datacenter.

The core concept is that tokens/watt is tokens/watt ( for a given model of course ). A computer at home is actually less efficient overall because most of the time it is not doing tokens but still using a small footprint of power.

The revenue pressure is an interesting problem , but I suspect the actual demand math will be much more complicated.

I find local models interesting for sure, and run several on my own personal DGX cluster. I am however most certainly not power efficient!

by sponaugle

4/23/2026 at 2:43:20 PM

Fully agree with you, smaller model are great for some tasks but the security concern on injection prompts etc is what really makes it for me. Great to run offline tasks etc, but whenever interacting outside the local network I still run Claude or ChatGPT depending on the task

by Almured

4/23/2026 at 8:40:57 PM

It's technically odd and illogical, but practically probably correct and on-the-money, as the companies try to artificially create demand?

by jrm4

4/23/2026 at 3:21:15 PM

So recently I moved from a Anthropic model to a qwen 3.5 model running on my Mac to summarize ticket activity over 7 days. I used to do this manually with a colleague and it would take us a couple hours to go through. Opus took 58 seconds, and Qwen took 2.5 minutes. The quality of the qwen output was comparable, but the there was a 2.5x difference in time.

All that said I actually don’t think that matters much. I think we are dragging attention economy concepts in to ai responses, and it doesn’t matter. Both options saved me hours per week, and the difference between 3 and 1 minute may not be worth the additional cost.

Also there are times when the model output is much better with anthropic, but it’s not all the time. I think it becomes a question should we be using the best model for all questions?

by gpapilion

4/23/2026 at 4:23:01 PM

Out of curiosity, what size Qwen did you use, at what quantization?

by busfahrer

4/23/2026 at 4:30:07 PM

27b fp4.

by gpapilion

4/23/2026 at 7:11:05 PM

> I don’t think it’s crazy to believe that people will also be running local inference on their phones in the next 5 years.

How about now? https://apfel.franzai.com/ (iOS/MacOS, runs the 3B param model already bundled for Siri) https://github.com/alichherawalla/off-grid-mobile-ai (Android, runs ~7B models on flagship phones at 15-30tok/s)

Foundation model investment feels like the bubble around fiber optics circa 2001 - a great technology being pushed forward by a speculative mania as it seems like it'll be useful in some profitable way, but nobody's quite sure how.

by garethsprice

4/23/2026 at 2:20:10 PM

I have been talking about this with a colleague this morning. The 20$ option is just a trail version, I could not do any real work with.

And I wonder whether then subscription model is just a way to create a demand for API. For example, I’m building this portal with the support of an LLM for coding, but then I will need to have an LLM using API token to run the platform giving them additional revenue, a demand that did not exist without the coding I did with the subscription.

by Almured

4/23/2026 at 5:23:53 PM

If interference token costs are truly below api levels (excluding training costs), then a cheaper personal subscription & expensive api is an excellent price discrimination / marketing tool. Charging individual devs what they can pay and large corps what they can pay. Collect more revenue than a flat rate for both.

by qazxcvbnmlp

4/23/2026 at 5:39:59 PM

How are you going to price discriminate profitably when there is effectively no switching cost? Anthropic and OAI are also competing for revenue share in the same market segment.

by ret32f

4/23/2026 at 2:17:07 PM

I get the article and the take and I don't think you are wrong, but I would like you to further your thinking and come up with some improvements or fixes.

I get it you may not work in this industry or know the workings of how an AI company seeking frontier AGI WOULD operate but its helpful in connecting ideas and concepts by adding a proposed solution if for nothing more than to show the direction of your thinking.

Sure some people may talk smack about your idea but I've learned that the difference between someone who complains for the argument of complaints and those who complains to fix things have different forms of thinking. The latter may be wrong but its an indicator of HOW that person thinks which is always valuable.

Thanks for the blog.

by awedisee

4/23/2026 at 3:01:28 PM

Author here. I'm not complaining or trying to talk smack. I'm just pointing out something that seems to be coming: LLM price increases or the products being degraded to make more revenue. If I knew how to solve that, I'd be insanely wealthy.

by GavinAnderegg

4/23/2026 at 4:56:52 PM

Local LLM = Now & When, not If

by starkeeper

4/23/2026 at 5:58:00 PM

[dead]

by Till_Opel

4/23/2026 at 1:57:07 PM

[dead]

by vdelpuerto

4/23/2026 at 2:05:43 PM

[dead]

by getshiprelay

4/23/2026 at 5:16:19 PM

[dead]

by jditu