1/23/2026 at 5:23:40 PM
I dunno, GPT-OSS and Llama and Qwen and any of a half-dozen other large open-weight models? I really can't imagine OpenAI or Anthropic turning off inference for a model that my workplace is happy to spend >$200/person/month on. Google still has piles of cash and no reason to turn off Gemini.
The thing is, if inference is truly heavily subsidized (I don't think it is, because places like OpenRouter charge less than the big players for proportionally smaller models) then we'd probably happily pay >$500 a month for the current frontier models if everyone gave up on training new models because of some oddball scaling limit.
by benlivengood
1/23/2026 at 5:47:02 PM
Yeah, this is silly. Plenty of companies are hosting their own now, sometimes on-prem. This isn't going away.
by crimsoneer
1/23/2026 at 5:35:05 PM
> we'd probably happily pay >$500 a month for the current frontier models

Try $5,000. OpenAI loses hundreds of billions a year; they need a 100x, not a 2x.
by iLoveOncall
1/23/2026 at 5:43:38 PM
But they are not losing 100x on inference for high-paying customers. Their biggest losses are free users plus training/development costs.
by gingersnap
1/23/2026 at 8:16:18 PM
Why lie on a site where people know things?
by weirdmantis69
1/23/2026 at 5:47:53 PM
OpenAI loses hundreds of billions a year on inference? I strongly doubt it.
by filoleg
1/23/2026 at 5:49:48 PM
$60k/yr still seems like a good deal for the productivity multiplier you get on an experienced engineer costing several times that. Actually, I'm fairly certain that some optimizations I had codex do this week would already pay for that by letting us scale down pod resource requests, and that's just from me telling it to profile our code and find high-ROI things to fix, taking only part of my focus away from planned work.

Another data point: I gave codex a two-sentence description (intentionally vague and actually slightly misleading) of a problem that another engineer spent ~1 week root-causing a couple of months ago, and it found the bug in 3.5 minutes.
These things were hot garbage right up until the second they weren't. Suddenly, they are immensely useful. That said, I doubt my usage costs OpenAI anywhere near that much.
by ndriscoll
1/23/2026 at 7:39:48 PM
> $60k/yr still seems like a good deal for the productivity multiplier you get on an experienced engineer costing several times that.

Maybe, but that's a hard sell to all the workplaces that won't even spring for >1080p monitors for their experienced engineers.
by Marsymars
1/23/2026 at 6:42:39 PM
That's a wildly different experience of frontier models than I've had; what's your problem domain? I had both Opus and Gemini Pro outright fail at implementing a dead-simple floating-point image transformation the other day because neither could keep track of when things were floats and when they were uint8.
by thot_experiment
1/23/2026 at 7:41:22 PM
Low-level networking in some cloud applications, using gpt-5.2-codex medium. I've cloned like 25 of our repos onto my computer for my team + nearby teams and worked with it for a day or so coming up with an architecture diagram annotated with which services/components live in which repos and how things interact from our team's perspective (so our services + services that directly interact with us). It's great because we ended up with a Mermaid diagram that's legible to me, but it's also a great format for it to use. Then I've found it does quite well at looking across repos to solve issues. It also made reference docs for all available debug endpoints, metrics, etc. I told it where our Prometheus server is, and it knows how to run PromQL queries on its own. When given a problem, it knows how to run debug commands on different servers via ssh or inspect our Kubernetes cluster on its own. I also had it make a shell script to figure out which servers/pods are involved for a particular client and check all of their debug endpoints for information (which it can then interpret). Huge time saver for debugging.

I'm surprised it can't keep track of float vs uint8. Mine knew to look at things like struct alignment or places where we had slices (Go) on structures that could be arrays (so unnecessary boxing), in addition to things like timer reuse, object pooling/reuse, places where local variables were escaping to the heap (and I never even gave it the compiler's escape analysis!), etc. After letting it have a go with the profiler for a couple of rounds, it eventually concluded that we were dominated by syscalls and crypto-related operations, so not much more could be micro-optimized.
I've only been using this thing since right before Christmas, and I feel like I'm still at a fraction of what it can do once you start teaching it the specifics of your workplace's setup. Even that I've started to kind-of automate by just cloning all of our infra teams' repos too. Stuff I have no idea about, it can understand just fine. Any time there's something that requires more than a super-pedestrian application programmer's knowledge of k8s, I just say "I don't really understand k8s. Go look at our deployment and go look at these guys' terraform repo to see everything we're doing" and it tells me what I'm trying to figure out.
by ndriscoll
1/23/2026 at 10:36:51 PM
Yeah, wild. I don't really know how to bridge the gap here, because I've recently been continuously disappointed by AI. Gemini Pro wasn't even able to solve a compiler error the other day, and the solutions it was suggesting were insane (manually migrating the entire codebase) when the actual fix was something like a 0.0.xx compiler version bump. I still like AI a lot for function-scale autocomplete, but I've almost stopped using agents entirely because they're almost universally producing more work for me and making the job less fun; I have to do so much handholding for them to make good architectural decisions, and I still feel like I end up on shaky foundations most of the time.

I'm mostly working on physics simulation and image processing right now. My suspicion is that there are just so many orders of magnitude more cloud-app plumbing code out there that the capability is really unevenly distributed. Similarly with my image processing stuff: my suspicion is that almost all the code it was trained on works in 8-bit, and it's just not able to get past its biases and stop itself from randomly dividing things that are already floats by 255.
by thot_experiment
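For readers unfamiliar with the dtype pitfall thot_experiment keeps hitting: 8-bit images live in [0, 255] while float images live in [0.0, 1.0], so dividing by 255 is only correct when the input really is 8-bit. A minimal sketch in plain Python (`to_float` is a hypothetical helper, not from any library mentioned in the thread):

```python
def to_float(pixels):
    """Hypothetical helper: scale 8-bit int pixels to floats in [0, 1].

    Floats pass through untouched -- the bug described in the thread is a
    model re-dividing already-float data by 255, crushing it toward zero.
    """
    if all(isinstance(p, int) for p in pixels):
        return [p / 255.0 for p in pixels]
    return list(pixels)  # already float: do NOT divide by 255 again

u8 = [0, 128, 255]
f = to_float(u8)          # [0.0, ~0.502, 1.0]
assert to_float(f) == f   # idempotent; the buggy version would return values near 0
```

Doing the check on dtype (or here, on `int` vs `float`) rather than on value ranges is what the models reportedly lose track of mid-refactor.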