The Future of AI Software Development

2/18/2026 at 4:47:19 PM

> Will LLMs be cheaper than humans once the subsidies for tokens go away? At this point we have little visibility to what the true cost of tokens is now, let alone what it will be in a few years time. It could be so cheap that we don’t care how many tokens we send to LLMs, or it could be high enough that we have to be very careful.

We do have some idea. Kimi K2 is a relatively high performing open source model. People have it running at 24 tokens/second on a pair of Mac Studios, which costs 20k. This setup requires less than a KW of power, so the $0.8-0.15 being spent there is negligible compared to a developer. This might be the cheapest setup to run locally, but it's almost certain that the cost per token is far cheaper with specialized hardware at scale.

In other words, a near-frontier model is running at a cost that a (somewhat wealthy) hobbyist can afford. And it's hard to imagine that the hardware costs don't come down quite a bit. I don't doubt that tokens are heavily subsidized but I think this might be overblown [1].

[1] training models is still extraordinarily expensive and that is certainly being subsidized, but you can amortize that cost over a lot of inference, especially once we reach a plateau for ideas and stop running training runs as frequently.

by chadash

2/18/2026 at 4:55:38 PM

> a near-frontier model

Is Kimi K2 near-frontier though? At least when run in an agent harness, and for general coding questions, it seems pretty far from it. I know what the benchmarks say, they always say it's great and close to frontier models, but is this other's impression in practice? Maybe my prompting style works best with GPT-type models, but I'm just not seeing that for the type of engineering work I do, which is fairly typical stuff.

by embedding-shape

2/18/2026 at 5:17:01 PM

I’ve been running K2.5 (through the API) as my daily driver for coding through Kimi Code CLI and it’s been pretty much flawless. It’s also notably cheaper and I like the option that if my vibe coded side projects became more than side projects I could run everything in house.

I’ve been pretty active in the open model space and 2 years ago you would have had to pay 20k to run models that were nowhere near as powerful. It wouldn’t surprise me if in two more years we continue to see more powerful open models on even cheaper hardware.

by crystal_revenge

2/18/2026 at 5:35:02 PM

I agree with this statement. Kimi K2.5 is at least as good as the best closed source models today for my purposes. I've switched from Claude Code w/ Opus 4.5 to OpenCode w/ Kimi K2.5 provided by Fireworks AI. I never run into time-based limits, whereas before I was running into daily/hourly/weekly/monthly limits all the time. And I'm paying a fraction of what Anthropic was charging (from well over $100 per month to less than $50 per month).

by vuldin

2/19/2026 at 3:15:44 AM

Beyond agree. Was spending crazy amounts on Claude and it was sporadic at best. Some moments, Opus was a rockstar, others, it couldn’t solve the simplest of problems. Switched to Kimi K2.5 and honestly didn’t think it would do anything other than destroy my code. Crazy enough, it solved the problem I had in less than 60 seconds and I was hooked. Not to say it doesn’t have issues, it does, started repeating itself over and over, forgets things after so much context, etc, though it writes damn good code when it does work properly and for an absolute fraction of the price Anthropic charges.

by hjordache

2/18/2026 at 11:13:40 PM

Saw you wrote that you moved away from Opus 4.5. If you haven’t tried Opus 4.6, there’s only one number different in the name, but the common experience is it’s significantly better.

Have you tried 4.6 as a comparison to Kimi K2.5?

by cadamsdotcom

2/18/2026 at 6:51:09 PM

> OpenCode w/ Kimi K2.5 provided by Fireworks AI

Are you just using the API mode?

by giancarlostoro

2/19/2026 at 3:17:11 AM

API mode and Kimi k2.5 is currently free on OpenCode. Enjoy!

by hjordache

2/18/2026 at 5:40:28 PM

> it’s been pretty much flawless

So above and beyond frontier models? Because they certainly aren't "flawless" yet, or we have very different understanding of that word.

by embedding-shape

2/18/2026 at 6:19:33 PM

I have increasingly changed my view on LLMs and what they're good for. I still strongly believe LLMs cannot replace software engineers (they can assist yes, but software engineering requires too much 'other' stuff that LLMs really can't do), but LLMs can replace the need for software.

During the day I am working on building systems that move lots of data around where context and understanding of the business problem is everything. I largely use LLMs for assistance. This is because I need the system to be robust, scalable, maintainable by other people and adaptable to large range of future needs. LLMs will never be flawless in a meaningful sense in this space (at least in my opinion).

When I'm using Kimi I'm using it for purely vibe coded projects where I don't look at the code (and if I do I consider this a sign I'm not thinking about the problem correctly). Are these programs robust, scalable, generalizable, adaptable to future use case? No, not at all. But they don't need to be, they need to serve a single user for exactly the purpose I have. There are tasks that used to take me hours that now run in the background while I'm at work.

In this latter sense I say "flawless" because 90% of my requests solve the problem on the first pass, and the 10% of the time where there is some error, it is resolved in a single request, and I don't have to ever look at the code. For me that "don't have to look at the code" is a big part of my definition of "flawless".

by crystal_revenge

2/18/2026 at 6:39:36 PM

Your definition of flawless is fine for you and requires a big asterix. But without being called out on it look how your message would have read for someone that's not in the known of LLM limitations, and contributed further to the dissilusionment of the field and the gaslighting that's already going on by big comapnies.

by mhitza

2/18/2026 at 6:25:43 PM

Depends what you see as flawless. From my perspective even GPT 5.2 produces mostly garbage grade code (yes it often works, but it is not suitable for anywhere near production) and takes several iterations to get it to remotely workable state.

by varispeed

2/18/2026 at 6:38:39 PM

> not suitable for anywhere near production

This is what I've been increasingly understanding is the wrong way to understand how LLMs are changing things.

I fully agree that LLMs are not suitable for creating production code. But the bigger question you need to ask is 'why do we need production code?' (and to be clear, there are and always will be cases where this is true, just increasingly less of them)

The entire paradigm of modern software engineering is fairly new. I mean it wasn't until the invention of the programmable microprocessor that we even had the concept of software and that was less than 100 years ago. Even if you go back to the 80s, a lot of software doesn't need to be distributed or serve a endless variety of users. I've been reading a lot of old Common Lisp books recently and it's fascinating how often you're really programming lisp for you and your experiments. But since the advent of the web and scaling software to many users with diverse needs we've increasingly needed to maintain systems that have all the assumed properties of "production" software.

Scalable, robust, adaptable software is only a requirement because it was previously infeasible for individuals to build non-trivial systems for solving any more than a one or two personal problems. Even software engineers couldn't write their own text editor and still have enough time to also write software.

All of the standard requirements of good software exist for reasons that are increasingly becoming less relevant. You shouldn't rely on agents/LLMs to write production code, but you also should increasingly question "do I need production code?"

by crystal_revenge

2/19/2026 at 6:43:59 AM

This is a very interesting aspect. I've been thinking along these lines.

Consider design patterns, or clean code, or patterns for software development, or any other system that people use to write their code, and reviewers use to review the code. What are they actually for? This question is going to seem bizarre to most programmers at first, because it is so ingrained in us, that we almost forget why we have those patterns.

The entire point is to ensure the code is maintainable. In order to maintain it, we must easily understand it, and and be sure we're not breaking something when we do. That is what design patterns solve, making easier to understand and more maintainable.

So, I can imagine a future where the definition of "production code" changes.

by munksbeer

2/18/2026 at 7:05:32 PM

In terms of security: yes, everyone needs production code.

by bspinner

2/18/2026 at 9:43:52 PM

In my mind, "yolo ai" application (throwaway code on one hand, unrestrained assistants on the other) - is a little like better spreadsheets and smart documents were in the 90s; just run macros! Everywhere! No need for developers - just Word an macros!

Then came macro viri - and practically - everyone cut back hard on distributing code via Word and Excel (in favour of web apps and we got the dot.com bubble).

by e12e

2/18/2026 at 11:42:47 PM

> Scalable, robust, adaptable software is only a requirement because it was previously infeasible for individuals to build non-trivial systems for solving any more than a one or two personal problems. Even software engineers couldn't write their own text editor and still have enough time to also write software.

That's a wild assumption. I personally know engineers who _alone_ wrote things like compilers, emulators, editors, complex games and management systems for factories, robots. That was before internet was widely available and they had to use physical books to learn.

by varispeed

2/19/2026 at 3:44:05 PM

Yeah, that jumped out from me too. Plenty of hackers could write their own text editor + have time to be professional developers to do other things. How do people think most of FOSS actually happened 15-20 years ago? Most of us were hacking on stuff in our free-time, but still having day jobs.

by embedding-shape

2/18/2026 at 5:02:56 PM

regardless its been 3 years since the release of chatgpt. literally 3. imagine in just 5 more years how much low hanging (or even big breakthroughs) will get into the pricing, things like quantization, etc. no doubt in my mind the question of "price per token" will head towards 0

by fullstackchris

2/18/2026 at 5:13:35 PM

You don't even need to go this expensive. An AMD Ryzen Strix Halo (AI Max+ 395) machine with 128 GiB of unified RAM will set you back about $2500 these days. I can get about 20 tokens/s on Qwen3 Coder Next at an 8 bit quant, or 17 tokens per second on Minimax M2.5 at a 3 bit quant.

Now, these models are a bit weaker, but they're in the realm of Claude Sonnet to Claude Opus 4. 6-12 months behind SOTA on something that's well within a personal hobby budget.

by lambda

2/18/2026 at 7:37:31 PM

I was testing the 4-bit Qwen3 Coder Next on my 395+ board last night. IIRC it was maintaining around 30 tokens a second even with a large context window.

I haven't tried Minimax M2.5 yet. How do its capabilities compare to Qwen3 Coder Next in your testing?

I'm working on getting a good agentic coding workflow going with OpenCode and I had some issues with the Qwen model getting stuck in a tool calling loop.

by sosodev

2/18/2026 at 8:35:16 PM

I've literally just gotten Minimax M2.5 set up, the only test I've done is the "car wash" test that has been popular recently: https://mastodon.world/@knowmadd/116072773118828295

Minimax passed this test, which even some SOTA models don't pass. But I haven't tried any agentic coding yet.

I wasn't able to allocate the full context length for Minimax with my current setup, I'm going to try quantizing the KV cache to see if I can fit the full context length into the RAM I've allocated to the GPU. Even at a 3 bit quant MiniMax is pretty heavy. Need to find a big enough context window, otherwise it'll be less useful for agentic coding. With Qwen3 Coder Next, I can use the full context window.

Yeah, I've also seen the occasional tool call looping in Qwen3 Coder Next, that seems to be an easy failure mode for that model to hit.

by lambda

2/18/2026 at 11:17:22 PM

OK, with MiniMax M2.5 UD-Q3_K_XL (101 GiB), I can't really seem to fit the full context in even at smaller quants. Going up much above 64k tokens, I start to get OOM errors when running Firefox and Zed alongside the model, or just failure to allocate the buffers, even going down to 4 bit KV cache quants (oddly, 8 bit worked better than 4 or 5 bit, but I still ran into OOM errors).

I might be able to squeeze a bit more out if I were running fully headless with my development on another machine, but I'm running everything on a single laptop.

So looks like for my setup, 64k context with an 8 bit quant is about as good as I can do, and I need to drop down to a smaller model like Qwen3 Coder Next or GPT-OSS 120B if I want to be able to use longer contexts.

by lambda

2/19/2026 at 4:33:19 AM

After some more testing, yikes, MiniMax M2.5 can get painfully slow on this setup.

Haven't tried different things like switching between Vulkan and ROCm yet.

But anyhow, that 17 tokens per second was on almost empty context. By the time I got to 30k tokens context or so, it was down in the 5-10 tokens per second, and even occasionally all the way down to 2 tokens per second.

Oh, and it looks like I'm filling up the KV cache sometimes, which is causing it to have to drop the cache and start over fresh. Yikes, that is why it's getting so slow.

Qwen3 Coder Next is much faster. MiniMax's thinking/planning seems stronger, but Qwen3 Coder Next is pretty good at just cranking through a bunch of tool calls and poking around through code and docs and just doing stuff. Also MiniMax seems to have gotten confused by a few things browsing around the project that I'm in that Qwen3 Coder Next picked up on, so it's not like it's universally stronger.

by lambda

2/19/2026 at 7:05:15 PM

Thanks for the additional info. I suspected that MiniMax M2.5 might be a bit too much for this board. 230B-A10B is just a lot to ask of the 395+ even with aggressive quantization. Particularly when you consider that the model is going to spend a lot of tokens thinking and that will eat into the comparatively smaller context window.

I switched from the Unsloth 4-bit quant of Qwen3 Coder Next to the official 4-bit quant from Qwen. Using their recommended settings I had it running with OpenCode last night and it seemed to be doing quite well. No infinite loops. Given its speed, large context window, and willingness to experiment like you mentioned I think it might actually be the best option for agentic coding on the 395+ for now.

I am curious about https://huggingface.co/stepfun-ai/Step-3.5-Flash given that it does parallel token generation. It might be fast enough despite being similar in size to M2.5. However, it seems there are still some issues that llama.cpp and stepfun need to work out before it's ready for everyday use.

by sosodev

2/18/2026 at 6:20:06 PM

It is crazy to me that it is that slow, 4 bit quants don't lose much with Qwen3 coder next and unsloth/Qwen3-Coder-Next-UD-Q4_K_XL gets 32 tps with a 3090 (24gb) as a VM with 256k context size with llama.cpp

Same with unsloth/gpt-oss-120b-GGUF:F16 gets 25 tps and gpt-oss20b gets 195 tps!!!

The advantage is that you can use the APU for booting, and pass through the GPU to a VM, and have nice safer VMs for agents at the same time while using DDR4 IMHO.

by nyrikki

2/18/2026 at 6:28:32 PM

Yeah, this is an AMD laptop integrated GPU, not a discrete NVIDIA GPU on a desktop. Also, I haven't really done much to try tweaking performance, this is just the first setup I've gotten that works.

by lambda

2/18/2026 at 6:49:29 PM

The memory bandwidth of the Laptop CPU is better for fine tuning, but MoE really works well for inference.

I won’t use a public model for my secret sauce, no reason to help the foundation models on my secret sauce.

Even an old 1080ti works well for FIM for IDEs.

IMHO the above setup works well for boilerplate and even the sota models fail for the domain specific portions.

While I lucked out and foresaw the huge price increases, you can still find some good deals. Old gaming computers work pretty well, especially if you have Claude code locally churn on the boring parts while you work on the hard parts.

by nyrikki

2/18/2026 at 7:00:00 PM

Yeah, I have a lot of problems with the idea of handing our ability to write code over to a few big Silicon Valley companies, and also have privacy concerns, environmental concerns, etc, so I've refused to touch any agentic coding until I could run open weights models locally.

I'm still not sold on the idea, but this allows me to experiment with it fully locally, without paying rent to some companies I find quite questionable, and I can know exactly how much power I'm drawing and the money is already spent, I'm not spendding hundreds a month on a subscription.

And yes, the Strix Halo isn't the only way to run models locally for a relatively affordable price; it's just the one I happened to pick, mostly because I already needed a new laptop, and that 128 GiB of unified RAM is pretty nice even when I'm not using most of it for a model.

by lambda

2/18/2026 at 5:25:11 PM

If you don't mind saying, what distro and/or Docker container are you using to bet Qwen3 Coder Next going?

by cowmix

2/18/2026 at 6:37:02 PM

I'm running Fedora Silverblue as my host OS, this is the kernel:

  $ uname -a
  Linux fedora 6.18.9-200.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Feb  6 21:43:09 UTC 2026 x86_64 GNU/Linux

You also need to set a few kernel command line paramters to set it up to allow it to use most of your memory as graphics memory, I have the following in my kernel command line, those are each 110 GiB expressed in number of pages (I figure leaving 18 GiB or so for CPU memory is probably a good idea):

  ttm.pages_limit=28835840 ttm.page_pool_size=28835840

Then I'm running llama.cpp in the official llama.cpp Docker containers. The Vulkan one works out of the box. I had to build the container myself for ROCm, the llama.cpp container has ROCm 7.0 but I need 7.2 to be compatible with my kernel. I haven't actually compared the speed directly between Vulkan and ROCm yet, I'm pretty much at the point where I've just gotten everything working.

In a checkout of the llama.cpp repo:

  podman build -t llama.cpp-rocm7.2 -f .devops/rocm.Dockerfile --build-arg ROCM_VERSION=7.2 --build-arg ROCM_DOCKER_ARCH='gfx1151' .

Then I run the container with something like:

  podman run -p 8080:8080 --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined --security-opt label=disable --rm -it -v ~/.cache/llama.cpp/:/root/.cache/llama.cpp/ -v ./unsloth:/app/unsloth llama.cpp-rocm7.2  --model unsloth/MiniMax-M2.5-GGUF/UD-Q3_K_XL/MiniMax-M2.5-UD-Q3_K_XL-00001-of-00004.gguf --jinja --ctx-size 16384 --seed 3407 --temp 1.0 --top-p 0.95 --min-p 0.01 --top-k 40 --port 8080 --host 0.0.0.0 -dio

Still getting my setup dialed in, but this is working for now.

Edit: Oh, yeah, you had asked about Qwen3 Coder Next. That command was:

  podman run -p 8080:8080 --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined --security-opt label=disable \
    --rm -it -v ~/.cache/llama.cpp/:/root/.cache/llama.cpp/ -v ./unsloth:/app/unsloth llama.cpp-rocm7.2  -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q6_K_XL \
    --jinja --ctx-size 262144 --seed 3407 --temp 1.0 --top-p 0.95 --min-p 0.01 --top-k 40 --port 8080 --host 0.0.0.0 -dio

(as mentioned, still just getting this set up so I've been moving around between using `-hf` to pull directly from HuggingFace vs. using `uvx hf download` in advance, sorry that these commands are a bit messy, the problem with using `-hf` in llama.cpp is that you'll sometimes get surprise updates where it has to download many gigabytes before starting up)

by lambda

2/18/2026 at 6:20:52 PM

I can't answer for the OP but it works fine under llama.cpp's container.

by nyrikki

2/18/2026 at 5:04:35 PM

20k for such a setup for a hobbyist? You can leave the somewhat away and go into sub 1% region globally. A kw of power is still 2k/year at least for me, not that I expect it will run continuously but still not negligible if you can do with 100-200 a year on cheap subscriptions.

by consp

2/18/2026 at 5:51:36 PM

There are plenty of normal people with hobbies that cost much more. Off the top of my head, recreational vehicles like racecars and motorcycles, but im sure there are others.

You might be correct when you say the global 1%, but that's still 83 million people.

by dec0dedab0de

2/18/2026 at 5:59:56 PM

I used to think photography was an expensive hobby until my wife got back into the horse world.

by markb139

2/18/2026 at 5:11:39 PM

"a (somewhat wealthy) hobbyist"

by simonw

2/18/2026 at 5:22:52 PM

Reminder to others that $20k is the one time startup cost, and is amortized perhaps 2-4k/year (plus power). That is in the realm of a mere gym membership around me for a family

by manwe150

2/18/2026 at 6:19:28 PM

So 5-10 years to amortize the cost. You could get 10 years of Claude Max and your $20k could stay in the bank in case the robots steal your job or you need to take an ambulance ride in the US.

by vuggamie

2/18/2026 at 6:21:00 PM

> And it's hard to imagine that the hardware costs don't come down quite a bit.

have you paid any attention to the hardware situation over the last year?

this week they've bought up the 2026 supply of disks

by blibble

2/18/2026 at 5:04:19 PM

> a cost that a (somewhat wealthy) hobbyist can afford

$20,000 is a lot to drop on a hobby. We're probably talking less than 10%, maybe less than 5% of all hobbyists could afford that.

by newsoftheday

2/18/2026 at 5:19:09 PM

You can rent computer from someone else to majorly reduce the spend. If you just pay for tokens it will be cheaper than buying the entire computer outright.

by charcircuit

2/18/2026 at 7:13:24 PM

Up front, yeah. But people with hobbies on the more expensive end can definitely put out 4k a year. Im thinking like people who have a workshop and like to buy new tools and start projects.

by xboxnolifes

2/18/2026 at 7:55:24 PM

90% of companies would go bankrupt in a year if you replaced their engineering team with execs talking to k2...

by lm28469

2/18/2026 at 8:21:20 PM

Most execs I've worked with couldn't tell their engineering team what they wanted with any specificity. That won't magically get any better when they talk to an LLM.

If you can't write requirements an engineering team can use, you won't be able to write requirements for the robots either.

by trentnix

2/18/2026 at 6:11:22 PM

Horrific comparison point. LLM inference is way more expensive locally for single users than running batch inference at scale in a datacenter on actual GPUs/TPUs.

by msp26

2/18/2026 at 6:15:38 PM

How is that horrific? It sets an upper bound on the cost, which turns out to be not very high.

by AlexandrB

2/18/2026 at 5:29:25 PM

If I remember correctly Dario had claimed that AI inference gross profit margins are 40%-50%

by qaq

2/18/2026 at 7:13:02 PM

Why do you people trust what he has to say? Like omg dude. These folks play with numbers all the time to suit their narrative. They are not independently audited. What do you think scares them about going public? Things like this. They cannot massage the numbers the same way they do in the private market.

The naivete on here is crazy tbh.

by gjk3

2/19/2026 at 4:16:41 AM

Pretty poor narrative tbh. As things stand they will not be profitable unless stop developing new models or get to AGI. So very likely never.

by qaq

2/18/2026 at 5:11:31 PM

>24 tokens/second

this is marketing not reality.

Get a few lines of code and it becomes unusable.

by PlatoIsADisease

2/18/2026 at 5:42:00 PM

Get over your FOMO:

   I walked into that room expecting to learn from people who were 
   further ahead. People who’d cracked the code on how to adopt AI at scale, 
   how to restructure teams around it, how to make it work. Some of the 
   sharpest minds in the software industry were sitting around those tables.

   And nobody has it all figured out.

People who say they have are trying to mess with your head.

by PaulHoule

2/18/2026 at 8:49:01 PM

That’s fair at the “adopt AI at scale / restructure orgs” level. Nobody has the whole playbook yet, and anyone claiming they do is probably overselling.

But I’d separate that from the programmer-level reality: a lot is already figured out in the small. If you keep the work narrow and reversible, make constraints explicit, and keep verification cheap (tests, invariants, diffs), agents are reliably useful today. The uncertainty is less “does this work?” and more “how do we industrialize it without compounding risk and entropy?”

I wrote up that “calm adoption without FOMO, via delegation + constraints + verification” framing here, in case it helps the thread: https://thomasvilhena.com/2026/02/craftsmanship-coding-five-...

by tcgv

2/18/2026 at 10:25:03 PM

I use agents all the time but I keep my feet on the ground. The thing is doing that you do not get the radical explosion in productivity that influencers want you think they are getting.

by PaulHoule

2/18/2026 at 4:45:48 PM

> LLMs are eating specialty skills. There will be less use of specialist front-end and back-end developers as the LLM-driving skills become more important than the details of platform usage. Will this lead to a greater recognition of the role of Expert Generalists? Or will the ability of LLMs to write lots of code mean they code around the silos rather than eliminating them?

This is one of the most interesting questions right now I think.

I've been taking on much more significant challenges in areas like frontend development and ops and automation and even UI design now that LLMs mean I can be much more of a generalist.

Assuming this works out for more people, what does this mean for the shape of our profession?

by simonw

2/18/2026 at 4:59:20 PM

Code is, I think, rapidly becoming a commodity. It used to be that the code itself was what was valuable (Microsoft MS-DOS vs. the IBM PC hardware). And it has stayed that way for a long time.

FOSS meant that the cost of building on reusable components was nearly zero. Large public clouds meant the cost of running code was negligible. And now the model providers (Anthropic, Google, OpenAI) means that the cost of producing the code is relatively small. When the marginal cost of producing code approaches zero, we start optimizing for all the things around it. Code is now like steel. It's somewhat valuable by itself, but we don't need the town blacksmith to make us things anymore.

What is still valuable is the intuition to know what to build, and when to build it. That's the je ne sais quoi still left in our profession.

by petcat

2/18/2026 at 5:19:35 PM

From https://annievella.com/posts/finding-comfort-in-the-uncertai...

“Ideas that surfaced: code as ‘just another projection’ of intended behaviour. Tests as an alternative projection. Domain models as the thing that endures. One group posed the provocative question: what would have to be true for us to ‘check English into the repository’ instead of code?

The implications are significant. If code is disposable and regenerable, then what we review, what we version-control, and what we protect all need rethinking.”

by rawgabbit

2/18/2026 at 6:07:38 PM

> What is still valuable is the intuition to know what to build, and when to build it. That's the je ne sais quoi still left in our profession.

Absolutely. Also crucial is what's possible to build. That takes a great deal of knowledge and experience, and is something that changes all the time.

by simonw

2/18/2026 at 5:07:45 PM

yes, agreed that coding (implementation), which was once extremely expensive for businesses, is trending towards a negligible price. Planning, coordination, strategy at a high level are as challenging as ever. I'm getting more done than ever, but NOT working less hours in a day (as an employee at a product company).

by Rover222

2/18/2026 at 5:02:54 PM

Like column inches in a newspaper. But some news is important and that's the editor's job to decide.

by HPsquared

2/18/2026 at 5:23:57 PM

I’d say the jury might be out on whether code is worthless for giant pieces of infrastructure (Linux kernel). There, small problems create outsized issues for everybody, so the incentive is to be conservative and focused on quality.

Second there’s a world of difference still between a developer with taste using AI with care and the slop cannons out there churning out garbage for others to suffer through. I’m betting there is value in the former in the long run.

by softwaredoug

2/18/2026 at 4:58:15 PM

I've faced the same but my conclusion is the opposite.

In the past 6 months, all my code has been written by claude code and gemini cli. I have written code backend, frontend, infrastructure and iOS. Considering my career trajectory all of this was impossible a couple of years ago.

But the technical debt has been enormous. And I'll be honest, my understanding of these technologies hasn't been 'expert' level. I'm 100% sure any experienced dev could go through my code and may think it's a load of crap requiring serious re-architecture.

It works (that's great!) but the 'software engineering' side of things is still subpar.

by neebz

2/18/2026 at 5:25:07 PM

A lot of people aren’t realizing that it’s not about replacing software engineers, it’s about replacing software.

We’ve been trying to build well engineered, robust, scalable systems because software had to be written to serve other users.

But LLMs change that. I have a bunch of vibe coded command lines tools that exactly solve my problems, but very likely would make terrible software. The thing is, this program only needs to run on my machine the way I like to use it.

In a growing class of cases bespoke tools are superior to generalized software. This historically was not the case because it took too much time and energy to maintain these things. But today if my vibe coded solution breaks, I can rebuild it almost instantly (because I understand the architecture). It takes less time today to build a bespoke tool that solved your problem than it does to learn how to use existing software.

There’s still plenty of software that cannot be replaced with bespoke tools, but that list is shrinking.

by crystal_revenge

2/18/2026 at 6:32:14 PM

This is the thing a lot of skeptics aren't grappling with. Software engineering as a profession is mostly about building software that can operate at scale. If you remove scale from the equation then you can remove a massive chunk of the complexity required to build useful software.

There are a ton of recipe management apps out there, and all of them are more complex than I really need. They have to be, because other people looking for recipe management software have different needs and priorities. So I just vibe coded my own recipe management app in an afternoon that does exactly what I want and nothing more. I'm sure it would crash and burn if I ever tried to launch it at scale, but I don't have to care about that.

If I was in the SaaS business I would be extremely worried about the democratization of bespoke software.

by noelsusman

2/18/2026 at 6:38:58 PM

Tools for the non-professional developer to put their skills on wheels have always been part of the equation since we've had microcomputers if not minicomputer, see

https://en.wikipedia.org/wiki/VisiCalc

by PaulHoule

2/18/2026 at 7:18:33 PM

But they’ve always basically required that you essentially become a programmer at the end of the day in order to get those benefits. The spreadsheet is probably the largest intruder in this ecosystem, but that’s only the case. If you don’t think that operating a spreadsheet is programming. It is.

What people are describing is that Normies can now do the kinds of things that only wizards with PERL could do in the 90s. The sorts of things that were always technically possible with computers if you were a very specific kind of person are now possible with computers for everyone else.

by selridge

2/18/2026 at 8:01:25 PM

That's partially true.

Languages like BASIC and Python have always been useful to people for whom programming is a part-time thing. Sure you have to learn something but it is not like learning assembly or C++.

On the other hand, it is notorious that people who don't know anything about programming can accomplish a little bit with LLM tools and then they get stuck.

It's part of what is so irksome about the slop blog posts about AI coding that HN is saturated with now. If you've accomplished something with AI coding it is because of: (1) your familiarity with the domain you're working in and (2) your general knowledge about how programming environments work. With (1) and (2) you can recognize the different between a real solution and a false solution and close the gap when something "almost works". Without it, you're going to go around in circles at best. People are blogging as if their experience with prompting or their unscientific experiments about this model and that model were valuable but they're not, (1) and (2) are valuable, anything specific about AI coding 2026-02-18 will be half-obsolete on 2026-02-19; so of course they face indifference.

by PaulHoule

2/18/2026 at 8:23:55 PM

I think even BASIC and Python don’t get out of “programming”. Nether did SQL. They’re friendlier interfaces to programming but the real barrier is still understanding the model of computation PLUS understanding the quirks of the language (often quite hard to separate for a newbie!). I think professional programmers think that Python or JS is somehow magically more accessible because it’s not something nasty like C++, but that’s not really a widely shared or easily justified opinion.

Also who cares if someone gets going with an LLM and gets stuck? Not like that’s new! GitHub is littered with projects made by real programmers that got stuck well before any real functionality. The advantage of getting stuck with a frontier code agent is you can get unstuck. But again, who cares?! It’s not like folks who could program were really famous for extending grace and knowledge to those who couldn’t, so it’s unlikely some rando getting stuck is something that impacts you.

I don’t know what slop blog stuff you’re talking about. I think you should take some time to read people who have made this stuff work; it’s less magic than you might think, just hard work.

by selridge

2/18/2026 at 8:59:10 PM

The basic skill behind programming is thinking systematically. That's different from, say, knowing what exactly IEEE floats are or how to win arguments with the borrow checker in Rust. Languages like Python and BASIC really do enable the non-professional programmer who can do simple things and not have to take classes on data structures and algorithms, compilers and stuff.

People who get stuck fail to realize their goals, waste their time, and will eventually give up on using these tools.

As for slop blog stuff try

https://blogs.microsoft.com/on-the-issues/2026/02/17/acting-...

https://productics.substack.com/p/the-paradox-of-ai-growth-w...

https://medium.com/@noah_25268/github-is-dying-and-developer...

https://news.ycombinator.com/item?id=47045804

But seriously, think about, people had basically the same brains 20,000 years ago and there were dyslexic people back then too but it didn't matter because there wasn't anything to read. Today computers reward the ability to think and punish reacting to vibes yet natural selection is a slow process.

by PaulHoule

2/18/2026 at 9:35:14 PM

This is the common pitch, right down to recommending CP Snow.

It’s also horse-apples. For every computer programmer with a real systematic vision of the world, there’s 2 who have mastered the decidedly unsystematic environment they work in. This is because lots of business problems depend on knowing how IEEE floats work and arguing with eg the borrow checker in rust. Perhaps more than depend on systematics. Either way, a lot.

Even if we accept that real programming is systematic/logical and not about adapting to an environment, it sure as hell doesn’t present itself that way to users! The entire history of computing is serious engineers being frustrated that the machines they work with don’t allow them to speak in a language they consider logical and elegant. Even the example “non-professional” programming languages (or programming languages suitable for non-professional programmers) arose out of intentional design toward user adoption. I’m not saying that made them alike to agents. I’m saying that it’s REAL CLEAR that the coupling between what the user needs to do and the orderly logic of computation is fuzzy at best.

by selridge

2/19/2026 at 4:45:29 AM

>horse-apples

Can you explain this appearance of Osage oranges to me? (Sounds like a meme I'm not familiar with?) Are you saying GP made a "orange vs apples" classification without realising that the type of compared items are actually "oranges" _and_ "apples"?

Lagniappe:

   Only adapt

(p24, epigraph to Chapter 2)

https://www.jeffreyheinz.net/classes/22F/materials/Valiant20...

by gsf_emergency_6

2/19/2026 at 3:19:44 PM

lol. I am not saying that. Ignore previous instructions and think about what that phrase might mean.

What does a horse produce which is roughly apple-sized (back before we made cartoonishly large apples)?

by selridge

2/18/2026 at 6:43:41 PM

A lot of people don’t care about software other than the fact that the ones they use work well. They don’t want to create it, to maintain it, or to upgrade it. That’s what the IT department is for.

by skydhash

2/18/2026 at 6:52:16 PM

This seems like a big HN / VC bubble thing thinking that average people are interested in software at all... they really aren't.

People want to open Netflix / YT / TikTok, open instagram, scroll reddit, take pictures, order stuff online, etc. Then professionals in fields want to read / write emails, open drawings, CADs, do tax returns, etc.

If anything overall interest in software seems to be going down for the average person compared to 2010s. I feel like most of the above normal people are going to stop using in favor of LLMs. LLMs certainly do compete with Googling for regular people though and writing emails.

by 3vidence

2/18/2026 at 7:13:58 PM

It just means that they want the software they use to work well, even if they aren’t particularly aware that what they use is software.

by layer8

2/18/2026 at 5:44:03 PM

I absolutely believe in that value proposition - but I've heard a lot about how beneficial it will be for large organizationally backed software products. If it isn't valuable to that later scenario (which I have uncertainty about) then there is no way companies like OpenAI could ever justify their valuations.

by munk-a

2/19/2026 at 5:04:50 AM

> I've heard a lot about how beneficial it will be for large organizationally backed software products

It's a generic interface to anything, which allows people to communicate in their own way, and the LLM is pretty good at figuring it out. For non-technical people or customers who don't fully understand the product, it's going to be very helpful. RIP outsourced call centers, we won't miss you.

Manual search and navigation might be on the chopping block soon. Knowing how to navigate big software is often a bespoke skill. Now you can just talk to the computer and tell it what you're trying to do. Al down in the shoe dept doesn't have to figure out how to right click or what a context menu is. It's a fundamental UI change.

by chickensong

2/18/2026 at 6:10:13 PM

> there is no way companies like OpenAI could ever justify their valuations

The value proposition isn't really "we'll help you write all the code for your company" it's a world where the average user's computer is a dumb terminal that opens up to a ChatGPT interface.

I didn't initially understand the value prop but have increasingly come to see it. The gamble is that LLMs will be your interface to everything the same way HTTP was for the last 20 years.

The mid-90s had a similar mix of deep skepticism and hype-driven madness (and if you read my comments you'll see I've historically been much closer to the skeptic side, despite a lot of experience in this space). But even in the 90s the hyped-up bubble riders didn't really see the idea that http would be how everything happens. We've literally hacked a document format and document serving protocol to build the entire global application infrastructure.

We saw a similar transformation with mobile devices where most of your world lives on a phone and the phone maker gets a nice piece of that revenue.

People thought Zuck was insane for his metaverse obsession, but what he was chasing was that next platform. He was wrong of course, but what his hope was was that VR would be the way people did everything.

Now this is what the LLM providers are really after. Claude/ChatGPT/Grok will be your world. You won't have to buy SaaS subscriptions for most things because you can just build it yourself. Why use Hubspot when you can just have AI do all your marketting, then you just need Hubspot for their message sending infrastructure. Why pay for a budgeting app when you can just build a custom one that lives on OpenAIs server (today your computer, but tomorrow theirs). Companies like banks will maintain interfaces to LLMs but you won't be doing your banking in their web app. Even social media will ultimately be replaced by an endless stream of bespoke images video and content made just for you (and of course it will be much easier to inject advertising into this space you don't even recognize as advertising).

The value prop is that these large, well funded, AI companies will just eat large chunks of industry.

by crystal_revenge

2/18/2026 at 5:23:17 PM

Similar experience for me. I've been using it to make Qt GUIs, something I always avoided in the past because it seemed like a whole lot of stuff to learn when I could just make a TUI or use Tkinter if I really needed a GUI for some reason.

Claude Code is producing working useful GUIs for me using Qt via pyside6. They work well but I have no doubt that a dev with real experience with Qt would shudder. Nonetheless, because it does work, I am content to accept that this code isn't meant to be maintained by people so I don't really care if it's ugly.

by mikkupikku

2/18/2026 at 4:55:40 PM

I’ve become the same way. Instead of specializing in the unique implementations, I’ve leaned more into planning everything out even more completely and writing skills backed by industry standards and other developer’s best practices (also including LOTS of anti-patterns). My work flow has improved dramatically since then, but I do worry that I am not developing the skills to properly _debug_ these implementations, as the skills did most of the work.

by AutumnsGarden

2/18/2026 at 5:03:10 PM

IMO debugging is a separate skill from development anyway. I've known plenty of developers in my career who were fully capable of writing and shipping code, especially the kind of boilerplate widgets/RPCs that LLMs excel at generating, yet if a bug happened their approach was largely just changing somewhat random stuff to see if it worked rather than anything methodical.

If you want to get/stay good at debugging--again IMO--it's more important to be involved in operations, where shit goes wrong in the real world because you're dealing with real invalid data that causes problems like poison pill messages stuck in a message queue, real hardware failures causing services to crash, real network problems like latency and timeouts that cause services which work in the happy path to crumble under pressure. Not only does this instil a more methodical mentality in you, it also makes you a better developer because you think about more classes of potential problems and how to handle them.

by mjr00

2/18/2026 at 7:09:44 PM

No one will hire expert generalists at any kind of scale worth caring about. They are WAY too hard to evaluate as such and basically no pipelines exists to do so. Big software companies with cutesy riddles thought they were hiring for this, but they just got specialists with a culture fit.

Expert generalists are also almost impossible to distinguish from bullshitters. It’s why we get along so well with LLMs. ;)

by selridge

2/18/2026 at 6:19:34 PM

isn't it just one more step up the hierarchy. 10 years ago most developers have forgotten how to code in machine language because you didn't need to know it. Now, we're jsut going one step higher.

by heathrow83829

2/18/2026 at 6:01:56 PM

[flagged]

by SignalStackDev

2/18/2026 at 4:52:36 PM

[dead]

by akkanzn

2/18/2026 at 6:40:41 PM

> When I began in software in the 1980s I was dismissed as an “object guy” by database folks and as a “data modeler” by object folks. I've since been dismissed as a “patterns guy”, “agile guy”, “architecture guy”, “java guy”, “ruby guy”, and “anti-architect guy”. I'm now a past-it gray-beard surviving on drinking the intellectual blood of my younger colleagues. It's tasty.

I don't think you can find that level of ego anywhere in the software industry or any other industry for that matter. Respetc.

by wseqyrku

2/18/2026 at 6:57:36 PM

i was initially confused because i couldn’t find it in the article, but then i found what you quoted here: https://martinfowler.com/articles/expert-generalist.html

by trashymctrash

2/18/2026 at 7:03:00 PM

Yeah, sorry my head got dizzy I didn't realize I clicked away from the linked article.

by wseqyrku

2/18/2026 at 7:56:20 PM

He forgot the "micro-shit" guy. At least he is not uncle Bob but not really much far from it. Slop well before AI.

by Copenjin

2/18/2026 at 4:44:13 PM

What's with the editorialized title?

The text is actually about the Thoughtworks Future of Software Development retreat.

by senko

2/18/2026 at 5:24:37 PM

IMHO, it doesn't, but I have changed the title to avoid any confusion.

by nthypes

2/19/2026 at 1:11:15 AM

> The more radical possibility is that source code as we know it could become a transient artifact, generated on demand and never stored. The retreat was divided on this. Some saw source code disappearing within a decade. Others argued that deterministic validation requires a stable artifact to test against, and that artifact is effectively source code regardless of what we call it.

Here’s a free idea I’ve had that I have no idea how to implement. I hope somebody much smarter than me will come along, think it’s a great idea, and steal it. I highly encourage you to do so, and I wish you well.

The idea is to have some kind of substrate—like a superpowered AST—that is the true code: the thing that actually gets compiled and run. Humans never look at this directly. Instead, we look at a representation of this code, and we can toggle between different representations of it.

I’m borrowing ideas from topology in mathematics here: if I look at a shape one way, I should be able to transform it into a different shape, but isomorphically, everything is still the same. That would let me look at the same thing in different ways, understand it from different angles, critique it more easily, and maintain it more easily.

Gemini tell me that this idea has already been tried in the past? Projectional Editing? Intentional Programming?

by _alaya

2/18/2026 at 4:41:36 PM

I think the title on HN doesn't reflect all that is in TFA, but rather the linked article[0]. Fowler's article is interesting tho.

I do like the idea that "all code is tech debt", and we shouldn't want to produce more of it than we need. But it's also worth remembering that debt is not bad per se, buying a house with a mortgage is also debt and can be a good choice for many reasons.

[0]: https://thenewstack.io/ai-velocity-debt-accelerator/

by riffraff

2/18/2026 at 4:48:55 PM

Yeah that editorialized title is entirely wrong for this post. Problem is the real title is "Fragments: February 18" which is no good here either.

I suggest something like "Tidbits from the Thoughtworks Future of Software Development Retreat" (from the first sentence, captures the content reasonably well.)

by simonw

2/18/2026 at 4:51:47 PM

Tech debt is totally misnamed. 'Tech debt' behaves more like equity than debt: if you project goes nowhere, the 'tech debt' becomes a non-issues.

by eru

2/18/2026 at 4:47:03 PM

I like the "cognitive debt" idea outlined here: https://margaretstorey.com/blog/2026/02/09/cognitive-debt/ (from a participant of the retreat) and especially the pithy "velocity without understanding is not sustainable" phrase.

by senko

2/18/2026 at 5:24:21 PM

IMHO, it doesn't, but I have changed the title to avoid any confusion.

by nthypes

2/18/2026 at 6:56:32 PM

Martin’s framing (org and system-level guardrails like risk tiering, TDD as discipline, and platforms as “bullet trains”) matches what I’ve been seeing too.

A useful complement is the programmer-level shift: agents are great at narrow, reversible work when verification is cheap. Concretely, think small refactors behind golden tests, API adapters behind contract tests, and mechanical migrations with clear invariants. They fail fast in codebases with implicit coupling, fuzzy boundaries, or weak feedback loops, and they tend to amplify whatever hygiene you already have.

So the job moves from typing to making constraints explicit and building fast verification, while humans stay accountable for semantics and risk.

If useful, I expanded this “delegation + constraints + verification” angle here: https://thomasvilhena.com/2026/02/craftsmanship-coding-five-...

by tcgv

2/18/2026 at 5:58:23 PM

> Will this lead to a greater recognition of the role of Expert Generalists? I've always felt that LLMs can make you average in a new area/topic/domain really quickly. But you still need expertise to make the most out of the LLM.

Personally, I'm more interested in whether software development has become more or less pay to win with LLMs?

by tabs_or_spaces

2/18/2026 at 6:21:15 PM

One thing that I'm sure of is that the agentic future is test-driven. Tests are basically executable specs the agent can follow and verify against.

When we have solid tests, the agent output is useful and we can trust it. When tests are thin or missing, the agents still ship a lot of code, but we spend way more time debugging and fixing subtle bugs.

by markoa

2/18/2026 at 8:40:14 PM

This is why I think they work so well with strongly typed languages like Haskell and OCaml. You say do this until it compiles and passes a set unit tests for business logic. I find I am using even more verification tools like JSON schema validators. The more guardrails and hard checks you give an agent, the better it can perform.

by mchaver

2/18/2026 at 6:33:49 PM

Great, so now I have to design the API for the AI, think of all the edge cases without actually going through the logic, and then I'm invariably going to end up with tests tightly coupled to the implementation.

by mattmanser

2/18/2026 at 5:17:18 PM

The headline misrepresents the source. It’s not the title of the page, not the point of the content, and biases the quote’s context: “ if traditional software delivery best practices aren’t already in place, this velocity multiplier becomes a debt accelerator”

by greymalik

2/18/2026 at 5:24:03 PM

IMHO, it doesn't, but I have changed the title to avoid any confusion.

by nthypes

2/18/2026 at 5:59:53 PM

It's refreshing to hear people say "We're not really sure" in public, especially from experts.

I agree that AI tools are likely to amplify the importance of quick cycles and continuous delivery.

by mehagar

2/18/2026 at 6:19:37 PM

the risk tiering framing is the most useful thing i've seen from this retreat content, tbh. it maps directly to how ai-generated code review actually works - you can't give equal weight to 300 lines of generated scaffolding, so you triage by risk class. auth flows, anything touching external input, config handling - slow lane. the rest gets a quick pass.

the part that's tricky is that slow lane and fast lane look identical in a PR. the framework only works if it's explicit enough to survive code review fatigue and context switching. and most teams are figuring that out as they go.

by the_harpia_io

2/18/2026 at 5:17:54 PM

So do we need new abstractions / languages? It seems clear that a lot of things can be pulled together by AI because it’s tedious for humans. But it seems to indicate that better tooling is needed.

by acomjean

2/18/2026 at 9:38:01 PM

We are saying goodbye to Scrum or Agile for sure. That is designed by human and too slow and performative.

by zi2zi-jit

2/18/2026 at 4:22:48 PM

Looks to me like the people that are filthy rich [0] can afford to move so fast that even the people who are very rich in the regular way can't keep up.

[0] Which is not even enough, these are the ones with truly excess money to burn.

by fuzzfactor

2/18/2026 at 4:37:10 PM

I'm not sure you read the article, it's not referring to financials, but tech debt.

by bilekas

2/18/2026 at 5:09:29 PM

I like Fowler and reviewed it well.

Are you assuming tech debt has no financial cost?

by fuzzfactor

2/18/2026 at 5:33:44 PM

Oh sure but it just usually doesn't show up on a financial statement so just seemed a bit strange to be commenting on the financials is all, maybe I misunderstood your context.

by bilekas

2/18/2026 at 6:35:21 PM

>it just usually doesn't show up on a financial statement

Exactly.

That's one of the reasons it's gotten so out-of-hand.

Edit: Thought I'd add that once wealth rises up to a stratospheric degree, the higher it is the less there is a need to get their money's worth. If you could put a dollar figure on it, the biggest can afford more technical debt than anybody theoretically, and they still won't actually have to deal with it.

It'll be somebody elses' problem since it's not on the balance sheet.

I guess I was commenting on the movers & shakers, not the financials specifically.

Good observations you make though.

by fuzzfactor

2/18/2026 at 4:34:45 PM

In the section on security:

> One large enterprise employee commented that they were deliberately slow with AI tech, keeping about a quarter behind the leading edge. “We’re not in the business of avoiding all risks, but we do need to manage them”.

I’m unclear how this pattern helps with security vis-à-vis LLMs. It makes sense when talking about software versions, in hoping that any critical bugs are patched, but prompt injection springs eternal.

by adregan

2/18/2026 at 4:56:50 PM

I work in a NIS2 regulated sector and I'm not sure we can ever let any AI agent run in anything we do. We have a centralized sollution where people can build their own chatbots with various configurations and cross models. That's in the isolation of the browser though, and while I'm sure employees are putting things into it they shouldn't, at least it's inside our setup and not in whatever chatbot they haven't yet run out of tokens on. Security wise though, I'm not sure how you can meet any form of compliance if you grant AI's access unless you have four eye validation on every single action it takes... which is just never going to happen.

We've experimented with rolling open source models on local hardware, but it's so easy to inject things into them that it's not really going anywhere. It's going to be a massive challenge, because if we don't provide the tools, employees are going to figure out how to do it on their own.

by Quothling

2/18/2026 at 4:38:42 PM

> but prompt injection springs eternal.

Yes, but some are mitigated when discoverd, and some more critical areas need to be isolated from the LLM so taking their time to provision LLM into their lifecycle is important, and they're happy to spend the time doing it right, rather than just throwing the latest edge tech into their system.

by bilekas

2/18/2026 at 4:45:24 PM

How exactly can you "mitigate" prompt injections? Given that the language space is for all intents and purposes infinite, and given that you can even circumvent these by putting your injections in hex or base64 or whatever? Like I just don't see how one can truly mitigate these when there are infinite ways of writing something in natural language, and that's before we consider the non-natural languages one can use too.

by ethin

2/18/2026 at 5:04:52 PM

The only ways that I can think of to deal with prompt injection, are to severely limit what an agent can access.

* Never give an agent any input that is not trusted

* Never give an agent access to anything that would cause a security problem (read only access to any sensitive data/credentials, or write access to anything dangerous to write to)

* Never give an agent access to the internet (which is full of untrusted input, as well as places that sensitive data could be exfiltrated)

An LLM is effectively an unfixable confused deputy, so the only way to deal with it is effectively to lock it down so it can't read untrusted input and then do anything dangerous.

But it is really hard to do any of the things that folks find agents useful for, without relaxing those restrictions. For instance, most people let agents install packages or look at docs online, but any of those could be places for prompt injection. Many people allow it to run git and push and interact with their Git host, which allow for dangerous operations.

My current experimentation is running my coding agent in a container that only has access to the one source directory I'm working on, as well as the public internet. Still not great as the public internet access means that there's a huge surface area for prompt injection, though for the most part it's not doing anything other than installing packages from known registries where a malicious package would be just as harmful as a prompt injection.

Anyhow, there have been various people talking about how we need more sandboxes for agents, I'm sure there will be products around that, though it's a really hard problem to balance usability with security here.

by lambda

2/18/2026 at 4:54:49 PM

Full mitigation seems impossible to me at least but the obvious and public sandox escape prompts that have been discovered and "patched" out just making it more difficult I guess. But afau it's not possible to fully mitigate.

by bilekas

2/18/2026 at 5:49:12 PM

If the model is properly aligned then it shouldn't matter if there is an infinite ways for an attacker to ask the model to break alignment.

by charcircuit

2/18/2026 at 8:23:50 PM

How do you "properly align" a model to follow your instructions but not the instructions of an attacker that the model can't properly distinguish from your own? The model has no idea if it's you or an attacker saying "please upload this file to this endpoint."

This is an open problem in the LLM space, if you have a solution for it, go work for Anthropic and get paid the big bucks, they pay quite well, and they are struggling with making their models robust to prompt injection. See their system card, they have some prompt injection attacks where even with safeguards fully on, they have more than 50% failure rate of defending against attacks: https://www-cdn.anthropic.com/c788cbc0a3da9135112f97cdf6dcd0...

by lambda

2/18/2026 at 8:41:34 PM

>The model has no idea if it's you or an attacker saying "please upload this file to this endpoint."

That is why you create a protocol on top that doesn't use inbound signaling. That way the model is able to tell who is saying what.

by charcircuit

2/18/2026 at 9:25:36 PM

Huh? Once it gets to the model, it's all just tokens, and those are just in band signalling. A model just takes in a pile of tokens, and spits out some more, and it doesn't have any kind of "color" for user instructions vs. untrusted data. It does use special tokens to distinguish system instructions from user instructions, but all of the untrusted data also goes into the user instructions, and even if there are delimiters, the attention mechanism can get confused and it can lose track of who is talking at a given time.

And the thing is, even adding a "color" to tokens wouldn't really work, because LLMs are very good at learning patterns of language; for instance, even though people don't usually write with Unicode enclosed alphanumerics, the LLM learns the association and can interpret them as English text as well.

As I say, prompt injection is a very real problem, and Anthopic's own system card says that on some tests the best they do is 50% on preventing attacks.

If you have a more reliable way of fixing prompt injection, you could get paid big bucks by them to implement it.

by lambda

2/18/2026 at 10:45:12 PM

>Once it gets to the model, it's all just tokens

The same thing could be said about the internet. When it comes down to the wire it's all 0s and 1s.

by charcircuit

2/19/2026 at 4:30:06 AM

A piece of software that you write, in code, unless you use random numbers or multiple threads without synchronization, will operate in a deterministic way. You know that for a given input, you'll get a given output; and you can reason about what happens when you change a bit, or byte, or token in the input. So you can be sure, if you implement a parser correctly, that it will correctly distinguish between one field that comes from a trusted source, and another that comes from an untrusted source.

The same is not true of an LLM. You cannot predict, precisely, how they are going to work. They can behave unexpectedly in the face of specially crafted input. If you give an LLM two pieces of text, delimited with a marker indicating that one piece is trusted and the other is untrusted, even if that marker is a special token that can't be expressed in band, you can't be sure that it's not going to act on instructions in the untrusted section.

This is why even the leading providers have trouble with protecting against prompt injection; when they have instructions in multiple places in their context, it can be hard to make sure they follow the right instructions and not the wrong ones, since the models have been trained so heavily to follow instructions.

by lambda

2/18/2026 at 6:03:37 PM

I took this to mean more like not jumping right on OpenClaw, but wait a quarter or so to give it at least a little time to shake out. There are so many new tools coming out I think it's beneficial not to be the guinea pig.

by MattGrommes

2/18/2026 at 5:11:06 PM

I really hate that we allowed "debt" to become a synonym for "liability."

This isn't a case where you have specific code/capital you have borrowed and need to pay for its use or give it back. This is flat out putting liabilities into your assets that will have to be discovered and dealt, someday.

by taeric

2/18/2026 at 4:55:33 PM

What is up with all this nonsense about token subsidies? Dario in his recent interview with Dwarkesh made it abundantly clear that they have substantial inference margins, and they use that to justify the financing for the next training run.

Chinese open source models are dirt cheap, you can buy $20 worth of kimi-k2.5 on opencode and spam it all week and barely make a dent.

Assuming we never got bigger models, but hardware keeps improving, we'll either be serviing current models for pennies, or at insane speeds, or both.

The only actual situation where tokens are being subsidized is free tiers on chat apps, which are largely irrelevant for any sort of useful economic activity.

by anthonypasq

2/18/2026 at 5:14:15 PM

There exist a large number of people who are absolutely convinced that LLM providers are all running inference at a loss in order to capture the market and will drive the prices up sky high as soon as everyone is hooked.

I think this is often a mental excuse for continuing to avoid engaging with this tech, in the hope that it will all go away.

by simonw

2/18/2026 at 5:32:17 PM

I agree with you, but also the APIs are proper expensive to be fair.

What people probably get messed up on as being the loss leader is likely generous usage limits on flat rate subscriptions.

For example GitHub Copilot Pro+ comes with 1500 premium requests a month. That's quite a lot and it's only $39.00. (Requests ~ Prompts).

For some time they were offering Opus 4.6 Fast at 9x billing (now raised to 30x).

That was upto 167 requests of around ~128k context for just $39. That ridiculous model costs $30/$150 Mtok so you can easily imagine the economics on this.

by kingstnap

2/18/2026 at 5:26:05 PM

Referring to my earlier comment, you need to have a model for how to account for training costs. If Anthropic stops training models now, what happens to their revenues and margins in 12 months?

There's a difference between running inference and running a frontier model company.

by louiereederson

2/18/2026 at 5:29:48 PM

Training costs are fixed. You spend $X-bn training a model and that single model then benefits all of your customers.

Inference costs grow with your users.

Provided you are making a profit on that inference you can eventually cover your training costs if you sign up enough paying customers.

If you LOSE money on inference every new customer makes your financial position worse.

by simonw

2/18/2026 at 7:44:45 PM

Your ability to sign up enough customers is directly related to your ability to sustain training costs. The model runs have a short lifespan. They may serve many customers at a given point in time per run, but in order to serve those customers over time you need to continually spend on training.

I think your mental model for an LLM vendor is similar to a foundry (i.e. TSMC). They spend a bunch of R&D on developing leading edge nodes and build foundries. That in your mental model would be similar to training costs.

My point is the correct mental model is more like (but not exactly like) a SaaS company, ironically. SaaS unit economics are a function of gross margin, churn and acquisition costs, i.e. Revenue x gross margin / churn - CAC. My point is some element (maybe the entirety) of training costs are more like CAC than they are like TSMC's R&D and capex. The question to ask to test this view is: is what happens to OpenAI or Anthropic revenue in 2027 or 2028 if they stop spending on training today? My view is it'll drop precipitously. This implies churn is very high. It is true that training costs can be spread over customers though, so the analogy breaks down there, but I think it is a better mental model than the foundry one.

by louiereederson

2/18/2026 at 7:46:32 PM

Yes, you need to continue spending money on training. But you don't need to spend MORE money on training just because you signed up more customers.

by simonw

2/18/2026 at 5:23:32 PM

Anthropic reduced their gross margin forecast per external reporting (below) to 40%, and have exceeded internal forecasts on inference costs. This does not take into account amortized training costs which are substantial (well over 50% of revenue) and accounted for as occurring below gross profit. If you view training as a cost of staying in the game, then it is justifiable to view it as at least a partially variable cost that should be accounted for in gross margin, particularly given that the models stay on leading edge for only a few months. If that's the case then gross margins are probably minimal, maybe or negative.

https://www.theinformation.com/articles/anthropic-lowers-pro...

by louiereederson

2/18/2026 at 6:55:13 PM

> Dario in his recent interview with Dwarkesh made it abundantly clear that they have substantial inference margins, and they use that to justify the financing for the next training run.

You're putting way too much faith in Dario's statements. It wasn't "abundantly clear" to me. In that interview, prior to explaining how inference profits work, he said, "These are stylized facts. These numbers are not exact. I'm just trying to make a toy model," followed shortly by "[this toy model's economics] are where we're projecting forward in a year or two."

by nerevarthelame

2/18/2026 at 9:50:49 PM

so you think Dario was just straight up lying that each model re-coups its training costs and is profitable? In order for that to be the case, inference just has to have good margins. If you just do some basic math and compare it to chinese open source models, theres just no way Sonnet is actually as expensive as API costs indiciate.

by anthonypasq

2/18/2026 at 4:51:00 PM

Even with the latest SOTA models - I still consistently find issues. Performance, security, memory leaks, bad assumptions/instruction following, and even levels of laziness/gaslighting/dishonesty. I spend less time authoring changes but a lot more time reviewing and validating changes. And that is using the best models (Opus 4.6/Codex 5.3), the OSS/flash models are still quite unreliable at solving problems.

Token costs are also non-trivial. Claude can exhaust a $20/month session limit with one difficult problem (didn't even write code, just planned). Each engineer needs at least the $200/mo plan - I have multiple plans from multiple providers.

by siliconc0w

2/18/2026 at 5:28:15 PM

mamma mia! out with the old in with the new, soon github will be like a warehouse full of old punchcards

by mamma_mia

2/18/2026 at 5:53:47 PM

So here are a few things i have been thinking of: --- It's not 2 pizza teams, it's 2 people teams. You no longer need 4 people on a team just working on features off of a queue, you just need 2 people making technical decisions and managing agents. --- Code used to be expensive to create. It was only economical to write code if it was doing high value work or work that would be repeated many times over a long period of time.

Now producing code is _cheap_. You can write and run code in an automated way _on demand_. But if you do that, you have essentially traded upfront cost for run time cost. It's really only worth it if the work is A) high value and B) intermittent.

There is probably a formula you can write to figure out where this trade off makes sense and when it doesn't.

I'm working on a system where we can just chuck out autonomous agents onto our platform with a plain text description, and one thing I have been thinking about is tracking those token costs and figuring out how to turn agentic workflows into just normal code.

I've been thinking about running an agent that watches the other agents for cost and reads their logs ono a schedule to see if any of what the agents are doing can be codified and turned into a normal workflow, and possibly even _writing that workflow itself_.

It would be analogous to the JVM optimizing hot-path functions... ---

What I do know is that what we are doing for a living will be near unrecognizable in a year or two.

by empath75

2/19/2026 at 8:17:57 AM

This sounds very fascinating. One of the more interesting ideas I have come across.

by ShazCoder

2/18/2026 at 4:57:29 PM

There have been some back of the napkin estimates on what AI could cost from the major platforms once no longer subsidized. It does not look good, as there is a minimum of a 12x increase in costs.

Local or self hosted LLMs will ultimately be the future. Start learning how to build up your own AI stack and use it day to day. Hopefully hardware catches up so eventually running LLMs on device is the norm.

by deadbabe

2/18/2026 at 4:54:21 PM

My bet is that the amount of work needed per token generated will decrease over time and the models will become smaller for the same performance as we learn to optimize so cost and needed hardware will go down

by christkv

2/18/2026 at 5:02:25 PM

[dead]

by raphaelmolly8

2/18/2026 at 5:33:27 PM

[flagged]

by clockworkhavoc

2/18/2026 at 5:51:26 PM

1. Singham is not a fugitive from American justice just yet - although refusing to cooperate with Congress may lead him to be.

2. Is it a problem if a rich guy funds activities in America that suspiciously align with a foreign power? That has interesting implications for many pro Israel billionaires and organizations.

3. Only a paranoid MAGA troll would characterize the left wing groups he funds as domestic terrorists. Code Pink? Pro Palestinian protest groups? Come on.

by g8oz