3/19/2026 at 7:53:58 AM
> There is no world where you input a document lacking clarity and detail and get a coding agent to reliably fill in that missing clarity and detail

That is not true, and the proof is that LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions. Code is the detail being filled in. Furthermore, LLMs are the ultimate detail-fillers, because they are language interpolation/extrapolation machines. And their popularity exists precisely because they are usually very good at filling in details: LLMs use their vast knowledge to guess what detail to generate, so the result usually makes sense.
This doesn't detract much from the main point of the article, though. Sometimes the interpolated detail is wrong (and nondeterministic), so, if a reliable result is to be achieved, important details have to be constrained, and for that they have to be specified. And whereas we have decades of tools and culture for coding, we largely don't have that for extremely detailed specs (except maybe at NASA or similar places). We could figure it out in the future, but we haven't yet.
by bad_username
3/19/2026 at 8:03:10 AM
> That is not true, and the proof is that LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions.

LLMs can generate (relatively small amounts of) working code from relatively terse descriptions, but I don’t think they can do so _reliably_.
They’re more reliable the shorter the code fragment and the more common the code, but they do break down for complex descriptions. For example, try tweaking the description of a widely-known algorithm just a little bit and see how well the generated code follows the spec.
> Sometimes the interpolated detail is wrong (and indeterministic), so, if reliable result is to be achieved
Seems you agree they _cannot_ reliably generate (relatively small amounts of) working code from relatively terse descriptions.
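[Ed.: a concrete illustration of the "tweaked algorithm" test mentioned above; the example and names are my own, not the commenter's. Ask for binary search, but with the small spec change that, among duplicates, the index of the *first* occurrence must be returned. A hand-written reference for that tweaked spec:]

```python
def first_occurrence(xs, target):
    """Binary-search variant: return the index of the FIRST occurrence
    of target in the sorted list xs, or -1 if absent. The 'first'
    requirement is the small tweak a textbook binary search ignores."""
    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid  # keep searching left even after a match
    return lo if lo < len(xs) and xs[lo] == target else -1

# A plain binary search may legally return index 1, 2, or 3 here;
# the tweaked spec accepts only 1.
assert first_occurrence([1, 2, 2, 2, 3], 2) == 1
assert first_occurrence([1, 3], 2) == -1
```

Generated code often reproduces the textbook version and misses the tweaked postcondition, which is easy to catch only if you test for it.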
by Someone
3/19/2026 at 1:40:03 PM
> try tweaking the description of a widely-known algorithm just a little bit and see how good the generated code follows the spec.

This works well for me.
by mathgradthrow
3/19/2026 at 9:11:15 AM
Neither can humans, but the industry has decades of experience with how to instruct and guide human developer teams using specs.
by mike_hearn
3/19/2026 at 9:40:35 AM
For good results, you usually don't want your developers to be coding monkeys. You need the human developer in the loop to even define the spec, maybe contributing ideas, but at the very least asking questions like "what happens when..." and "have you thought about...".

In fact, this is a huge chunk of the value a developer brings to the table.
by dxdm
3/19/2026 at 2:07:38 PM
And this is usually one of the defining traits of a senior engineer. They understand the tech and its limitations, and thus are able to look around corners, ask good questions, and, overall, provide quality product input.
by gusmd
3/19/2026 at 3:48:03 PM
In other words, prudential judgement.

Programs are socially constructed artifacts that help communicate and express a model (one that is perpetually locked in people's heads, with variance across engineers; divergence is addressed as the program develops). Determining what should or should not be done is a matter not just of domain knowledge but of practical reason, which is to say prudence, a virtue that can only be acquired by experience. It is the ability to apply universal principles to particular situations.
This is why young devs, even when clever in some local sense, are worse at understanding the right moves to make in context. Code does not stand alone. It exists entirely in the service of something and is bound by constraints that are external to it.
by danielam
3/19/2026 at 3:23:36 PM
This is very much my experience from working with outsourced development. Almost by design, they tend to lack domain expertise or an intimate understanding of the cultures and engineering values of the company they're contracted out to.

This means that they will very quickly help you discover all the little details that seemed so obvious to you that you didn't even think to mention them, but were nonetheless critical to a successful implementation. The corollary is that the potential ROI of outsourcing is inversely proportional to how many of these little details your project has, and how important they are.
So far I've found LLM coding to be kind of the same. For projects where those details are relatively unimportant, they can save me a bunch of effort. But I would not want to let an LLM build and maintain something like an API or database schema. Doing a good job of those requires too much knowledge of expected usage patterns and too much working through design tradeoffs. And they tend to be incredibly expensive to change after deployment, so it pays to take your time and get your hands dirty.
I also kind of hate them for writing tests, for similar reasons. I know many people love them for it because writing tests isn't super happy fun times, but for my part I'm tired of dealing with LLM-generated test suites being so brittle that they actively hinder future development.
by bunderbunder
3/19/2026 at 12:14:09 PM
When LLMs generate an appropriate program from ambiguous requirements, they do this because the requirements happen to match something similar that has been done previously elsewhere.

There is a huge amount of programming work that consists in reinventing the wheel, i.e. in redoing something very similar to programs that have been written thousands of times before.
For this kind of work LLMs can greatly improve productivity, even if they are not much better than if you were allowed to search, copy, and paste from the programs on which the LLM has been trained. The advantage of an LLM is the automation of the search/copy/paste actions, and even more than this, the removal of the copyrights from the original programs. Copyright law is what has resulted in huge amounts of superfluous programming work, which is necessary even when open-source solutions exist, because the employer of the programmer wants to "own the IP".
On the other hand, for really novel applications, or for old applications where you want to obtain better performance than anyone has gotten before, providing an ambiguous prompt to an LLM will get you nowhere.
by adrian_b
3/20/2026 at 4:34:57 AM
> and even more than this, the removal of the copyrights from the original programs

This seems really strange to me. Can you explain how this is different from just stealing code from other sources, or copying it wholesale from open-source repos?
by bluefirebrand
3/19/2026 at 10:37:47 AM
Humans have the ability to retrospect, push back on a faulty spec, push back on an unclarified spec, do experiments, make judgement calls, and build tools and processes to account for their own foibles.
by MoreQARespect
3/19/2026 at 11:45:17 AM
Humans also have the ability to introspect. Ultimately, (nearly) every software project is intended to provide a service to humans, and most humans are similar in most ways: "what would I want it to do?" is a surprisingly reliable heuristic for dealing with ambiguity, especially if you know where you should and shouldn't expect it to be valid.

The best LLMs can manage is "what's statistically plausible behaviour for descriptions of humans in the corpus", which is not the same thing at all. Sometimes, I imagine, that might be more useful; but for programming (where, assuming you're not reinventing wheels or scrimping on your research, you're often encountering situations that nobody has encountered before), an alien mind's extrapolation of statistically plausible human-behaviour observations is not useful. (I'm using "alien mind" metaphorically, since LLMs do not appear particularly mind-like to me.)
by wizzwizz4
3/19/2026 at 12:04:02 PM
Most companies I've worked for have had 'know the customer' events so that developers learn what the customers really do; in turn, even if we are not in their domain, we have a good idea what they care about.
by bluGill
3/19/2026 at 11:32:15 AM
Which bits of this do you think LLM-based agents can't do?
by pablobaz
3/19/2026 at 11:44:58 AM
Not get stuck on an incorrect train of thought; not ignore core instructions in favour of training data, like breaking naming conventions across sessions or long contexts; not confidently state "I completely understand the problem and this will definitely work this time" for the 5th time without actually checking. I could go on.
by interstice
3/19/2026 at 12:46:01 PM
LLMs by their nature are not goal-oriented (this is a fundamental difference between reinforcement learning and neural networks, for example). So a human will have, let's say, the ultimate goal of creating value with the web application they create ("save me time!"). The LLM has no concept of that. It's trying to complete a spec as best it can with no knowledge of the goal. Even if you tell it the goal, it has no concept of the process needed to achieve or confirm the goal was attained; you have to tell it that.
by mbesto
3/19/2026 at 11:38:29 AM
The main thing they cannot do is be held accountable for any decisions, which makes them not trustworthy.
by ModernMech
3/19/2026 at 11:42:20 AM
This is not correct. They can say "sorry", which makes them as accountable as an ordinary developer.
by vbezhenar
3/19/2026 at 11:48:02 AM
I've found recent versions of Claude and Codex to be reluctant in this regard. They will recognise the problem they created a few minutes ago but often behave as if someone else did it. In many ways that's true though, I suppose.
by interstice
3/19/2026 at 1:21:57 PM
Does it do this for really cut-and-dried problems? I’ve noticed that ChatGPT will put a lot of effort into (retroactively) “discovering” a basically-valid alternative interpretation of something it said previously, if you object on good grounds. Like it’s trying to evade admitting that it made a mistake, but also find some way to satisfy your objection. Fair enough, if slightly annoying.

But I have also caught it on straightforward matters of fact and it’ll apologize. Sometimes in an over-the-top fashion…
by bee_rider
3/19/2026 at 4:07:45 PM
Ordinary developers get fired for poor performance *all the time*.
by bigfishrunning
3/19/2026 at 12:15:40 PM
That's not what accountability is.
by bluefirebrand
3/19/2026 at 3:41:43 PM
Accountability: "Something that SWEs run screaming from."

Example: "We should have professional accountability in software"
SWE: "This would bring about the end of the world!!!1!"
by pixl97
3/19/2026 at 4:09:56 PM
The economics of software development have lowered the bar for software engineers: there simply aren't enough people who are good at it (or even want to be), and the salaries are very high, so plenty of people who shouldn't be SWEs are.

I am a software engineer, and I would absolutely love to see more professional accountability in this field. Unfortunately, it would make the cost of software go up significantly (because many, many people writing software would be ejected from the industry).
by bigfishrunning
3/19/2026 at 3:17:18 PM
LLM based solutions don’t need to stay dry and warm at night, with a full belly, possibly with their sexual partner with whom they have a drive to procreate.by datsci_est_2015
3/19/2026 at 2:08:04 PM
Any of them.
by MoreQARespect
3/19/2026 at 6:41:21 PM
You can guide humans, but ultimately the reason senior software developers have been paid large sums of money is that, even with specs, we have mostly found it works better to have someone with good judgement actually doing the work; otherwise we would have just been using specifications. The question remains open whether LLMs can show good judgement. Often my experience with Claude is that it doesn't if the problem domain is non-trivial, but it's possible that won't always be true.
by FuckButtons
3/19/2026 at 11:34:40 AM
Specs are insufficient to guide human developer teams, so I don’t understand the comparison.by ModernMech
3/19/2026 at 2:48:40 PM
Anything can be reliable if you have good tests.
by jes5199
3/19/2026 at 2:11:57 PM
We do have such detailed specifications, but they are written in languages with a narrow interface. The technique is called "program synthesis," and you can find an example of such a language, called Synquid.

It might be illuminating to see what a mathematically precise specification can and cannot do when it comes to generating programs. A major challenge in formal methods is proving that the program implements the specification faithfully, known as the specification gap. If you have a very high-level and flexible specification language, such as TLA+, there is a lot of work to do to verify that the program you write meets the specification you wrote. For something like Synquid, which is closer to the code, there are more constraints on expressivity.
The point is that spoken language is not sufficiently precise to define a program.
Just because an LLM can fill in plausible details where sufficient detail is lacking doesn’t indicate that it’s solving the specification gap. If the program happens to implement the specification faithfully you got lucky. You still don’t actually know that’s true until you verify it.
It’s different with a narrow interface though: you can be very precise and very abstract with a good mathematical system for expressing specifications. It’s a lot more work and requires more training to do than filling in a markdown file and trying to coax the algorithm into outputting what you want through prose and fiction.
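[Ed.: one way to make the verification point above concrete without a synthesis tool is spec-as-predicate checking, sketched below in Python. This is not how Synquid itself works (Synquid uses refinement types and an SMT solver), and the function names here are invented for illustration:]

```python
# Spec for sorting, written as an executable predicate: the output must
# be the input's elements in nondecreasing order (same multiset).
def satisfies_sort_spec(impl, xs):
    return impl(list(xs)) == sorted(xs)

def plausible_but_wrong_sort(xs):
    # A "filled-in detail" that looks fine on many inputs but silently
    # drops duplicates: the specification gap in miniature.
    return sorted(set(xs))

assert satisfies_sort_spec(sorted, [3, 1, 2, 1])
assert not satisfies_sort_spec(plausible_but_wrong_sort, [3, 1, 2, 1])
```

Checking individual inputs like this can only refute; proving that an implementation meets the spec for all inputs is the hard part that formal tools address.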
by agentultra
3/19/2026 at 5:13:54 PM
This works well for problems that are purely algorithmic in nature. But problems often have solutions that don't fall into those categories, especially in UI/UX. When people tell me that LLMs can solve anything given a sufficiently detailed spec, I ask them to produce such a spec for Adobe Photoshop.
by Shebanator
3/19/2026 at 3:37:46 PM
I think the worst case is actually that the LLM faithfully implements your spec, but your spec was flawed. To the extent that you outsource the mechanical details to a machine trained to do exactly what you tell it, you destroy, or at least hamper, the feedback loop between fuzzy human thoughts and cold hard facts.
by et1337
3/19/2026 at 3:42:09 PM
Unfortunately even formal specifications have this problem. Nothing can replace thinking. But sycophancy, I agree, is a problem. These tools are designed to be pleasing, to generate plausible output; but they cannot think critically about the tasks they're given.

Nothing will save you from a bad specification. And there's no royal road to knowing how to write good ones.
by agentultra
3/19/2026 at 3:53:51 PM
Right, there’s no silver bullet. I think all I can do is increase the feedback bandwidth between my brain and the real world. Regular old stuff like linters, static typing, borrow checkers, e2e tests… all the way to “talking to customers more”.
by et1337
3/19/2026 at 2:29:54 PM
> Sometimes the interpolated detail is wrong

You just (correctly) negated your own claim.
by abdulhaq
3/19/2026 at 4:33:39 PM
ChatGPT 5.4 pro has surprised me several times; when asking a "can such and such be done" type question, intending to have a discussion about whether a thing can be done in principle and what it might look like, it has actually produced a working example in response, in addition to answering the questions.

Some of the missing pieces come from memory, knowing which topics I like to explore; some from the model itself, either baked-in knowledge or what it picks up searching. But they can definitely take a vague, handwavy, half-baked idea and whip up a full app or game or whitepaper. Sometimes it's "exactly what I wanted!", other times it's "exactly the kind of thing I was talking about!"
Semantics and context and nuance are part and parcel of LLM capabilities. Superhuman in some areas, definitely subhuman in others.
AI is getting pretty competent and clever.
by observationist
3/19/2026 at 8:22:37 AM
> LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions. Code is the detail being filled in.

They can generate boilerplate, sure. Or they can expand out a known/named algorithm implementation, like pulling in a library. But neither of those is generating detail that wasn't there in the original (at most it pulls in the detail from somewhere in the training set).
by lmm
3/19/2026 at 8:53:17 AM
They do more than that. If you ask for a UI with a button, that button won't be upside down even if you didn't specify its orientation. Lots of the detail can be inferred from general human preferences, which are present in the LLMs' training data. This extends way beyond CS stuff like details of algorithm implementations.
by tibbe
3/19/2026 at 10:07:43 AM
Isn't "not being upside down" just one of the default properties of a button in whatever GUI toolkit you are using? I'd be worried if an LLM _did_ start setting all the possible button properties.
by zabzonk
3/19/2026 at 10:49:24 AM
Putting LLMs on a pedestal is very much in vogue these days.
by MoreQARespect
3/19/2026 at 12:06:09 PM
If you ask for increase and decrease buttons, they will put the right icons on them (not words) and lay them out right.
by bluGill
3/19/2026 at 11:50:05 PM
I cannot confirm that this works reliably and properly, from my experience of asking various LLMs about reducing a button's size in tkinter to the minimum for the button's label.
by zelphirkalt
3/19/2026 at 9:14:39 AM
That’s exactly what they said. Details “elsewhere in its training set”.
by skywhopper
3/19/2026 at 9:13:16 AM
> LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions

Only with well-known patterns that represent shared knowledge specified elsewhere. If the details they “fill in” each time differ in ways that change behavior, then the spec is deficient.
If we “figure out” how to write such detailed specs in the future, as you suggest, then that becomes the “code”.
by skywhopper
3/19/2026 at 11:40:53 AM
Right: when you tell it “draw me a renaissance woman” and it gives you a facsimile of the Mona Lisa, it’s not because it intelligently anticipated what you wanted; it’s just been trained thoroughly to make that association.
by ModernMech
3/19/2026 at 11:34:46 AM
Also, they're a bit more willing to make assumptions.

After a while, I think we all get a sense of not only the number of micro-decisions you have to make while building stuff (even when you're intimate with the domain), but also the number of assumptions you'll need to make about things you either don't know yet or haven't fully fleshed out.
I'm painfully aware of the assumptions I'm making nowadays and that definitely changes the way I build things. And while I love these tools, their ability to not only make assumptions, but over-engineer those assumptions can have disastrous effects.
I had Claude build me a zip code heat map given a data source, and it did it spectacularly. Same with a route planner. But asking it to build out medical procedure documentation configurations based off of a general plan DID NOT work as well as I had expected it would.
Also, I asked Claude what the cron expression I wrote would do, and it got it wrong (which is expected, because Azure WebJobs uses a non-standard form). But even after telling it that it was wrong, and giving it the documentation to rely on, it still doubled down on the wrong answer.
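[Ed.: for context on the non-standard form, Azure WebJobs timer triggers use NCRONTAB expressions, which prepend a seconds field to the usual five cron fields. A sketch of the mismatch; the labeling helper below is invented for illustration:]

```python
# Standard cron:  minute hour day-of-month month day-of-week
# NCRONTAB:       second minute hour day-of-month month day-of-week
STANDARD_FIELDS = ["minute", "hour", "day", "month", "weekday"]
NCRONTAB_FIELDS = ["second"] + STANDARD_FIELDS

def label_fields(expr, names):
    parts = expr.split()
    if len(parts) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(parts)}")
    return dict(zip(names, parts))

# "run at 09:30 daily" in each dialect: the same schedule, different strings.
assert label_fields("30 9 * * *", STANDARD_FIELDS)["hour"] == "9"
assert label_fields("0 30 9 * * *", NCRONTAB_FIELDS)["hour"] == "9"
# A tool (or LLM) assuming the wrong dialect shifts every field by one,
# or rejects the expression outright.
```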
by mexicocitinluez
3/19/2026 at 1:58:35 PM
It means for most specs, you can just use an average solution, or a most-popular solution.

I'm absolutely on board with that. We probably need fewer weird, outlier decisions in designs for something that is a boring-ass business website.
by hnthrow0287345
3/19/2026 at 10:32:13 PM
> the proof is that LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions

> Sometimes the interpolated detail is wrong (and indeterministic)

... You consider incorrect, non-deterministic results to be "reliable"?
by strix_varius
3/20/2026 at 12:07:40 PM
Do you consider the implementation of such specs by another human to (always) be correct and deterministic?

Heck, if I reimplement something I worked on a month ago, it’s probably not going to be exactly the same. Being non-deterministic needn’t be a problem, as long as it falls within certain boundaries and produces working results.
by tinodb
3/19/2026 at 5:07:50 PM
Isn’t this no different from a compiler? I don’t specify which registers to use in C or Java, and yet the code runs!

An awful lot of time has been put into the compiler so it knows which registers to use and how to juggle them. Is an LLM any different in its behavior (albeit different in how it was trained)? If not, then specs are just an even higher-level programming language.
I think the difference is that with a C compiler, when it gets it wrong, I’ll see some terrible performance impact, but when the LLM gets it wrong, it will do something nobody wanted, like delete someone’s account or debit one account without crediting another.
by lowbloodsugar
3/19/2026 at 4:09:29 PM
Exactly. Any developer working on any project will encounter a decision that wasn’t in the spec, where they use their judgement and taste to fill in gaps. The idea that only code can be a complete spec assumes the code perfectly matches the original intent, which we know it rarely does in a project of meaningful size.
by cush
3/19/2026 at 9:36:16 AM
I get the sense that what you are responding to, and even many of the comments on yours, express a kind of coping with the current dynamic, only exacerbated by the rather elitist and egoistic mentality that people in tech have had for a very long time now; i.e., they are falling…being pushed from Mt Olympus, and there is A LOT of anxious rationalization going on.

Not a mere 5 years ago, even tech people were chortling down their upturned noses at people complaining that their jobs were being “taken”, and now that the turns have tabled, there is a bunch of denial, anger, and grief going on, maybe even some depression, as many of the recently unemployed realize the current state of things.
It’s all easy to deride the inferiority of AI when you’re employed in a job doing things as you had been all your career, thinking you cannot be replaced… until you find yourself on the other side of the turn that has tabled.
by roysting
3/19/2026 at 9:46:46 AM
I use AI for my work every single day, and during some weekends too. Claude Code, with Opus. It is far from being able to reliably produce the code that we need for production. It produces code that looks OK most of the time, but I have seen it lose track of key details, misinterpret requirements, and even ignore them sometimes, "on purpose", as in writing something like "let's not do that requirement, it's not necessary".

This kind of thing happens at least once per day to me, maybe more.
I am not denying that it is useful, let me be clear. It is extremely convenient, especially for mechanical tasks. It has other advantages like quick exploration of other people's code, for example. If my employer didn't provide a corporate account for me, I would pay one from my own pocket.
That said, I agree with OP and the author that it is not reliable when producing code from specs. It does things right, I would say often. That might be good enough for some fields/people. It's good enough for me, too. I however review every line it produces, because I've seen it miss, often, as well.
by otikik
3/20/2026 at 2:21:08 AM
I think we are in a bit of a trough of people trying to use the methods and processes of irrelevant practices, when what is needed for a whole new dynamic is an adapted and novel set of methods and processes. I suspect we may not get out of it for a number of years, until a distinct AI-native generation can start emerging. I have had great results, and know others who have done far better than me, and all of them have totally reworked and revised everything about their software development processes. Being able to adapt things from first principles seems to be the differentiating factor. I don't like it, but we are probably going to see a whole generation of past software devs unable or unwilling to adapt to a revolution in the industry that is simply not going to go away.

Unfortunately we will lose things precisely because all that experience and expertise will not be captured and implemented, just like we have lost so many things from the past, like the many proprietary and secret methods and practices that were jealously guarded by artisans, craftsmen, and artists. But now I've gotten off track a bit. Cheers.
by roysting
3/19/2026 at 3:28:57 PM
[dead]
by TheJord
3/19/2026 at 9:41:39 AM
I can't help but imagine that this is how some people felt about doctors once WebMD came out.

It's some nice rhetoric, but you're not actually saying much.
by rdevilla
3/19/2026 at 12:06:43 PM
As you can see from the downvotes and comments, they still don't get it.

LLMs make developers more efficient. That much is obvious to anyone who isn't blinded by fear.
But people will respond "but you still need developers!" True. You don't need nearly as many, though. In fact, with an LLM in their hands, the poor performers are more of a liability than ever. They'll be let go first.
But even the "smart" developers will be subsumed, as vastly more efficient companies outcompete the ones where they work.
Companies with slop-tolerant architectures will take over every industry. They'll have humans working there. But not many.
by jappgar
3/19/2026 at 4:13:58 PM
> LLMs make developers more efficient.

They do not. I review a ton of code, and while the quantity is going up, the quality of that code is getting worse. LLMs only make developers more efficient if they skip the due diligence required to verify the output; they all say they don't, and almost all of them do.
by bigfishrunning
3/20/2026 at 2:20:27 PM
It's probably not a person you're answering, so there's no point trying to have a reasonable conversation.
by itsfine2