3/19/2026 at 7:53:58 AM
> There is no world where you input a document lacking clarity and detail and get a coding agent to reliably fill in that missing clarity and detail

That is not true, and the proof is that LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions. Code is the detail being filled in. Furthermore, LLMs are the ultimate detail-fillers, because they are language interpolation/extrapolation machines. And their popularity exists precisely because they are usually very good at filling in details: LLMs use their vast knowledge to guess what detail to generate, so the result usually makes sense.
This doesn't detract much from the main point of the article, though. Sometimes the interpolated detail is wrong (and nondeterministic), so, if a reliable result is to be achieved, important details have to be constrained, and for that they have to be specified. And whereas we have decades of tools and culture for coding, we largely don't have that for extremely detailed specs (except maybe at NASA or similar places). We could figure it out in the future, but we haven't yet.
by bad_username
3/19/2026 at 8:03:10 AM
> That is not true, and the proof is that LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions.

LLMs can generate (relatively small amounts of) working code from relatively terse descriptions, but I don’t think they can do so _reliably_.
They’re more reliable the shorter the code fragment and the more common the code, but they do break down for complex descriptions. For example, try tweaking the description of a widely-known algorithm just a little bit and see how well the generated code follows the spec.
> Sometimes the interpolated detail is wrong (and indeterministic), so, if reliable result is to be achieved
Seems you agree they _cannot_ reliably generate (relatively small amounts of) working code from relatively terse descriptions.
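[Ed.: a concrete illustration of the "tweaked algorithm" test mentioned above; the example and names are my own, not the commenter's. Ask for binary search, but with the small spec change that, among duplicates, the index of the *first* occurrence must be returned. A hand-written reference for that tweaked spec:]

```python
def first_occurrence(xs, target):
    """Binary-search variant: return the index of the FIRST occurrence
    of target in the sorted list xs, or -1 if absent. The 'first'
    requirement is the small tweak a textbook binary search ignores."""
    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid  # keep searching left even after a match
    return lo if lo < len(xs) and xs[lo] == target else -1

# A plain binary search may legally return index 1, 2, or 3 here;
# the tweaked spec accepts only 1.
assert first_occurrence([1, 2, 2, 2, 3], 2) == 1
assert first_occurrence([1, 3], 2) == -1
```

Generated code often reproduces the textbook version and misses the tweaked postcondition, which is easy to catch only if you test for it.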
by Someone
3/19/2026 at 1:40:03 PM
> try tweaking the description of a widely-known algorithm just a little bit and see how good the generated code follows the spec.

This works well for me.
by mathgradthrow
3/19/2026 at 9:11:15 AM
Neither can humans, but the industry has decades of experience with how to instruct and guide human developer teams using specs.
by mike_hearn
3/19/2026 at 9:40:35 AM
For good results, you usually don't want your developers to be coding monkeys. You need the human developer in the loop to even define the spec, maybe contributing ideas, but at the very least asking questions like "what happens when..." and "have you thought about...".

In fact, this is a huge chunk of the value a developer brings to the table.
by dxdm
3/19/2026 at 2:07:38 PM
And this is usually one of the defining traits of a senior engineer. They understand the tech and its limitations, and thus are able to look around corners, ask good questions, and, overall, provide quality product input.
by gusmd
3/19/2026 at 3:48:03 PM
In other words, prudential judgement.

Programs are socially constructed artifacts that help communicate and express a model (one that is perpetually locked in people's heads, with variance across engineers; divergence is addressed as the program develops). Determining what should or should not be done is a matter not just of domain knowledge but of practical reason, which is to say prudence, a virtue that can only be acquired by experience. It is the ability to apply universal principles to particular situations.
This is why young devs, even when clever in some local sense, are worse at understanding the right moves to make in context. Code does not stand alone. It exists entirely in the service of something and is bound by constraints that are external to it.
by danielam
3/19/2026 at 3:23:36 PM
This is very much my experience from working with outsourced development. Almost by design, they tend to lack domain expertise or an intimate understanding of the cultures and engineering values of the company they're contracted out to.

This means that they will very quickly help you discover all the little details that seemed so obvious to you that you didn't even think to mention them, but were nonetheless critical to a successful implementation. The corollary is that the potential ROI of outsourcing is inversely proportional to how many of these little details your project has, and how important they are.
So far I've found LLM coding to be kind of the same. For projects where those details are relatively unimportant, they can save me a bunch of effort. But I would not want to let an LLM build and maintain something like an API or database schema. Doing a good job of those requires too much knowledge of expected usage patterns and too much working through design tradeoffs. And they tend to be incredibly expensive to change after deployment, so it pays to take your time and get your hands dirty.
I also kind of hate them for writing tests, for similar reasons. I know many people love them for it because writing tests isn't super happy fun times, but for my part I'm tired of dealing with LLM-generated test suites being so brittle that they actively hinder future development.
by bunderbunder
3/19/2026 at 12:14:09 PM
When LLMs generate an appropriate program from ambiguous requirements, they do this because the requirements happen to match something similar that has been done previously elsewhere.

There is a huge amount of programming work that consists in reinventing the wheel, i.e. in redoing something very similar to programs that have been written thousands of times before.
For this kind of work LLMs can greatly improve productivity, even if they are not much better than if you were allowed to search, copy, and paste from the programs on which the LLM has been trained. The advantage of an LLM is the automation of the search/copy/paste actions, and even more than this, the removal of the copyrights from the original programs. Copyright law is what has resulted in huge amounts of superfluous programming work, which is necessary even when open-source solutions exist, because the employer of the programmer wants to "own the IP".
On the other hand, for really novel applications, or for old applications where you want to obtain better performance than anyone has gotten before, providing an ambiguous prompt to an LLM will get you nowhere.
by adrian_b
3/20/2026 at 4:34:57 AM
> and even more than this, the removal of the copyrights from the original programs

This seems really strange to me. Can you explain how this is different from just stealing code from other sources, or copying it wholesale from open-source repos?
by bluefirebrand
3/19/2026 at 10:37:47 AM
Humans have the ability to retrospect, push back on a faulty spec, push back on an unclarified spec, do experiments, make judgement calls, and build tools and processes to account for their own foibles.
by MoreQARespect
3/19/2026 at 11:45:17 AM
Humans also have the ability to introspect. Ultimately, (nearly) every software project is intended to provide a service to humans, and most humans are similar in most ways: "what would I want it to do?" is a surprisingly reliable heuristic for dealing with ambiguity, especially if you know where you should and shouldn't expect it to be valid.

The best LLMs can manage is "what's statistically plausible behaviour for descriptions of humans in the corpus", which is not the same thing at all. Sometimes, I imagine, that might be more useful; but for programming (where, assuming you're not reinventing wheels or scrimping on your research, you're often encountering situations that nobody has encountered before), an alien mind's extrapolation of statistically plausible human-behaviour observations is not useful. (I'm using "alien mind" metaphorically, since LLMs do not appear particularly mind-like to me.)
by wizzwizz4
3/19/2026 at 12:04:02 PM
Most companies I've worked for have had 'know the customer' events so that developers learn what the customers really do; in turn, even if we are not in their domain, we have a good idea what they care about.
by bluGill
3/19/2026 at 11:32:15 AM
Which bits of this do you think LLM-based agents can't do?
by pablobaz
3/19/2026 at 11:44:58 AM
Not get stuck on an incorrect train of thought; not ignore core instructions in favour of training data, like breaking naming conventions across sessions or long contexts; not confidently state "I completely understand the problem and this will definitely work this time" for the 5th time without actually checking. I could go on.
by interstice
3/19/2026 at 12:46:01 PM
LLMs by their nature are not goal-oriented (this is a fundamental difference between reinforcement learning and neural networks, for example). So a human will have, let's say, the ultimate goal of creating value with the web application they create ("save me time!"). The LLM has no concept of that. It's trying to complete a spec as best it can with no knowledge of the goal. Even if you tell it the goal, it has no concept of the process needed to achieve or confirm the goal was attained; you have to tell it that.
by mbesto
3/19/2026 at 11:38:29 AM
The main thing they cannot do is be held accountable for any decisions, which makes them not trustworthy.
by ModernMech
3/19/2026 at 11:42:20 AM
This is not correct. They can say "sorry", which makes them as accountable as an ordinary developer.
by vbezhenar
3/19/2026 at 11:48:02 AM
I've found recent versions of Claude and Codex to be reluctant in this regard. They will recognise the problem they created a few minutes ago but often behave as if someone else did it. In many ways that's true though, I suppose.
by interstice
3/19/2026 at 1:21:57 PM
Does it do this for really cut-and-dried problems? I’ve noticed that ChatGPT will put a lot of effort into (retroactively) “discovering” a basically-valid alternative interpretation of something it said previously, if you object on good grounds. Like it’s trying to evade admitting that it made a mistake, but also find some way to satisfy your objection. Fair enough, if slightly annoying.

But I have also caught it on straightforward matters of fact and it’ll apologize. Sometimes in an over-the-top fashion…
by bee_rider
3/19/2026 at 4:07:45 PM
Ordinary developers get fired for poor performance *all the time*.
by bigfishrunning
3/19/2026 at 12:15:40 PM
That's not what accountability is.
by bluefirebrand
3/19/2026 at 3:41:43 PM
Accountability: "Something that SWEs run screaming from."

Example: "We should have professional accountability in software"
SWE: "This would bring about the end of the world!!!1!"
by pixl97
3/19/2026 at 4:09:56 PM
The economics of software development have lowered the bar for software engineers: there simply aren't enough people who are good at it (or even want to be), and the salaries are very high, so plenty of people who shouldn't be SWEs are.

I am a software engineer, and I would absolutely love to see more professional accountability in this field. Unfortunately, it would make the cost of software go up significantly (because many, many people writing software would be ejected from the industry).
by bigfishrunning
3/19/2026 at 3:17:18 PM
LLM based solutions don’t need to stay dry and warm at night, with a full belly, possibly with their sexual partner with whom they have a drive to procreate.by datsci_est_2015
3/19/2026 at 2:08:04 PM
Any of them.
by MoreQARespect
3/19/2026 at 6:41:21 PM
You can guide humans, but ultimately the reason senior software developers have been paid large sums of money is that, even with specs, we have mostly found it works better to have someone with good judgement actually doing the work; otherwise we would have just been using specifications. The question remains open whether LLMs can show good judgement. Often my experience with Claude is that it doesn't if the problem domain is non-trivial, but it's possible that won't always be true.
by FuckButtons
3/19/2026 at 11:34:40 AM
Specs are insufficient to guide human developer teams, so I don’t understand the comparison.by ModernMech
3/19/2026 at 2:48:40 PM
Anything can be reliable if you have good tests.
by jes5199
3/19/2026 at 2:11:57 PM
We do have such detailed specifications, but they are written in languages with a narrow interface. The technique is called "program synthesis," and you can find an example of such a language, called Synquid.

It might be illuminating to see what a mathematically precise specification can and cannot do when it comes to generating programs. A major challenge in formal methods is proving that the program implements the specification faithfully, known as the specification gap. If you have a very high-level and flexible specification language, such as TLA+, there is a lot of work to do to verify that the program you write meets the specification you wrote. For something like Synquid, which is closer to the code, there are more constraints on expressivity.
The point is that spoken language is not sufficiently precise to define a program.
Just because an LLM can fill in plausible details where sufficient detail is lacking doesn’t indicate that it’s solving the specification gap. If the program happens to implement the specification faithfully you got lucky. You still don’t actually know that’s true until you verify it.
It’s different with a narrow interface though: you can be very precise and very abstract with a good mathematical system for expressing specifications. It’s a lot more work and requires more training to do than filling in a markdown file and trying to coax the algorithm into outputting what you want through prose and fiction.
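[Ed.: one way to make the verification point above concrete without a synthesis tool is spec-as-predicate checking, sketched below in Python. This is not how Synquid itself works (Synquid uses refinement types and an SMT solver), and the function names here are invented for illustration:]

```python
# Spec for sorting, written as an executable predicate: the output must
# be the input's elements in nondecreasing order (same multiset).
def satisfies_sort_spec(impl, xs):
    return impl(list(xs)) == sorted(xs)

def plausible_but_wrong_sort(xs):
    # A "filled-in detail" that looks fine on many inputs but silently
    # drops duplicates: the specification gap in miniature.
    return sorted(set(xs))

assert satisfies_sort_spec(sorted, [3, 1, 2, 1])
assert not satisfies_sort_spec(plausible_but_wrong_sort, [3, 1, 2, 1])
```

Checking individual inputs like this can only refute; proving that an implementation meets the spec for all inputs is the hard part that formal tools address.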
by agentultra
3/19/2026 at 5:13:54 PM
This works well for problems that are purely algorithmic in nature. But problems often have solutions that don't fall into those categories, especially in UI/UX. When people tell me that LLMs can solve anything given a sufficiently detailed spec, I ask them to produce such a spec for Adobe Photoshop.
by Shebanator
3/19/2026 at 3:37:46 PM
I think the worst case is actually that the LLM faithfully implements your spec, but your spec was flawed. To the extent that you outsource the mechanical details to a machine trained to do exactly what you tell it, you destroy, or at least hamper, the feedback loop between fuzzy human thoughts and cold hard facts.
by et1337
3/19/2026 at 3:42:09 PM
Unfortunately even formal specifications have this problem. Nothing can replace thinking. But sycophancy, I agree, is a problem. These tools are designed to be pleasing, to generate plausible output; but they cannot think critically about the tasks they're given.

Nothing will save you from a bad specification. And there's no royal road to knowing how to write good ones.
by agentultra
3/19/2026 at 3:53:51 PM
Right, there’s no silver bullet. I think all I can do is increase the feedback bandwidth between my brain and the real world. Regular old stuff like linters, static typing, borrow checkers, e2e tests… all the way to “talking to customers more”.
by et1337
3/19/2026 at 2:29:54 PM
> Sometimes the interpolated detail is wrong

You just (correctly) negated your own claim.
by abdulhaq
3/19/2026 at 4:33:39 PM
ChatGPT 5.4 pro has surprised me several times; when asking a "can such and such be done" type question, intending to have a discussion about whether a thing can be done in principle and what it might look like, it has actually produced a working example in response, in addition to answering the questions.

Some of the missing pieces come from memory, knowing which topics I like to explore; some from the model itself, either baked-in knowledge or what it picks up searching. But they can definitely take a vague, handwavy, half-baked idea and whip up a full app or game or whitepaper. Sometimes it's "exactly what I wanted!", other times it's "exactly the kind of thing I was talking about!"
Semantics and context and nuance are part and parcel of LLM capabilities. Superhuman in some areas, definitely subhuman in others.
AI is getting pretty competent and clever.
by observationist
3/19/2026 at 8:22:37 AM
> LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions. Code is the detail being filled in.

They can generate boilerplate, sure. Or they can expand out a known/named algorithm implementation, like pulling in a library. But neither of those is generating detail that wasn't there in the original (at most it pulls in the detail from somewhere in the training set).
by lmm
3/19/2026 at 8:53:17 AM
They do more than that. If you ask for a UI with a button, that button won't be upside down even if you didn't specify its orientation. Lots of the detail can be inferred from general human preferences, which are present in the LLMs' training data. This extends way beyond CS stuff like details of algorithm implementations.
by tibbe
3/19/2026 at 10:07:43 AM
Isn't "not being upside down" just one of the default properties of a button in whatever GUI toolkit you are using? I'd be worried if an LLM _did_ start setting all the possible button properties.
by zabzonk
3/19/2026 at 10:49:24 AM
Putting LLMs on a pedestal is very much in vogue these days.
by MoreQARespect
3/19/2026 at 12:06:09 PM
If you ask for increase and decrease buttons, they will put the right icons on them (not words) and lay them out right.
by bluGill
3/19/2026 at 11:50:05 PM
I cannot confirm that this works reliably and properly, from my experience of asking various LLMs about reducing a button's size in tkinter to the minimum for the button's label.
by zelphirkalt
3/19/2026 at 9:14:39 AM
That’s exactly what they said. Details “elsewhere in its training set”.
by skywhopper
3/19/2026 at 9:13:16 AM
> LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions

Only with well-known patterns that represent shared knowledge specified elsewhere. If the details they “fill in” each time differ in ways that change behavior, then the spec is deficient.
If we “figure out” how to write such detailed specs in the future, as you suggest, then that becomes the “code”.
by skywhopper
3/19/2026 at 11:40:53 AM
Right: when you tell it “draw me a renaissance woman” and it gives you a facsimile of the Mona Lisa, it’s not because it intelligently anticipated what you wanted; it’s just been trained thoroughly to make that association.
by ModernMech
3/19/2026 at 11:34:46 AM
Also, they're a bit more willing to make assumptions.

After a while, I think we all get a sense of not only the number of micro-decisions you have to make while building stuff (even when you're intimate with the domain), but also the number of assumptions you'll need to make about things you either don't know yet or haven't fully fleshed out.
I'm painfully aware of the assumptions I'm making nowadays and that definitely changes the way I build things. And while I love these tools, their ability to not only make assumptions, but over-engineer those assumptions can have disastrous effects.
I had Claude build me a zip code heat map given a data source, and it did it spectacularly. Same with a route planner. But asking it to build out medical procedure documentation configurations based off of a general plan DID NOT work as well as I had expected it would.
Also, I asked Claude what the cron expression I wrote would do, and it got it wrong (which is expected, because Azure WebJobs uses a non-standard form). But even after telling it that it was wrong, and giving it the documentation to rely on, it still doubled down on the wrong answer.
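[Ed.: for context on the non-standard form, Azure WebJobs timer triggers use NCRONTAB expressions, which prepend a seconds field to the usual five cron fields. A sketch of the mismatch; the labeling helper below is invented for illustration:]

```python
# Standard cron:  minute hour day-of-month month day-of-week
# NCRONTAB:       second minute hour day-of-month month day-of-week
STANDARD_FIELDS = ["minute", "hour", "day", "month", "weekday"]
NCRONTAB_FIELDS = ["second"] + STANDARD_FIELDS

def label_fields(expr, names):
    parts = expr.split()
    if len(parts) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(parts)}")
    return dict(zip(names, parts))

# "run at 09:30 daily" in each dialect: the same schedule, different strings.
assert label_fields("30 9 * * *", STANDARD_FIELDS)["hour"] == "9"
assert label_fields("0 30 9 * * *", NCRONTAB_FIELDS)["hour"] == "9"
# A tool (or LLM) assuming the wrong dialect shifts every field by one,
# or rejects the expression outright.
```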
by mexicocitinluez
3/19/2026 at 1:58:35 PM
It means for most specs, you can just use an average solution, or a most-popular solution.

I'm absolutely on board with that. We probably need fewer weird, outlier decisions in designs for something that is a boring-ass business website.
by hnthrow0287345
3/19/2026 at 10:32:13 PM
> the proof is that LLMs _can_ reliably generate (relatively small amounts of) working code from relatively terse descriptions

> Sometimes the interpolated detail is wrong (and indeterministic)

... You consider incorrect, non-deterministic results to be "reliable"?
by strix_varius
3/20/2026 at 12:07:40 PM
Do you consider the implementation of such specs by another human to (always) be correct and deterministic?

Heck, if I reimplement something I worked on a month ago, it’s probably not going to be exactly the same. Being non-deterministic needn’t be a problem, as long as it falls within certain boundaries and produces working results.
by tinodb
3/19/2026 at 5:07:50 PM
Isn’t this no different from a compiler? I don’t specify which registers to use in C or Java, and yet the code runs!

An awful lot of time has been put into the compiler so it knows which registers to use and how to juggle them. Is an LLM any different in its behavior (albeit different in how it was trained)? If not, then specs are just an even higher-level programming language.
I think the difference is that with a C compiler, when it gets it wrong, I’ll see some terrible performance impact, but when the LLM gets it wrong, it will do something nobody wanted, like delete someone’s account or debit one account without crediting another.
by lowbloodsugar
3/19/2026 at 4:09:29 PM
Exactly. Any developer working on any project will encounter a decision that wasn’t in the spec, where they use their judgement and taste to fill in gaps. The idea that only code can be a complete spec assumes the code perfectly matches the original intent, which we know it rarely does in a project of meaningful size.
by cush
3/19/2026 at 9:36:16 AM
I get the sense that what you are responding to, and even many of the comments on yours, express a kind of coping with the current dynamic, only exacerbated by the rather elitist and egoistic mentality that people in tech have had for a very long time now; i.e., they are falling…being pushed from Mt Olympus, and there is A LOT of anxious rationalization going on.

Not a mere 5 years ago, even tech people were chortling down their upturned noses at people complaining that their jobs were being “taken”, and now that the turns have tabled, there is a bunch of denial, anger, and grief going on, maybe even some depression, as many of the recently unemployed realize the current state of things.
It’s all easy to deride the inferiority of AI when you’re employed in a job doing things as you had been all your career, thinking you cannot be replaced… until you find yourself on the other side of the turn that has tabled.
by roysting
3/19/2026 at 9:46:46 AM
I use AI for my work every single day, and during some weekends too. Claude Code, with Opus. It is far from being able to reliably produce the code that we need for production. It produces code that looks OK most of the time, but I have seen it lose track of key details, misinterpret requirements, and even ignore them sometimes, "on purpose", as in writing something like "let's not do that requirement, it's not necessary".

This kind of thing happens at least once per day to me, maybe more.
I am not denying that it is useful, let me be clear. It is extremely convenient, especially for mechanical tasks. It has other advantages like quick exploration of other people's code, for example. If my employer didn't provide a corporate account for me, I would pay one from my own pocket.
That said, I agree with OP and the author that it is not reliable when producing code from specs. It does things right, I would say often. That might be good enough for some fields/people. It's good enough for me, too. I however review every line it produces, because I've seen it miss, often, as well.
by otikik
3/20/2026 at 2:21:08 AM
I think we are in a bit of a trough of people trying to use the methods and processes of irrelevant practices, when what is needed for a whole new dynamic is an adapted and novel set of methods and processes. I suspect we may not get out of it for a number of years, until a distinct AI-native generation can start emerging. I have had great results, and know others who have done far better than me, and all of them have totally reworked and revised everything about their software development processes. Being able to adapt things from first principles seems to be the differentiating factor. I don't like it, but we are probably going to see a whole generation of past software devs unable or unwilling to adapt to a revolution in the industry that is simply not going to go away.

Unfortunately we will lose things precisely because all that experience and expertise will not be captured and implemented, just like we have lost so many things from the past, like the many proprietary and secret methods and practices that were jealously guarded by artisans, craftsmen, and artists. But now I've gotten off track a bit. Cheers.
by roysting
3/19/2026 at 3:28:57 PM
[dead]
by TheJord
3/19/2026 at 9:41:39 AM
I can't help but imagine that this is how some people felt about doctors once WebMD came out.

It's some nice rhetoric, but you're not actually saying much.
by rdevilla
3/19/2026 at 12:06:43 PM
As you can see from the downvotes and comments, they still don't get it.

LLMs make developers more efficient. That much is obvious to anyone who isn't blinded by fear.
But people will respond "but you still need developers!" True. You don't need nearly as many, though. In fact, with an LLM in their hands, the poor performers are more of a liability than ever. They'll be let go first.
But even the "smart" developers will be subsumed, as vastly more efficient companies outcompete the ones where they work.
Companies with slop-tolerant architectures will take over every industry. They'll have humans working there. But not many.
by jappgar
3/19/2026 at 4:13:58 PM
> LLMs make developers more efficient.

They do not. I review a ton of code, and while the quantity is going up, the quality of that code is getting worse. LLMs only make developers more efficient if they skip the due diligence required to verify the output; they all say they don't, and almost all of them do.
by bigfishrunning
3/20/2026 at 2:20:27 PM
It's probably not a person you're answering, so there's no point trying to have a reasonable conversation.
by itsfine2