Learnings from 100K lines of Rust with AI (2025)

5/20/2026 at 11:57:48 AM

We're working on a large Rust codebase, developed with heavy assistance from Claude and Codex, and one critical workflow is this: after you have written a spec, have the other LLM critique it thoroughly.

This back and forth will take quite a while, but the resulting implementation plan will be 10x better than the original.

You can automate this by giving Codex a goal, and a skill to call Claude to review the implementation spec until they both agree it's done.

Then, for critical code, have them both implement the spec in a worktree, then BOTH critique each other's implementation.

More often than not, Claude will say to take 2 or 3 pieces from its design over to Codex, but ship the Codex implementation.

by chadd
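A minimal sketch of the review loop described above, assuming nothing about how either model is actually invoked. The names `refine_spec`, `review`, and `revise` are hypothetical; each callable would wrap whatever tool really talks to a model, and the round limit is an added assumption so the loop always terminates.

```python
from typing import Callable, Tuple

# Hypothetical signatures: each callable wraps whatever actually calls a model.
Reviewer = Callable[[str], Tuple[bool, str]]  # spec -> (approved, critique)
Reviser = Callable[[str, str], str]           # (spec, critique) -> revised spec


def refine_spec(
    spec: str, review: Reviewer, revise: Reviser, max_rounds: int = 5
) -> Tuple[str, bool]:
    """Alternate review and revision until the reviewer approves or rounds run out."""
    for _ in range(max_rounds):
        approved, critique = review(spec)
        if approved:
            return spec, True
        # Hand the critique back to the author model for another pass.
        spec = revise(spec, critique)
    # No agreement within the budget: return the latest draft, flagged as unapproved.
    return spec, False


# Stub usage: the "reviewer" only approves specs that mention error handling.
def stub_review(spec: str) -> Tuple[bool, str]:
    if "error handling" in spec:
        return True, ""
    return False, "add error handling"


def stub_revise(spec: str, critique: str) -> str:
    return spec + "\n- " + critique


final_spec, agreed = refine_spec("Parse the config file.", stub_review, stub_revise)
```

Returning the unapproved draft instead of raising leaves the decision about a stalled review to a human.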

5/20/2026 at 1:13:13 PM

I take this idea even further: After the LLMs have critiqued each other, I introduce a third critique and review it myself as a human. This third party review is most effective at highlighting problems that the LLMs miss, in my experience.

Jokes aside, I agree about having LLMs iterate. Bouncing between GPT and Opus is good in my experience, but even having the same LLM review its own output in a new session started fresh without context will surface a lot of problems.

This process takes a lot of tokens and a lot of time, which is fine because I’m reviewing and editing everything myself during that time.

by Aurornis

5/20/2026 at 1:50:25 PM

This is astrology for devs.

by knivets

5/21/2026 at 9:00:00 AM

as someone who is about as llm-forward as anyone out there, this is a brilliant analogy. was equally true of all the “prompt engineer” hype as well from a couple years ago (which i admit i still think does matter)… it kinda makes me feel like an audiophile / hi-fi person talking about how 24bit/192kHz is the one true encoding format and anything less is a willful (cynical, “Quality”-hating, satisficerist, etc.) compromise. which i freely admit to being one of those people as well.

and in both cases i both “know” that i can tell the difference and “know that i cannot tell the difference”. what anyone takes from that in terms of what it says about me, personally, is a bit of a Rorschach test, but Astrology is about as apt a description as there is… xD

by keeganpoppen

5/21/2026 at 9:46:28 AM

For higher than audible frequency sample rates there's a good chance you can tell the difference. It often causes weird aliasing and harmonics in the more audible frequencies on "real" playback equipment. You can train yourself to recognize some of these and often pretty accurately identify the higher sample rate examples. You might even mentally associate those signs with "Higher Quality".

But it's arguably less accurate to the original recording.

by kimixa

5/21/2026 at 10:54:04 AM

People thought asking an LLM to output its reasoning steps was astrology until it was standardized and made ubiquitous.

by raincole

5/21/2026 at 2:22:13 PM

Didn't multiple studies find the reasoning traces didn't have much to do with the final output? And even that outputting placeholder tokens during reasoning has a similar beneficial effect on benchmark scores?

(I don't think that's the full picture but, there's definitely something fishy going on there.)

by andai

5/21/2026 at 8:24:38 AM

Do they have a golden calf to dance around? Without that success will be hit and miss.

by soloto

5/21/2026 at 9:01:19 AM

i mean, maybe the golden calf people were right the whole time lol

by keeganpoppen

5/21/2026 at 9:33:01 AM

Right about what?

by Pay08

5/20/2026 at 4:36:42 PM

Unless you can somehow provide some arguments against it, I feel like you're the one who is trying to cargo-cult stuff here.

Say what you will with proper reasoning or arguments if you feel compelled, tired reddit-commentary like that helps no one.

by embedding-shape

5/21/2026 at 9:04:27 AM

i legitimately cannot divine what you are saying at all with this. there are so many dangling antecedents and modifiers that it is completely impossible. and i say this out of a genuine desire to understand what your argument is, knowing full well that i likely disagree with it.

by keeganpoppen

5/21/2026 at 11:36:52 AM

You can't be serious. It couldn't be more obvious what the poster was referring to, a drive by put-down comment with no attempt to discuss anything seriously is more highly upvoted than an objection to such a comment.

What is this place for? Dang tells us, curious discussion. The guidelines explicitly state that certain comments are not in the spirit.

But the community seems to have decided otherwise, which is a shame.

by munksbeer

5/21/2026 at 11:46:06 AM

Don't read too much into it, downvotes/upvotes are highly random here, saying the same thing twice will have different reactions depending on the time of day and the topic of the submission, seems certain crowds are drawn to certain topics, which isn't that surprising.

I don't mind the downvotes, the points aren't really the reason I'm here anyways. I just want fun and interesting discussions with people and to read others' perspectives, and the points don't hinder that :)

by embedding-shape

5/21/2026 at 11:19:26 AM

Alright, let me explain, hopefully simpler: GP told us about their experience working with LLMs, and gave some pointers to what they found to be working. The comment I replied to just says "This is astrology for devs", which basically is a cheap putdown without any reasoning or arguments for why the commenter believes so. My comment is urging them to actually participate in the discussion, not just post a soundbite they thought of in five seconds, so HN as a whole can remain good instead of devolving into reddit (which is a tale as old as HN, I know).

Hopefully it's understandable now, and hopefully you don't disagree :)

by embedding-shape

5/21/2026 at 1:29:16 PM

https://news.ycombinator.com/newsguidelines.html

> Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills

by beepbooptheory

5/21/2026 at 3:05:48 PM

Awesome, you did understand the reference I made, I was afraid I was too sneaky about it but seems it was just clear enough :)

by embedding-shape

5/21/2026 at 3:10:29 PM

Of course! That point in the guidelines has links to some prior art in this vein. Highly recommend it for you.

And please, do better next time!

by beepbooptheory

5/21/2026 at 4:39:53 PM

Whenever I make joke reference to the guidelines, I do promise I'll attach a link to them, just to make it extra clear, thanks! :)

by embedding-shape

5/21/2026 at 1:56:34 PM

Indeed, with the corollary of, please don't write Reddit-tier comments on HN either, then one wouldn't have to say it's turning into Reddit.

by satvikpendem

5/21/2026 at 3:30:32 PM

Two wrongs don't make a right, as they teach us as children.

by beepbooptheory

5/20/2026 at 7:20:08 PM

> Unless you can somehow provide some arguments against it,

We're year 4 into this discussion and camps have only gotten more bifurcated. There's no 1-1 discussion to have about this as of now, at least not before the crash.

Your only hope in such discourse is not trying to convince the other party how wrong they are, but appealing to an as of yet undecided party. Be it with reason, or simply pointing out how absurd some comments sound to the average person.

by johnnyanmac

5/20/2026 at 7:26:33 PM

> Your only hope in such discourse is not trying to convince the other party how wrong they are

I don't care about convincing anyone, the ones I reply to or others, but if you take the time to leave a comment, at least make it something to read and think about instead of soundbites like "This is astrology for devs", it's plain boring to read and makes HN worse.

by embedding-shape

5/20/2026 at 7:34:45 PM

>I don't care about convincing anyone

That's fine. Others will care for you.

>it's plain boring to read and makes HN worse.

I chuckled at the joke. Surprising amount of layers to it.

Though I never strove to be a comic nor writer, that kind of terse, compact punch makes me envy those of such literary talent.

by johnnyanmac

5/21/2026 at 11:20:45 AM

> I chuckled at the joke. Surprising amount of layers to it.

What joke?

by embedding-shape

5/20/2026 at 1:43:34 PM

This is precisely how I used to use Beads before I made GuardRails (I wanted something slightly simpler, but similar with more 'guard rails'). I braindump everything I want to build, I ask Claude to do market level research. I then ask Claude to ask clarifying questions, then I ask Claude to be critical of its conclusions and provide the top options and to justify them. I also question Claude and say it's okay to disagree with me, be critical, I just want to understand.

By the end you have piecemeal "tickets" for your coding agent. If you have multiple developers you can sync them all up into GitHub, and someone could take some locally, or you can just have Claude work on all of them with subagents. The key feature there is that because it's all piecemeal, the context stays per task.

Then I run a /loop 15m If you're currently working ignore this. Start on the next task in gur if you have not. If you finished all work and cannot pass one gate, work on the next available task.

(Note: gur is my shorthand for GuardRails)

I also added a concept called "gates" so a task cannot complete without an attached gate, gates are arbitrary, they can be reused but when assigned to a task those specific assignments are unique per task. A task is basically anything you want it to be: unit test, try building it, or even seek human confirmation. At least when I was using Beads it did not have "gates" but I'm not sure if it has added anything like it since I stopped using Beads.

Claude will ignore the loop if it's currently working, and when it's "out of work" it will review all available tasks.

If anyone's curious it's MIT licensed and on GitHub:

https://github.com/Giancarlos/guardrails

by giancarlostoro
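A rough illustration of the "gates" idea described above: a task cannot complete without an attached gate, gates are reusable, and each assignment is unique per task. This is a hypothetical data model written for this thread, not GuardRails' actual schema or API; the class and method names (`Gate`, `GateAssignment`, `Task.attach`, `Task.complete`) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Gate:
    """A reusable check definition, e.g. 'unit tests pass' or 'human sign-off'."""
    name: str


@dataclass
class GateAssignment:
    """One gate attached to one task; pass/fail state lives here, not on the Gate."""
    gate: Gate
    passed: bool = False


@dataclass
class Task:
    title: str
    assignments: List[GateAssignment] = field(default_factory=list)
    done: bool = False

    def attach(self, gate: Gate) -> GateAssignment:
        # Each call creates a fresh assignment, so the same Gate can be reused
        # across tasks without sharing state between them.
        assignment = GateAssignment(gate)
        self.assignments.append(assignment)
        return assignment

    def complete(self) -> None:
        # A task with no gate, or with any unpassed gate, may not complete.
        if not self.assignments:
            raise ValueError("task has no gate attached")
        failing = [a.gate.name for a in self.assignments if not a.passed]
        if failing:
            raise ValueError(f"gates not passed: {failing}")
        self.done = True


unit_tests = Gate("unit tests pass")
parser_task = Task("write parser")
lexer_task = Task("write lexer")
parser_gate = parser_task.attach(unit_tests)
lexer_gate = lexer_task.attach(unit_tests)

parser_gate.passed = True  # only the parser's assignment is marked as passed
parser_task.complete()
```

Keeping the pass/fail flag on the assignment is what lets one gate definition be shared while each task still has to clear it independently.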

5/21/2026 at 6:15:09 AM

I’ll check this out. I might integrate it in to my IDE (www.propelcode.app) as a complement to plan mode.

by digitaltrees

5/21/2026 at 9:05:36 AM

oh man my body is ready for any post-beads ideas… i will definitely check this out

by keeganpoppen

5/20/2026 at 12:19:35 PM

I hate how seriously people take the output of an LLM or how reliable they think it is.

Have Claude produce that spec 10 times, using the same prompt and same context. Identical requests, but you'll get 10 unique answers that will contradict each other, with each response seeming extremely confident.

It's scary how confident you people are in these outputs.

by ai_fry_ur_brain

5/20/2026 at 12:30:30 PM

If you ask 10 different humans to produce the spec with the same information (prompt and context) they will also produce 10 unique answers that will contradict each other and (depending on who you asked) may be just as confident.

There are real decisions to be made when going from a vague prompt to a spec. It's not surprising that an LLM would produce different specs for the same work on different runs. If the prompt already contained answers to all the decision points that come up when writing the spec then the prompt would already be the spec itself.

by CrazyStat

5/20/2026 at 12:36:26 PM

LLMs aren't people. They don't reason. They're token generators, a black box. Your analogy falls on its face with any scrutiny.

by b40d-48b2-979e

5/20/2026 at 1:01:40 PM

I didn’t claim that LLMs are people or that they reason.

If the behavior of the llm is the same as the behavior of reasonable people then the behavior of the llm is reasonable, regardless of how black of a box they generate tokens out of.

Reasonable people will generate divergent specs for the same prompt. Thus it is reasonable for an LLM to generate divergent specs out of the same prompt.

Edit: I use “reasonable” here in the legal sense of the “reasonable person” standard, not to imply any reasoning process.

by CrazyStat

5/20/2026 at 1:10:35 PM

[flagged]

by b40d-48b2-979e

5/20/2026 at 1:15:37 PM

Please point to where in my initial comment I indicated that LLMs are human or reason.

If you are unable to do so please withdraw your accusation of gaslighting, a serious form of psychological abuse, and apologize.

by CrazyStat

5/21/2026 at 6:26:02 AM

Aren’t people pattern matching neural networks as well? Why does being a token generator mean something is unreliable?

Further, why does that mean “it doesn’t reason”? Logic can be encoded in language, symbols or code. If I say “all apples are red” -> “all fruit in the bowl are apples” -> “therefore all the fruit are red”, it doesn’t really matter if I understand the logic or what red is or fruit/apples are; the logic is contained in the structure of the syntax. If an LLM can output the conclusion reliably from predictive operations, it is able to have the effect of reason and we don’t need to know or care about whether it “understands” the reasoning.

by digitaltrees

5/21/2026 at 9:13:54 AM

no, brah, humans are TOTALLY different. just don’t think about it too hard. we are just special.

by keeganpoppen

5/21/2026 at 9:12:32 AM

why do people insist on claiming that they don’t reason, when they clearly, for all intents and purposes, do. you can be vague; you can express your idea a thousand different ways, and you will get a unique blend of <your input bits> x <hidden reasoning layer> => semi-smoothed output. this is like some Searle Chinese Room bullshit that needs to just die. it is beyond clear that llms can interact with abstract concepts in an extremely meaningful way. this is like the “thought leader” version of the stupid-ass “it’s just smart autocomplete” argument. if you think that, it is user error— either a failure of creativity or a failure of perception or both. just because llms are not a panacea and are problematic for society and “overhyped” and whatever does not make it intellectually honest to claim that there is zero reasoning/creativity/cognition within the box.

by keeganpoppen

5/20/2026 at 12:43:48 PM

it's an analogy, it didnt fall on its face at all. it's just a comparison to highlight the point being made was nonsensical. example: you're just a next action generator controlled by trillions of cells and subconscious dna-based behavior. a black box.

by jatora

5/20/2026 at 1:02:50 PM

> you're just a next action generator controlled by trillions of cells and subconscious dna-based behavior.

With moral agency and the ability to learn (even if we presume you are correct, which I don't think you are).

by svieira

5/20/2026 at 2:56:47 PM

moral agency and the ability to learn are implicit in the description you quoted. this isn't some special superpower, all animals have the ability to learn, and many have moral agency. these aren't human specific traits

by jatora

5/20/2026 at 12:53:02 PM

Reductio ad absurdum.

by b40d-48b2-979e

5/20/2026 at 2:54:20 PM

exactly my point lol

by jatora

5/20/2026 at 9:42:32 PM

It appears they don't need to reason or be intelligent to be able to produce working solutions for code. But sure let wild and unmonitored? I wrangle my LLMs like the code monkeys they are. They help materialize code and then you need to sculpt it (and test harness of varying sorts)

It really can be useful. It's very different from old world programming.

by NobleLie

5/20/2026 at 1:02:09 PM

LLMs do reason (they just sometimes don't reason well).

I assure you I've met many devs and "engineers" that reason less than LLMs, and are black boxes, especially in terms of the code they write.

by dnautics

5/20/2026 at 5:46:17 PM

> LLMs do reason

No, they don't.

They are token predictors that use statistical techniques to emit the randomly weighted next most likely token given the previous token list.

The result is a strange mimic of human reasoning, because the tokens it predicts are trained on strings that were produced by humans that were reasoning, but that's not the same thing.

Human cognition is complex and poorly understood, and the nature of the mind is an area of study almost as old as consciousness itself. We don't know exactly how it works, or what its exact relationship to the brain is, but we do know that it is not a simple token predictor.

LLMs, by their very nature are constrained to the concept of language and the relationship between existing words in a corpus. This is a box they can not escape.

Modern neuroscience suggests that the human brain is much more vast than that, and in many ways looks like it is constrained by language, but certainly not limited to it.

by claytongulick

5/20/2026 at 9:22:54 PM

You have moved goalposts from reasoning to "human cognition". I won't tolerate that sort of slippery wordplay.

Reasoning is making analogies between logical patterns found in conceptual space, with a direction of time (statements precede conclusions). For example. A => B and B => C. You may now deduce A => C. For something fuzzier, A~D and B~E, you may now deduce that D~=>E. This is the sort of thing that higher layer attention mechanism is capable of doing.

> This is a box they can not escape.

Would you say that Helen Keller was less capable of abstract reasoning because she had more constrained access to sensory input?

by dnautics

5/20/2026 at 9:45:34 PM

The problem with that is LLMs can output words or symbols that seem like they used "reason" to produce. But for everything the core algorithm does, it's simply nothing like the wetware reasoning to get to the same answer. So he didn't move goalposts. He always meant the reasoning that stems from human cognition.

Technically if it has that, it'd be singularity no? So basically the premise is they are doing nothing of the sort. Probe any LLM enough and it really does show it has no qualms contradicting itself or being bossed around. Has no belief / no orientation etc. It's truly mindless but tricks our mind and soul (or whatever) probably.

by NobleLie

5/20/2026 at 11:03:20 PM

> Technically if it has that, it'd be singularity no?

reasoning is not black and white. It is possible to reason poorly. Most people cannot do basic math proofs, even math majors struggle with the hardest math proofs. Reasoning in humans is also context/token dependent. I just spent one HOUR trying to show my mom (who has mild dementia) how to use amazon fire (push DOWN until your channel shows up, push RIGHT until the channel becomes big) and she could not figure it out. Rewrote the instructions in japanese and she followed the logic relatively smoothly. Ironically, i'm pretty sure her english is better than her japanese, vocabulary wise.

> it's simply nothing like the wetware reasoning to get to the same answer.

but you don't know how wetware reasoning works, so you are incapable of making that proclamation. I'm pretty sure when I do math proofs (I'm not an amazing mathematician) sometimes I have to literally tick my way through each step of the proof, sometimes breaking it down to super-basic substeps, which to me feels an awful lot like what an LLM could be doing. For that matter we don't know how LLM reasoning works but my claim is that these LLMs are in principle capable of reasoning due to architecture.

If this doesn't make sense I suggest you look over the architecture of LLMs carefully and try to understand my point.

(BTW I'm not talking about "reasoning models" with "thinking turns", that's just marketing speak, I'm talking about ANY transformer-based model, even the "dumbest UX architecture" completion models)

by dnautics

5/21/2026 at 6:40:37 AM

Humans off load reasoning into language and syntax. Chinese encodes arithmetic into the grammar/syntax patterns better than French for example.

Your posts are generally insightful. Thanks for the contribution. Even if it’s a bit cranky and gruff :)

by digitaltrees

5/21/2026 at 12:32:09 AM

Reasoning requires cognition, otherwise there's nothing to reason about, no context or value system to use as a basis for reason.

Decision making can be done by trained machines following rules, but that's different than reasoning. A thermostat isn't reasoning when it decides to turn on the air conditioner; to argue otherwise expands the definition of "reason" to be so broad that it becomes useless.

LLMs are trained on human knowledge and reasoning that results from human cognition, and they are excellent at stochastic mimicry - if the argument is that they are actually reasoning, then some sort of equivalent to human cognition must be present for that to be true. Lacking that, they are nothing more than "token extrusion machines" with some potentially useful characteristics.

by claytongulick

5/21/2026 at 6:50:38 PM

Can you give a concrete example of something that is impossible for an LLM to ever do due to its lack of reasoning ability.

by Jtarii

5/21/2026 at 6:43:38 AM

Why does reasoning require cognition? Isn’t a if else block or switch statement reasoning? Or a formal logic proof? If an LLM produces an output using formal logic or a python script why is that not reasoning? A human would offload the reasoning using similar methods. I know when I took the LSAT, I learned ways to diagram arguments and didn’t have to think/reason about it because the formal logic diagram did the “reasoning for me”.

Aren’t humans just “action potential” extrusion machines? What is unique about our neural pattern recognition to make our cognition different in nature rather than merely degree?

It seems clear at this point that the greatest insight that unlocked our current AI acceleration was scaling alone would unlock emergent properties and abilities.

by digitaltrees

5/21/2026 at 6:28:44 AM

The structure of language encodes logic in many ways. So the model's ability to reason may be an emergent property of the reasoning ability humanity has ejected and extracted from our neural networks and abstracted into language and symbols.

by digitaltrees

5/21/2026 at 9:15:34 AM

there is absolutely no line of demarcation between human reasoning and what you described

by keeganpoppen

5/21/2026 at 5:48:10 AM

> They are token predictors that use statistical techniques to emit the randomly weighted next most likely token given the previous token list.

Sounds like an implementation detail. Now describe how human reasoning works and explain why that process of chemical and electrical signals results in "reasoning" whereas what LLMs do isn't.

The problem with being this reductive is you can do it to anything, including humans. You can’t be reductive about LLMs and refuse to be reductive about humans - that's poor reasoning, and an LLM would out-reason you on this point, further negating your case.

by antonvs

5/21/2026 at 5:27:05 PM

Human cognition is poorly understood and much more complex than it seems.

For an example, look at some of Julia Mossbridge's work.

If even a small part of her work is true and valid, it points to something far outside our current framework.

You don't need to go as far afield as Mossbridge, though - that's an extreme example. Pretty much any modern neuroscience will make you question a lot of assumptions, at least it did for me.

by claytongulick

5/21/2026 at 6:31:35 AM

[dead]

by digitaltrees

5/20/2026 at 9:52:26 PM

Wow, there are still people trying to claim they don't reason. What will they have to do before you'll admit that they can?

by IshKebab

5/21/2026 at 9:33:22 AM

You are asking the wrong question. It's not about if you can do X which can be faked especially if you are given practically infinite tries and all failures are hidden.

The people who want to believe they actually reason just ignore all obvious evidence to the contrary and cherry pick the times reasoning was faked well enough.

The people who don't want to believe will just take a second to understand how they work and then come up with ways to reveal they were faking all along. Like asking how many letters there are in a word lol.

It's only the people who don't want to believe that count because reality is what happens regardless of what you believe.

by esailija

5/21/2026 at 6:55:07 PM

It will be interesting to see the excuses people come up with when LLMs inevitably start solving Millennium Prize problems.

by Jtarii

5/21/2026 at 11:29:21 AM

You seem to believe that something is only "reasoning" if it works in a particular way. That it's not enough for it to observationally display reasoning skills; it has to be using a particular method to do that so it's not "faking" it. Is that correct?

by IshKebab

5/20/2026 at 2:37:34 PM

They very obviously reason.

by Jtarii

5/20/2026 at 4:24:06 PM

it's kind of crazy to think that the transformer architecture can't encode some primitive form of reasoning.

by dnautics

5/20/2026 at 12:37:37 PM

An LLM should not "generate specs", a human should. The LLM can work from the specs. It can never infer meaning from a vague prompt. If so, it will start guessing. Every human that ever did functional specification or information analysis at some point knows this. Or has learned the hard way, something with assumptions and asses ;)

by olafmol

5/20/2026 at 1:20:25 PM

An LLM's guess at a vague prompt is better than that of your average developer.

A prompt like "write these two files on disk" will very likely make the LLM do some sort of atomic write/swap operation, unlike the average developer, who will just write the two files and maybe later encounter a race condition bug. You can argue the LLM output is overkill, but it will also be more robust on average.

by dist-epoch
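For readers unfamiliar with the "atomic write/swap" pattern mentioned above, here is a minimal sketch using only the Python standard library. The helper name `atomic_write` is hypothetical, and this is one common way to do it rather than what any particular model would emit. It writes to a temporary file in the same directory and then renames it over the target, so a concurrent reader sees either the old contents or the new ones, never a half-written file.

```python
import os
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Replace the file at `path` with `data` without exposing a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to flush the bytes before the rename
        os.replace(tmp_path, path)  # swaps the new file into place in one step
    except BaseException:
        # If anything failed before the rename, do not leave the temp file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that this makes each file's update atomic on its own. It does not make a pair of files change together; a reader can still observe the first file updated and the second one not yet.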

5/20/2026 at 8:52:08 PM

What kind of race condition do you have in mind?

by rixed

5/20/2026 at 6:28:11 PM

> It's not surprising that an LLM would produce different specs for the same work on different runs

This is what I don't understand: AI is a computer program with its own data. If we give the same input to that computer program every time, why does it produce different outputs every time? Or does the input include LLM data + our prompt + some random data that computer program picks from its Internet search?

by dxxvi

5/20/2026 at 7:02:44 PM

LLMs have a temperature parameter. At zero temperature they are deterministic: they always choose the most likely next token at each step based on what came before and the model weights, and they will always generate the same output given the same input.

As you raise the temperature they will start (pseudo)randomly choosing tokens other than the single most likely token (though that one will still be the most likely to be chosen). It turns out this is almost always better than zero temperature, which has a tendency to get caught in repetitive loops. I imagine all the frontier labs have spent thousands (millions?) of CPU hours tuning the temperature parameters on their models for optimal performance.

by CrazyStat
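A toy illustration of the temperature behavior described above, assuming a three-token vocabulary with made-up logits. The function names are hypothetical and this is not any vendor's sampler; it only shows the textbook mechanism: temperature zero is treated as greedy argmax, and higher temperatures flatten the distribution that the next token is drawn from.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; higher temperature gives a flatter result."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Pick a token index: greedy at temperature 0, random draw otherwise."""
    if temperature == 0:
        # Greedy decoding: always the single highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
rng = random.Random(0)

greedy = [sample_token(logits, 0, rng) for _ in range(5)]     # always index 0
sampled = [sample_token(logits, 1.0, rng) for _ in range(5)]  # may include other indices
```

In this sketch the only source of run-to-run variation is the random draw, which is why the same prompt can produce different text whenever the temperature is above zero.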

5/20/2026 at 8:35:14 PM

  > LLMs have a temperature parameter. At zero temperature they are deterministic: they always choose the most likely next token at each step based on what came before and the model weights, and they will always generate the same output given the same input.

https://en.wikipedia.org/wiki/Softmax_function

"A value proportional to the reciprocal of β is sometimes referred to as the temperature: β = 1/kT, where k is typically 1 or the Boltzmann constant and T is the temperature. A higher temperature results in a more uniform output distribution (i.e. with higher entropy; it is "more random"), while a lower temperature results in a sharper output distribution, with one value dominating."

"Temperature" in the context of softmax does not change the "winning" token, it changes how much more probable (in the sense of the softmax distribution) the winning token will be. If the winning token is "New York", it will be the winner with a temperature close to 0 and with a temperature of 1e9.

The actual selection of the random token is done separately by using inputs outside of the softmax distribution, for example, by using a random number generator. I believe most LLM configs have a seed for the random number generator.

More than that, generation of code in most programming languages is done with more guardrails, such as beam search guided by schema, syntax and semantics.

by thesz
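The two claims above can be checked numerically with a small sketch, again using made-up logits and hypothetical helper names: the most probable token stays the same at any positive temperature, and the randomness comes from the random number generator, so fixing its seed repeats the draws.

```python
import math
import random


def softmax(logits, temperature):
    """Softmax over logits divided by a positive temperature."""
    peak = max(logits)
    exps = [math.exp((x - peak) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


logits = [3.2, 1.5, 0.3]  # made-up scores; index 0 is the "winning" token

# The winner is the same at every positive temperature; only its margin changes.
winners = {t: argmax(softmax(logits, t)) for t in (0.1, 1.0, 10.0, 1000.0)}


def draw(seed, temperature, n=20):
    """Sample n token indices using an RNG created from the given seed."""
    rng = random.Random(seed)
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)
```

Here `winners` maps every tested temperature to index 0, and `draw(7, 1.0)` returns the same list each time it is called. This covers only the sampling step in isolation and makes no claim about other sources of variation in a real serving stack.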

5/20/2026 at 9:46:44 PM

Nah. Even with zero temperature there is still variation.

by NobleLie

5/20/2026 at 7:27:02 PM

The issue is LLMs don't learn, despite the name. A human re-implementing a spec would strive to iterate towards what they feel is a better spec. They can take in their own input and self-correct. The work of implementing the spec gives insight into pain points and strengths, even if they never actually test the spec (they 100% should, but this is to emphasize that struggle for humans is in itself iteration, even before external feedback comes in).

An LLM isn't deterministic but also isn't iterative without an existing human. You give it the same spec 10 times and it produces 10 results that aren't far off from each other but are vastly different when you go into the weeds. And not different in a way of improvement.

by johnnyanmac

5/20/2026 at 12:41:37 PM

So what’s most important is knowing those parameters and the ranges of values, not having the final result. A human, after producing a spec, can then provide the mental model of how he created the spec: where the inflection points are and what the range of valid results is.

What has always mattered is how you decide the specs, not the specs in themselves.

by skydhash

5/20/2026 at 5:30:14 PM

> If you ask 10 different humans to produce the spec with the same information (prompt and context) they will also produce 10 unique answers

But they didn't ask humans, they asked a machine. We expect our machines to behave in predictable ways.

> If the prompt already contained answers to all the decision points that come up when writing the spec then the prompt would already be the spec itself.

This is one of the best arguments against using LLMs I've seen.

It reduces to the classic argument- at the point where you've described a problem and solution in sufficient detail to be confident in the results, you've invented a programming language.

by claytongulick

5/20/2026 at 7:18:57 PM

> We expect our machines to behave in predictable ways.

I expect LLMs to produce randomly varying output. Maybe it's the thousands of hours I spent doing monte carlo simulations for my PhD.

> This is one of the best arguments against using LLMs I've seen.

> It reduces to the classic argument- at the point where you've described a problem and solution in sufficient detail to be confident in the results, you've invented a programming language.

I'm not an LLM true believer, but I use codex for various small tasks and it often (not always) does a thoroughly decent job. Yesterday I gave it a pretty vague request to set up a new Home Assistant dashboard and it handled it just fine--I told it what I wanted to see but it figured out itself which helper variables it would need to set up to realize that vision and wrote all the config for it.

I probably could have done it in 15 minutes if I was familiar with Home Assistant's yaml configuration schema and all, but I'm not so it probably would have taken me closer to an hour. Asking codex took me 30 seconds and it did just fine.

I am skeptical that LLM's are going to kill all white collar jobs or whatever anytime soon. Not being able to truly learn things is an issue. Reality has a surprising amount of detail[1], and while codex does well at things like writing Home Assistant configs and setting up a Minecraft server, where there are thousands of examples online of how to do it, when I've asked it to do some more esoteric things it has sometimes failed spectacularly. I don't think having the LLM keep notes and then read them back (filling up the context window) is a real solution here.

[1] http://johnsalvatier.org/blog/2017/reality-has-a-surprising-...

by CrazyStat

5/21/2026 at 12:51:21 AM

I haven't made the argument that LLMs aren't useful, I can see cases where they are.

I don't think they include areas where correctness, determinism or human reasoning are important.

At least, not in isolation.

by claytongulick

5/21/2026 at 6:18:49 AM

But those differences fall within a band of generally accepted results don’t they? And the cost to throw the code away and reimplement is low now. So maybe it doesn’t really matter if the implementation is perfect or identical.

That being said, I agree people trust AI too much, especially people with less experience. It’s easy to forget that the models are mirrors of who we are as the drivers of the input context, not mentors that will reliably guide us to best practices.

by digitaltrees

5/20/2026 at 12:47:59 PM

[dead]

by nullsanity

5/20/2026 at 12:49:23 PM

[flagged]

by jatora

5/20/2026 at 12:54:15 PM

Imagine making this your entire identity

by Robdel12

5/20/2026 at 12:17:41 PM

I strongly believe you don’t need to call another model for that. The same model can do the job just fine, just not as part of the same context.

I mean that if you ask codex on gpt 5.5 to submit to a plan reviewer subagent that uses gpt5.5, this is enough to have a very good reviewing and reassessment of the plan.

My hypothesis is that it’s even better than opus.

The reason submitting the product of one LLM to another for review works is that you need a fresh trajectory. The previous context might have “guided” the planner into some bias. Removing the context is enough to break free from that trajectory and start fresh.

by motoboi

5/20/2026 at 8:06:13 PM

[dead]

by dimitrismrtzs

5/20/2026 at 1:08:16 PM

The return of pair programming.

by AnimalMuppet

5/20/2026 at 5:31:32 PM

It's incredible how much developers will do to avoid having to look at or think about code.

by slopinthebag

5/21/2026 at 8:13:59 AM

What is incredible is that these people have the gall to call themselves developers.

by lstodd

5/21/2026 at 6:34:52 AM

>We're working on a large Rust codebase, heavily assisted development with Claude and Codex, and one critical workflow is after you have written a spec, have the other LLM critique it thoroughly.

I do this with other languages, too, not just Rust. Thing is, you have to put a hard stop at some point because the models will always find something to nitpick.

by DeathArrow

5/20/2026 at 10:31:56 AM

>Testing is the first layer of defense. My system now includes 1,300+ tests — from unit tests to minimal integration tests (e.g., proposer + acceptor only), all the way to multi-replica full integration tests with injected failures. See the project status.

I know LOC is a silly metric, but ~1300 tests for 130k lines averages out to a test per 100 lines - isn't this awfully low for a highly complex piece of code, even discounting the fact that it's vibecoded? 100 LOC can carry a lot of logic for a single test, even for just happy paths.

by torben-friis

5/20/2026 at 11:42:03 AM

Considering the domain being distributed systems, and aiming to implement "a Rust-based multi-Paxos consensus engine that not only implements all the features of Azure’s Replicated State Library (RSL)", I don't think we even have to look so deep into it, it's severely lacking tests.

If you're building a distributed system and you don't have more tests and testing code than actual code, by an order of magnitude most likely, then you're missing test coverage.

by embedding-shape

5/20/2026 at 10:40:22 AM

IIUC only 50k LoC are non-test code, which improves the metric. Whether that's enough tests still depends on the code. If most are getters and setters, the coverage might be ok.

by kawogi

5/20/2026 at 10:40:04 AM

I may have missed it but are those tests written by person or generated? Otherwise how do you know they even test anything (like actually test, not appear to test)

by risyachka

5/21/2026 at 11:02:59 AM

Ask AI to find the weakest tests. :)

No joke: it works for me. I have a 45kLOC prod code (just code, no comments, no blanks), tested by a 30kLOC test code containing 1600 tests (that run in 30secs).

I helped with the test infrastructure/architecture. Sometimes I had to write the first few tests of a particular kind, but now Claude TDDs for me.

A fair share of my CLAUDE.md instructs in how I like my tests, when to write them (first), different types of tests (unit, faked-services, db, e2e, etc.)

Asking Claude to find weak tests has helped a lot in getting here. I also do review AI-gen'd code, pretty much line-by-line, before accepting it.

by flossly

5/21/2026 at 7:28:42 AM

It’s all written by AI and you can’t tell for sure if the tests are good. You can eyeball some but eyeballing 50k lines of code takes a lot of time. You just trust AI and YOLO, find errors later

by sashank_1509

5/20/2026 at 10:53:07 AM

I'm also shifting to a vibe coding workflow, but I have a genuine question: whenever I use AI for Rust, it makes an insane amount of lifetime errors. I have no idea how people are churning out so many lines of code so quickly.

Honestly, despite all the hype around Rust in the community, the fact that AI can't handle lifetimes reliably makes me reluctant to use it. The AI constantly defaults to spamming .clone() or wrapping things in Rc, completely butchering idiomatic Rust and making the output a pain to work with.

On the other hand, it writes higher-level languages better than I do. For those succeeding with it, how exactly are you configuring or prompting the AI to actually write good, idiomatic Rust?

by jdw64

5/20/2026 at 11:12:51 AM

> I'm also shifting to a vibe coding workflow, but I have a genuine question: whenever I use AI for Rust, it makes an insane amount of lifetime errors. I have no idea how people are churning out so many lines of code so quickly.

What harness and model have you been using? For the last few months, essentially since I did the whole "One Human + One Agent = One Browser From Scratch" experiment, I've almost exclusively been doing cross-platform native desktop development with Rust, currently with my own homegrown toolkit basically written from scratch, all with LLMs, mostly with codex.

But I can't remember a single time the agent got stuck on lifetime errors; that's probably the least common issue with agents + Rust that I come across. A much bigger issue is the ever-expanding design and LLMs being unable to build proper abstractions that are actually used in practice and reduce the amount of code instead of just adding to the hairball.

The issue I'm trying to overcome now is that each change takes longer and longer to make, unless you're really hardcore about pulling back the design/architecture when the LLM goes overboard. I've only succeeded in having ~10 minute edits in +100K LOC codebases in two of the projects I've done so far, probably because I spent most of the time actually defining and thinking of the design myself instead of outsourcing it to the LLM. But this is the biggest issue I'm hitting over and over with agents right now.

by embedding-shape

5/20/2026 at 11:59:23 AM

Have you split your 100k loc codebases into smaller crates? If you take a look at eg gitoxide's repo, they've split it in many smaller crates. I think that might help with keeping the scope for the ai small and maybe help with keeping contracts tight and well-defined.

by tomtom1337

5/20/2026 at 12:20:01 PM

Yes, that absolutely helps (and yes, doing that :) ), I'm going even further and basically hard-enforcing a LOC limit per file too, which helps a lot as well.

The complexities LLMs end up putting themselves in is more about the bigger architecture/design of the program, rather than concrete lines, where things end up so tangled that every change requires 10s of changes across the repository, you know, typical "avoid the hairball" stuff you come across in larger applications...

by embedding-shape

5/20/2026 at 12:46:56 PM

> basically hard-enforcing a LOC limit per file too, which helps a lot as well

this. create pre-commit hooks that enforce project conventions, code quality checks, and regression testing. it saves you so much headache
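For what it's worth, a per-file LOC limit like the one quoted above could be enforced by a small checker that a pre-commit hook runs. This is only a minimal sketch using the standard library; the 500-line limit and the names `MAX_LINES`, `over_limit` and `find_offenders` are assumptions for illustration, not anyone's actual setup from this thread.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical limit; pick whatever your project agrees on.
const MAX_LINES: usize = 500;

/// Returns the line count of `source` if it exceeds `max_lines`.
fn over_limit(source: &str, max_lines: usize) -> Option<usize> {
    let count = source.lines().count();
    if count > max_lines {
        Some(count)
    } else {
        None
    }
}

/// Recursively collects `.rs` files under `dir` that exceed `max_lines`.
fn find_offenders(dir: &Path, max_lines: usize) -> io::Result<Vec<(PathBuf, usize)>> {
    let mut offenders = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Descend into subdirectories (e.g. nested modules).
            offenders.extend(find_offenders(&path, max_lines)?);
        } else if path.extension().and_then(|ext| ext.to_str()) == Some("rs") {
            let source = fs::read_to_string(&path)?;
            if let Some(count) = over_limit(&source, max_lines) {
                offenders.push((path, count));
            }
        }
    }
    Ok(offenders)
}
```

A hook binary could call `find_offenders(Path::new("src"), MAX_LINES)`, print each offending path with its line count, and exit non-zero when the list is non-empty so the commit is rejected.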

by jatora

5/20/2026 at 1:14:37 PM

[flagged]

by Tval

5/20/2026 at 10:59:21 AM

A lefthook:

    format:
      glob: "*.rs"
      run: cargo fmt -- --check

    lint:
      glob: "*.rs"
      run: cargo clippy -- -D warnings

    tests:
      run: cargo test

    audit:
      run: cargo audit

+ hooks that shove the lefthook automatically in the ai's face

---

rustfmt.toml:

    edition = "2021"
    newline_style = "Unix"
    use_small_heuristics = "Max"
    max_width = 100

by hydra-f

5/20/2026 at 11:21:26 AM

use "stage_fixed" to automatically persist the formatting :)

by ramon156

5/20/2026 at 11:57:28 AM

Thank you!

by hydra-f

5/21/2026 at 11:00:01 AM

I tried coding "ownership strictly" in Rust myself for a while and I've given up for the most part. In a lot of cases I painted myself into corners that I couldn't get out from without changing everything. That is probably a signal I should use, but Arc and clone also work. So I use them.

by luckystarr

5/20/2026 at 11:06:06 AM

The irony of the machines having no mechanical sympathy is just too good

by vermilingua

5/20/2026 at 10:55:42 AM

The feedback loop is the interesting part, if you use standard software engineering practices (modularise, test/document your interfaces, etc) then I find things like Claude Code do an exceptional job: since they can actually run cargo check/test themselves and can validate the tests too.

by dijit

5/20/2026 at 11:07:52 AM

What kinds of programs are you writing and with what models? I'm curious if the lifetimes your programs require are trickier than most.

by faitswulff

5/20/2026 at 11:15:26 AM

I'm actually vibe coding a game engine right now using a Hexagonal Architecture, and I ran into this exact same issue when trying to synchronize the feedback loop between the viewport and the editor. To be fair, I probably messed up the domain boundaries myself in the first place, but honestly, the AI-generated code wasn't very effective at solving it either

by jdw64

5/21/2026 at 10:16:12 AM

Game engines are one of the worst case scenarios for something like Rust as game heaps are simulations of the real world, so have lots of complex graph structures in them. Affine types work best for request/response oriented stuff where lifetimes are clearly bounded.

by mike_hearn

5/21/2026 at 11:33:49 AM

So, I am currently changing the language. Thank you for your kind advice

by jdw64

5/20/2026 at 11:34:39 AM

I'm surprised to hear this. I have not had any issues here at all. The AI might clone things but I don't really care/ mind, I can ask it to refactor to make things zero-copy after, which is how I've often written Rust myself. I've never seen it overly wrap things in Rc.

I've not done any particular/ special prompting.

by insanitybit

5/20/2026 at 12:21:30 PM

I see the complete opposite. The lower-level the language, the less you have to babysit the agent. Pure asm is the best; it only has problems with very advanced SIMD flags. C is excellent.

But Python or TypeScript are full of errors all the time. I'd rather fall back to Perl than Python. Perl has been excellent all along.

by rurban

5/21/2026 at 9:22:21 AM

It makes sense. In a lower level language most mistakes would break the software hard and the AI will notice it. Python and JS are happy to run with lots of bugs.

by DeathArrow

5/21/2026 at 11:09:04 AM

Why would one not use TDD when coding with Claude? It's so cheap!

Yes, you need to help it with the types of tests: you still need to know what you want from it. But once you have all types of tests (unit, db, fake-services, e2e, etc.) in place and documented, it can basically write tests until your cov-tool says it's 85%. Then you can ask it to find the weakest tests: you review those and make sure they are not weak, or Claude understands why they are not weak. Then let it find the next batch of weakest tests. Etc.

TDD finally makes sense economically for me on the types of projects I usually work on.

by flossly

5/21/2026 at 11:18:13 AM

But I use TDD when coding with AI.

by DeathArrow

5/20/2026 at 11:19:49 AM

I’ve been writing almost exclusively Rust with LLMs and rarely ever hit this. I guess maybe the kind of work you are doing?

by mountainriver

5/20/2026 at 11:43:06 AM

> whenever I use AI for Rust, it makes an insane amount of lifetime errors.

What model are you using, and what frameworks are you using?

This is not a hard problem for LLMs to solve.

Rust is nearly the perfect language for LLMs.

It's exceptionally expressive, and it forbids entirely the most common globally complex bugs that LLMs simply do not (and won't for some time) have the context window size to properly reason about.

Dynamically typed languages are a disaster for LLMs because they allow global complexity with respect to implicit type contracts (which LLMs do not uphold and cannot be relied on to uphold).

If you're going to add types, as someone pointed out earlier, why are you even telling an LLM to write Python anyways?

Rust is barely harder to read than Python with types. It's highly expressive.

You have the `&mut` which seems alien, verbose (safe) concurrency, and lifetimes - which - if you're vibe coding... you don't really need to understand that thoroughly.

You want an LLM to write code in a language where "if it compiles, it works" - because... let me tell you, if you vibe code in a language where errors are caught at runtime instead of compile time... It will definitely NOT work.

by onlyrealcuzzo

5/20/2026 at 11:50:16 AM

It's not nearly the perfect language for LLMs and Rust is dramatically harder to read and reason about than Python with types. Other options work better for nearly all apps. I found Kotlin works well:

- Garbage collected so no reasoning tokens or dev cycles are wasted on manual memory management. You say if you're vibe coding you can ignore lifetimes, but in response to a post that says AI can't do a good job and constantly uses escape hatches that lose the benefits of Rust (and can easily make it worse, copying data all over the place is terrible for performance).

- Very fast iteration speed due to JIT, a fast compiler and ability to use precompiled libraries. Rust is slow to compile.

- High level code that reads nearly like English.

- Semantically compatible with Java and Java libs, so lots of code in the training set.

- Unit tests are in separate files from sources. Rust intermixes them, bloating the context window with tests that may not be relevant to the current task.

by mike_hearn

5/20/2026 at 12:05:23 PM

Then your domain problem you’re trying to solve doesn’t benefit from Rust.

Sounds like your work doesn’t need Rust and that’s ok.

But don’t generalize.

by rirze

5/21/2026 at 6:08:32 AM

Unless you're writing kernels, or high integrity systems with memory allocation constraints, there is very little reason to use a language like Rust.

Everything that people find great about Rust, with the exception of the borrow checker, can be found in any compiled language from the ML lineage. And even that is fading away as they introduce a mix of linear types, dependent types, effects and formal logic.

by pjmlp

5/21/2026 at 4:20:08 PM

Rust has uses you haven't mentioned. For example, each compiled language from the ML lineage (and almost every language period) has its own runtime, particularly, its own assumptions and contracts about heap-allocated storage. In contrast, Rust famously does not need a runtime library -- and it is low-level enough that it can usually (with enough cognitive effort by programmers) interface with an arbitrary foreign runtime library -- so when the goal of the code to be written is to interface with code in a compiled language from the ML lineage, you usually have only 2 memory-safe options, namely, the same language as the code-to-be-interfaced-with and Rust.

The situation is a little more complicated than what I just wrote because two programs written in different ML-style languages could communicate via inter-process communication. But I don't see that. (Maybe my experience is not broad enough?) What I see is, e.g., Python libraries written in C and C++ (and Fortran, which is also not memory-safe) for performance reasons where the only memory-safe language that could have been used instead is Rust.

by hollerith

5/20/2026 at 11:52:10 AM

Write a 250k LOC compiler in Python and then get back to me how well LLMs write in Python...

Sure if you want to vibe code a TODO app where it's literally just copying and pasting one it's already seen 10,000 times before, it can do it in Python.

by onlyrealcuzzo

5/20/2026 at 1:16:27 PM

I didn't say it's better to use Python!

by mike_hearn

5/20/2026 at 3:21:38 PM

This is likely a fully LLM-generated reply.

by arkadiytehgraet

5/20/2026 at 1:58:24 PM

> Honestly, despite all the hype around Rust in the community, the fact that AI can't handle lifetimes reliably makes me reluctant to use it. The AI constantly defaults to spamming .clone() or wrapping things in Rc, completely butchering idiomatic Rust and making the output a pain to work with.

This hasn't been true since around gpt-4.5 on the OpenAI side of things. The 5.x models have been pretty much solid on Rust for a while now.

by joshka

5/20/2026 at 11:19:27 AM

I wrote and maintain this library of skills and workflows called Rust Bucket[0]

It sets up your repo to ensure agents use a workflow which breaks your user requests down into separate beads, works on them serially, runs a judge agent after every bead is complete to apply code quality rules, and also strict static checks of your code. It's really helpful in extracting long, high-quality turns from the agent. It's what we used to build Offload[1].

0: https://github.com/imbue-ai/rust-bucket : A rusty bucket to carry your slop ;)

1: https://github.com/imbue-ai/offload

by nvader

5/20/2026 at 11:46:22 AM

rust-bucket is 404, did you make it private?

by arpinum

5/20/2026 at 12:14:00 PM

Thanks for the flag--I guess I must have never made it public.

Fixed.

by nvader

5/20/2026 at 11:31:17 AM

Honestly Rust is an UGLY language. For whatever powers it possesses in memory safety, its cryptic symbology is reminiscent of assembly.

This is a problem when language designers are mathematicians and don’t understand typographical nuance and visual weights.

by boitiga

5/20/2026 at 11:33:59 AM

If I was forced to write it myself, then I'd agree, I'd use Clojure all day before Rust, because it's such a chore to write, edit and read.

The whole "with AI" kind of reduces my hate for Rust though, and increases the appreciation for how strict the language is, especially when the agents themselves do the whole "do change > see error/warning > adjust code > re-check > repeat" loop, which seems to work better the more strict the language is, as far as I can tell.

The "helpful" error messages from Rust can be a bit deceiving though, as the agent's first instinct seems to be to always try what the error message recommends, but sometimes the error is just a symptom of a deeper issue, not the actual root issue.

by embedding-shape

5/20/2026 at 11:41:28 AM

It’s funny I got downvoted immediately as expected.

I mean God help us should a crustacean try to understand the merits of my claim.

“Oh he’s saying something negative about rust…” Downvote!

I think with AI the language should still be readable. Humans need to be able to understand what’s going on!

by boitiga

5/20/2026 at 12:24:32 PM

Hardly surprising, you give a strong opinion but you don't actually back that up by any arguments, that stuff tends to be downvoted here. Add some proper reasoning and making it clear why you think as you think, and people will stop downvoting :) Also, stop caring about magic internet points, they don't matter and people downvote random stuff sometimes.

by embedding-shape

5/20/2026 at 12:33:04 PM

You’re right on both counts.

However, if I link to the gestalt theory of psychology, The Elements of Typographic Style by Robert Bringhurst, and A Primer of Visual Literacy by Donis Dondis, folks will undoubtedly NOT read it and still downvote because they have been in Rust code and so have naturally become accustomed to its monstrous appearance. :)

Perhaps I should design a language that is typographically sound—something like brainf*ck haha

by boitiga

5/20/2026 at 11:47:41 AM

If I was forced to write it myself, i would love to keep writing ruby. What a wonderful language. I dont write ruby anymore, mostly using golang and python.. but ruby still a joy.

by pelasaco

5/21/2026 at 11:12:02 AM

Have you looked at Kotlin? I looooved Ruby. But now I've come to a conclusion that a stronger type system is worth it: in teams, for LLMs, for my own sanity.

Kotlin is basically a Ruby (OO first with lots of FP goodness) with a serious type system. And where Ruby uses C-written libs in some places, with Kotlin one uses Java written libs from time to time.

See http4k for a nice implementation of Rack + a lot of goodness from Rails, without becoming a framework (it's just a lib).

by flossly

5/21/2026 at 1:50:53 PM

> Have you looked at Kotlin?

Yes Kotlin is nice too. Type systems are important and helpful. Performance is a must too.. that's why we all in some point left ruby... but ruby makes you happy.. Maybe because my experience with Kotlin is restricted to Android, i didnt love that that much. Same with Crystal or even JRuby.. it's almost ruby, but not really.

by pelasaco

5/20/2026 at 1:31:14 PM

I really don't get the complaint about Rust's syntax, it's almost identical to TypeScript's and nobody complains about TypeScript syntax being ugly …

(Yes, I know the 'a lifetimes are a bit weird, and that's not something that exists in TypeScript, but that's also not something you use every day in Rust either.)

by stymaar

5/20/2026 at 3:18:29 PM

[dead]

by jdw64

5/21/2026 at 2:42:53 PM

What aspects of Rust do you find ugly? Can you provide three examples?

by dmm

5/20/2026 at 3:21:15 PM

Why would the language being typographically ugly matter? Python's pretty, but it hides a lot of functional nuance behind that. Rust is terse, but it's also expressive in its terseness.

If you want to give it a fair shot, it does take some time to get used to, coming from something like Python or Ruby. I won't deny that. I've found that using LSP-assisted semantic syntax highlighting helps, for me, on the typographic front.

I don't think typographic design is a key consideration in most languages' designs, though, and I don't think it should be. The main thing I look for is consistent, relatively predictable rules around the syntax, as far as that layer of language choice goes.

by _verandaguy

5/21/2026 at 6:09:57 AM

I would rather have people running for languages like D, but the automatic resource management stigma is high among certain classes of developers.

by pjmlp

5/20/2026 at 12:23:53 PM

To me it looks clean and concise

by peter-m80

5/20/2026 at 12:34:36 PM

I’m curious why? Also I’m curious how long you have programmed in Rust?

by boitiga

5/20/2026 at 9:55:15 PM

[dead]

by sureglymop

5/20/2026 at 1:16:44 PM

Yeah, LLMs suck at named lifetimes. The number of times I have seen Claude reach for indices and clones instead of just using proper named lifetimes is too many to count at this point. Not great for high-performance code!
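To make the contrast concrete, here is a minimal sketch of a word scanner that borrows with a named lifetime, next to the clone-heavy variant an agent might reach for. The `Scanner` type and its methods are made up for illustration; they are not from any codebase mentioned in this thread.

```rust
/// A tiny word scanner that borrows its input rather than copying it.
struct Scanner<'src> {
    source: &'src str,
    pos: usize,
}

impl<'src> Scanner<'src> {
    fn new(source: &'src str) -> Self {
        Scanner { source, pos: 0 }
    }

    /// Returns the next whitespace-separated word.
    /// The result borrows from the input (`'src`), not from `&mut self`,
    /// so callers can keep earlier words while scanning further.
    fn next_word(&mut self) -> Option<&'src str> {
        // Copy the shared reference out so slices of it keep the `'src` lifetime.
        let source: &'src str = self.source;
        let trimmed = source[self.pos..].trim_start();
        if trimmed.is_empty() {
            return None;
        }
        // `trimmed` is a suffix of `source`, so its start offset is the length difference.
        let start = source.len() - trimmed.len();
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.pos = start + len;
        Some(&source[start..start + len])
    }

    /// The clone-heavy alternative: same logic, but every word is a fresh allocation.
    fn next_word_owned(&mut self) -> Option<String> {
        self.next_word().map(str::to_owned)
    }
}
```

Without the explicit `'src` on the return type, lifetime elision would tie each word to the `&mut self` borrow, and holding two words at once would not compile. That is usually the point where the owned version starts to look tempting.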

by mbbutler

5/20/2026 at 11:17:37 AM

I think it's due to the lack of quality instructions on what is good Rust code; AI often literally doesn't know what idiomatic Rust is. It can be good to have a reference where you write the basic rules that you want it to follow (ideal to assume it has no idea why spamming clone is bad and you're speaking to someone who has just watched one of those youtube videos with a dude in black t-shirt speaking very slowly and going over basic programming concepts as if they're breaking you out of the matrix).

by altmanaltman

5/20/2026 at 11:34:11 AM

Lots and lots of guardrails to not allow slop.

In tsz I have hard gates that disallow doing work in the wrong crate etc.

https://github.com/mohsen1/tsz

by mohsen1

5/20/2026 at 11:36:08 AM

> have hard gates that disallow doing work in the wrong crate

Maybe I'm using agents wrong, but I'm not sure how you'd end up in that situation in the first place? When I start codex, codex literally only has access to the directory I'm launching it, with no way to navigate, read or edit stuff elsewhere on my disk, as it's wrapped in isolation with copied files into it, with no sync between the host.

Hearing that others seemingly let agents have access to their full computer, I feel like I'm vastly out of date about how development happens nowadays, especially when malware and viruses lurk around all the package registries.

by embedding-shape

5/20/2026 at 11:46:06 AM

tsz is an experiment in giving coding agents full control. On my day job I am a lot more careful. But I've moved on from manually approving every change and instead review the final diff. I noticed manually approving was counterproductive.

by mohsen1

5/20/2026 at 12:22:51 PM

Right, I'm giving my agents full control too, but not sure why that'd exclude putting them in a sandbox?

by embedding-shape

5/20/2026 at 11:20:43 AM

Clone is not "butchering idiomatic Rust", we gotta stop this nonsense

by ramon156

5/21/2026 at 6:11:38 AM

Well, some of us rather use languages with automatic resource management in the type system, in whatever form, and thus avoid typing .clone() all over the place.

by pjmlp

5/20/2026 at 11:26:52 AM

Sorry, I should clarify. .clone() itself isn't inherently unidiomatic.

My issue is specifically with how the AI uses it. In AI code, .clone() is almost always used as a brute-force escape hatch.

by jdw64

5/20/2026 at 11:37:30 AM

Just like for me as an amateur Rust enjoyer then

by izietto

5/20/2026 at 11:35:03 AM

So .clone() significantly reduces the mental overhead of using rust with a small performance impact? I'm intrigued :)

Maybe it's harder to reason about the lifetime semantics while also writing code, and works better as a second phase (the de-cloning).

by andai

5/20/2026 at 3:17:08 PM

> So .clone() significantly reduces the mental overhead of using rust with a small performance impact? I'm intrigued :)

No, the performance impact will depend on `impl Clone` for the underlying type, the hotness of the code path, and how sensitive to those two variables your code's domain is. It may be extremely expensive.

> Maybe it's harder to reason about the lifetime semantics while also writing code, and works better as a second phase (the de-cloning).

There are cases where assuming `clone` is possible allows for significant architectural and API simplifications at the expense of performance. In those cases, de-cloning will be involved and may produce significant changes.
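A small sketch of why the cost depends on the type behind `.clone()`: cloning a `Vec` allocates and copies every element, while cloning an `Arc` only increments a reference count and shares the buffer. The function names here are illustrative only.

```rust
use std::sync::Arc;

/// Deep copy: allocates a new buffer and copies all elements, O(n).
fn clone_vec(data: &Vec<u64>) -> Vec<u64> {
    data.clone()
}

/// Shallow copy: bumps the reference count; the buffer itself is shared, O(1).
fn clone_shared(data: &Arc<Vec<u64>>) -> Arc<Vec<u64>> {
    Arc::clone(data)
}
```

Both calls look the same at the call site, which is why a stray `.clone()` on a hot path is easy to miss in review.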

by _verandaguy

5/20/2026 at 11:13:17 AM

How many of those tests have you actually read yourself if all of them are generated by AI (also when you're sleeping) ?

This is from 2025 - I would like to see an update now how that system turned out to be after the vibe hype

by icemanx

5/20/2026 at 11:56:29 AM

I feel like there are very few blogs that actually follow up on their experiment. It's just dopamine city.

by ramon156

5/20/2026 at 12:26:27 PM

To me, the real question after reading this, is: Is your new implementation of Azure’s RSL now being used?

If it is, and it works well, then to me this is far more meaningful than the fact that AI wrote 130K lines of code.

by misja111

5/20/2026 at 1:08:14 PM

Contrarian view: Why English will never be a programming language. https://www.slater.dev/2026/05/why-english-will-never-be-a-p...

by sltr

5/20/2026 at 5:18:51 PM

I am having a different experience than a lot of other commenters here vibe coding with Rust. I am not a Rust programmer or evangelist. I have implemented a drop-in Bash replacement/clone in Rust that passes the upstream Bash test suite and a whole battery of its own. It is a tiny bit faster than Bash itself but consumes a bit more memory. But Codex and Claude both did a great job with it.

I also had it implement a wasm geodesic calculator in Rust and it's amazing and in my use case is better than geodesiclib using the same updated algorithm.

I'm a "C-nile" Rust folks love to hate and did my first hacking in C Deep Blue C on Atari 8-bits. But I'm very impressed with these products and with the ability to leverage some features of Rust with them. (e.g. audit every unsafe instance and define its invariants, etc.)
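As a rough illustration of what "audit every unsafe instance and define its invariants" can look like, here is a sketch following the common `# Safety` / `// SAFETY:` comment convention. The functions are hypothetical and not taken from the Bash clone described here.

```rust
/// Returns the first byte without a bounds check.
///
/// # Safety
/// `bytes` must be non-empty.
unsafe fn first_unchecked(bytes: &[u8]) -> u8 {
    // SAFETY: the caller guarantees `bytes` is non-empty, so index 0 is in bounds.
    unsafe { *bytes.get_unchecked(0) }
}

/// Safe wrapper that establishes the invariant before calling the unsafe function.
fn first_or_zero(bytes: &[u8]) -> u8 {
    if bytes.is_empty() {
        return 0;
    }
    // SAFETY: we just checked that `bytes` is non-empty.
    unsafe { first_unchecked(bytes) }
}
```

An audit pass then amounts to checking that every `unsafe` block has such a comment and that the stated invariant really holds at that call site.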

I also agree with the commenter who said these LLMs are today, at the present moment, good at Go. The only language I notice it seems to be really good above and beyond others at is javascript, I assume because there's so much of it.

by jsLavaGoat

5/20/2026 at 10:27:30 AM

It's almost guaranteed with agents you could do the same job with less than half of 100k lines. I don't know what's impressive about lines of code generated by an agent.

by staszewski

5/20/2026 at 10:30:50 AM

It's just an anchor. If it were 50k would you say the same down to 25k? And if so how many more times would it apply?

The interesting thing is that it was manageable solo (in many ways it's _more_ manageable solo+AIs than with coworkers+(their)AIs), and in such a short amount of time.

by ndr

5/20/2026 at 12:09:44 PM

The original RSL library is 36k LoC. And this is C++. Rust should be like 50% smaller, that is, 18k LoC. This library is so big that I bet the author has no idea if it works or not. 1300 tests generated by AI say nothing about actual quality.

In the end it is just a lot of unmaintainable code quickly generated by AI.

by kikimora

5/20/2026 at 1:24:04 PM

This is uncharitable, but makes a prediction. I imagine you'd bet the author won't be successfully using this, at MS/Uber or wherever they are, in a year's time?

Rust makes no promise of being terser than C++, and RSL does less than this considering the optimization.

Also it's only 45/50k LOC, so not so very far from the 36k LOC.

by ndr

5/21/2026 at 8:18:00 AM

Yes, I would bet it won't go anywhere.

The blog post mentioned the project is 130k LoC multiple times. Where does 45/50k LoC come from?

>Rust makes no promise of being terser than C++

True, but Rust has no header files, this alone is a great LoC saver.

by kikimora

5/21/2026 at 8:28:54 AM

50k LOC would be the Rust code without tests.

But it's not apples to apples because they seem to have done much more performance work; this is far from code golfing.

by ndr

5/21/2026 at 11:04:21 AM

RSL’s 36k LoC includes tests and should be compared with 130k LoC, not 50.

Having 90k LoC of tests for a 50k LoC codebase is also a problem. At least in my experience LLMs generate too many tests. They do not evolve the test suite but throw more code into it as development happens. Unless I aggressively refactor tests I quickly end up with a test suite that I don’t understand. Then the LLM modifies tests to “make code work” and I have no idea if this is a legit edit or the LLM cheating. I wonder if the same thing is happening or about to happen with this codebase.

by kikimora

5/20/2026 at 1:35:50 PM

Has Rust code generally been found shorter than C++ in practice? I don't see an obvious reason for it.

by zahlman

5/21/2026 at 6:13:24 AM

I see no reason for Rust to be shorter than C++ when using the latest standards.

by pjmlp

5/20/2026 at 11:23:08 AM

the interesting thing is how fast it becomes unmanageable.

by rimliu

5/20/2026 at 11:34:58 AM

Also that. I suspect that's correlated with how practical it is to have multiple people (with their agents) iterating on it.

by ndr

5/20/2026 at 10:57:56 AM

> It's almost guaranteed with agents you could do the same job with less than half of 100k lines.

That's great, non-test code is only ~47k lines of code.

by ashirviskas

5/20/2026 at 11:08:54 AM

For a startup with limited funding, building a product is no longer a bottleneck. Not everyone has the same access to funding!

by sreekanth850

5/21/2026 at 5:39:47 PM

With Rust the failure mode isn't wrong code, it's unidiomatic code. .clone() everywhere will compile fine, but you'll feel it later.
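A minimal sketch of what that looks like in practice. The function names here are hypothetical and not taken from the project being discussed. Both versions compile and return the same answer, but the first forces callers to clone if they want to keep their data:

```rust
/// Clone-heavy shape: taking ownership means a caller who still needs
/// `words` afterwards has to pass `words.clone()`.
fn total_len_owned(words: Vec<String>) -> usize {
    words.iter().map(|w| w.len()).sum()
}

/// Borrowing shape: same result, no extra allocation at the call site.
fn total_len_borrowed(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}
```

Calling `total_len_owned(words.clone())` and `total_len_borrowed(&words)` gives the same total; only the first one copies every string to get there.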

by cold_harbor

5/20/2026 at 5:55:33 PM

The thing that impresses me most is that the author knows everything (from the high level architecture to the small details) of "multi-Paxos consensus engine" (I have no idea what it is, but it must be very complicated) and can write everything out for AI to read (or did he/she use an app to convert speech to text)?

by dxxvi

5/21/2026 at 6:10:51 AM

Cool post. I don’t fully understand what a code contract is but appreciate the advice. I have settled on a similarly lightweight /agile folder where I keep my roadmap.md with epics and sprints.

by digitaltrees

5/21/2026 at 6:53:07 AM

It's a set of asserts that are a part of the type signature. Requires are asserts on the inputs, ensures are asserts on the outputs.

Depending on your backend you either ignore them, check them all of the time, some of the time, or have SMT-solvers prove that if you uphold the first one all else must follow.

If you're interested in the last one, have a look at Dafny[0]

[0] https://dafny.org/
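As a rough Rust sketch of the requires/ensures idea, using only standard-library macros: `assert!` is checked in every build, while `debug_assert!` is compiled out when debug assertions are disabled (the default for release builds). The function `isqrt` is a hypothetical example, not code from the article, and the loop is deliberately naive.

```rust
/// Integer square root with a hand-written contract.
/// Requires: n >= 0.
/// Ensures:  r * r <= n < (r + 1) * (r + 1).
fn isqrt(n: i64) -> i64 {
    // Requires: checked in every build.
    assert!(n >= 0, "requires: n must be non-negative");

    // Naive linear search; fine for small inputs in a sketch.
    let mut r: i64 = 0;
    while (r + 1) * (r + 1) <= n {
        r += 1;
    }

    // Ensures: checked only when debug assertions are enabled.
    debug_assert!(
        r * r <= n && n < (r + 1) * (r + 1),
        "ensures: r must be the integer square root of n"
    );
    r
}
```

This covers the "check them all of the time" and "some of the time" backends. Having a solver prove the postcondition needs a separate tool such as Dafny.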

by 3836293648

5/20/2026 at 12:05:35 PM

This is a great example of AI slop and a big problem with AI coding.

The original RSL library has 36 KLoC across C++ source and header files. Rust is supposed to be more expressive and concise. Yet AI generated 130 KLoC. I guess nobody understands how this code works and nobody can tell if it actually works.

by kikimora

5/20/2026 at 1:13:12 PM

All unit tests can pass if you don't assert anything. Just have to make sure to read through all 130k lines of code to check.
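A small illustration of the point, with plain functions standing in for `#[test]` cases. All names are hypothetical. The function under test is deliberately wrong, and only the check that actually compares a result notices:

```rust
/// Deliberately buggy: meant to return the larger value, returns the smaller.
fn max_of(a: i32, b: i32) -> i32 {
    if a < b { a } else { b }
}

/// Exercises the code but asserts nothing about the result,
/// so it "passes" despite the bug.
fn vacuous_check() -> bool {
    let _ = max_of(1, 2);
    true
}

/// Compares the result with the expected value and reports the bug.
fn real_check() -> bool {
    max_of(1, 2) == 2
}
```

Both checks give the same line coverage of `max_of`, which is why a coverage number alone does not tell you whether the suite verifies anything.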

by jmpeax

5/20/2026 at 11:32:57 AM

Paxos is certainly non-trivial in the sense that tiny changes can break it, but in terms of functionality it is not that big. 50 KLOC just seems like a lot of code to me.

by danbruc

5/20/2026 at 7:53:35 PM

The moment a language is the output of a natural language compiler, the language itself is kind of irrelevant.

Change the skills, ask the agent to do exactly the same in something else.

I am slowly focusing on agent orchestration tools, which make the actual programming language as relevant as doing SOA with BPEL.

by pjmlp

5/20/2026 at 7:55:20 PM

The language may be irrelevant, but the hard guarantees it offers are not. Agents are still very stochastic, they need something deterministic constraining their output.

by throw-the-towel

5/20/2026 at 8:16:33 PM

That is where formalisms come into play.

Also it is kind of interesting that there is so much enthusiasm to use Claude and Claw all over the place, yet lack of vision on how much the whole infrastructure will improve.

Even when it finally bursts and we get into another AI Winter, what was already achieved isn't going away.

by pjmlp

5/20/2026 at 4:49:50 PM

I've found Rust's safety guarantees to be less useful for slop-generated code because LLMs can always fight their way through the borrow checker by spamming enough Arc<Mutex<Arc<Mutex<...>>>> and clone() everywhere. Rust only gives you safety properties, not liveness. Interior mutability is a fantastic tool for turning safety failures into liveness failures. Remember kids: deadlock is a safe outcome.

It works for humans because when we get a borrow-check failure, we take a step back and think about the global shape of our code and ownership. LLMs path straight to the goal. Problem: code doesn't compile. Solution: more clone()
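A minimal sketch of "deadlock is a safe outcome", using only `std::sync`. The function name is hypothetical. The code passes the borrow checker, yet a second plain `lock()` on the same thread would never return; `try_lock` is used here so the effect can be observed without hanging:

```rust
use std::sync::{Arc, Mutex};

/// Returns true if a second lock attempt on the same mutex would block.
fn second_lock_would_block() -> bool {
    let shared = Arc::new(Mutex::new(0u32));
    let alias = Arc::clone(&shared); // aliasing via Arc satisfies the borrow checker

    // First lock; the guard is held until the end of this function.
    let _guard = shared.lock().unwrap();

    // `alias.lock()` would compile here too, but it would never return
    // (std leaves it unspecified whether it deadlocks or panics).
    // `try_lock` reports the conflict instead of blocking.
    let would_block = alias.try_lock().is_err();
    would_block
}
```

Nothing here is unsafe in Rust's sense: there is no data race and no undefined behavior, just a program that would stop making progress.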

by wren6991

5/20/2026 at 11:38:31 AM

Lessons. There's no such thing as learnings.

by 10g1k

5/20/2026 at 11:50:07 AM

Learnings is irritating to me. The way kids use the word aesthetic is irritating too. I wonder if I might be that old man shaking his fist at the clouds, but I have gotten over begs the question, and literally, so maybe not yet...

by criddell

5/20/2026 at 1:37:48 PM

I understand the instinct, but that's a bit too prescriptive for me.

https://en.wiktionary.org/wiki/learnings

by zahlman

5/20/2026 at 11:55:41 AM

A lesson would be a specific learning activity happening at a specific place and time, administered by a person more knowledgeable than you; like a teacher or mentor "giving a lesson".

If you're fine with the generalized form "learned a lesson", then surely "learnings" is fine too. There's no point in trying to police a completely normal and sensible use of language.

by tskj

5/20/2026 at 12:48:41 PM

So when you cause an incident because you did not pay attention and "learn your lesson" who's the mentor?

by esafak

5/20/2026 at 4:05:50 PM

The universe.

Anyway, I accept this usage of the word "lesson", so I also accept "learnings". My point was one of hypocrisy, not policing people in how they can use the word "lesson".

by tskj

5/20/2026 at 10:32:41 AM

Is the idea of the runtime contracts similar to the idea of runtime validation? Or are they different in some way?

by nilirl

5/20/2026 at 10:47:14 AM

It is described in the "Code Contracts" section of the article: "Code contracts specify preconditions, postconditions, and invariants for critical functions. These contracts are converted into runtime asserts during testing but can be disabled in production builds for performance". The .NET framework article that he links to: https://learn.microsoft.com/en-us/dotnet/framework/debug-tra...

by pramodbiligiri

5/20/2026 at 11:37:25 AM

Is this basically what Dijkstra was saying? I've been thinking how his approach was considered impractical, but may eventually become necessary for security/stability reasons the way things are going. (Seems like new zeroday on HN front page every day now.)

by andai

5/20/2026 at 10:53:31 AM

Ah, I missed the reference. Thanks a lot!

by nilirl

5/20/2026 at 1:30:46 PM

I have a Tarpaulin code coverage check, and every time coverage drops below the threshold Claude gives up quickly and just lowers the threshold. I don't know how to overcome it. Neither CLAUDE.md nor AGENTS.md helps; the LLM always finds its way around.

by bio-s

5/20/2026 at 11:56:23 AM

How are you keeping the requirement, design, and tasks docs in sync as the code evolves? I'm curious if anyone's landed on a good workflow for this.

by chemex

5/20/2026 at 10:54:49 AM

Rust code generation consumes a lot of tokens.

Go is a much better target. I've observed Rails/Ruby code is also much easier for AI to spit out.

And Haskell flies with AI.

by faangguyindia

5/20/2026 at 11:06:45 AM

Yes, but it comes with much better “built-in” guardrails to rein in the autocomplete. Especially if compared to something runtime-surprise-prone-if-lovable like Ruby.

by jgilias

5/20/2026 at 1:12:27 PM

This is why I suggested Go.

Rust doesn't add anything over Go for LLM coding.

by faangguyindia

5/21/2026 at 4:45:25 AM

But it does. A whole class of runtime errors you can trivially produce in safe Go — null pointer dereferences, unchecked type assertions, missed enum cases — are unrepresentable in safe Rust. Also, the type system is a lot more expressive, so more invariants can be encoded in it, leading to more business logic bugs being caught at compile time rather than in production.
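A short sketch of two of those cases, with hypothetical names (`Role`, `find_leader`, `leader_label`) chosen only for illustration. A missing value is an `Option` that has to be handled before use, and a `match` over an enum must cover every variant or the code does not compile:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Leader,
    Follower,
    Candidate,
}

fn describe(role: Role) -> &'static str {
    // Removing any arm below is a compile error, not a runtime surprise.
    match role {
        Role::Leader => "leader",
        Role::Follower => "follower",
        Role::Candidate => "candidate",
    }
}

/// "No leader" is an ordinary value (`None`), not a null pointer.
fn find_leader(nodes: &[(u32, Role)]) -> Option<u32> {
    nodes
        .iter()
        .find(|(_, role)| *role == Role::Leader)
        .map(|(id, _)| *id)
}

fn leader_label(nodes: &[(u32, Role)]) -> String {
    // The caller cannot reach the id without handling the `None` case.
    match find_leader(nodes) {
        Some(id) => format!("node {id}"),
        None => "no leader".to_string(),
    }
}
```

Adding a fourth variant to `Role` makes `describe` fail to compile until the new case is handled, which is the kind of guardrail being described.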

by jgilias

5/21/2026 at 6:15:41 AM

For that I would choose neither, rather go with Haskell, OCaml, F#, Lean, Dafny, FStar, Scala, Kotlin,...

by pjmlp

5/21/2026 at 8:40:47 AM

We MUST get programming languages and LLMs that do not ever change or break comments.

You can’t have contracts defined in comments in code because there’s no guarantee they won’t be deleted or changed.

Even better, we need the ability to embed directives to LLMs which are NOT comments, but a type of programming construct specifically for this purpose.

by andrewstuart

5/20/2026 at 11:22:27 AM

Rust is about abstractions more than code. You can ask AI to "Optimize/Test/Clarify" but at the end of the day you should be willing to blindly agree to its output or spend more time reviewing someone else's code.

by bharxhav

5/20/2026 at 2:02:06 PM

Where can we read the code?

by valcron1000

5/21/2026 at 8:23:06 AM

Seriously, "Learnings"? Learn you an English.

by soloto