5/22/2026 at 7:51:56 PM
I keep wondering how people accept a night's worth of agent activity.
I feel 30 minutes of planning and 30 minutes of implementation in my solo side project's repo is too big to review. At minute 5, I may ask the AI to redo stuff even as it's spitting out code.
by aitchnyu
5/22/2026 at 8:19:10 PM
Most of the narrative is about how AI is writing all/most code, but I’d wager that the fraction of human reviewed code is approaching zero far faster than anyone is realizing or willing to admit.
by d4rkp4ttern
5/22/2026 at 8:44:08 PM
Very true. Last year I at least glanced at every line of AI generated code. Now if some AI makes a 10k line program for some one-off tasks, I run the program, glance only over the output, and move on.
by londons_explore
5/23/2026 at 3:35:07 AM
Especially if you're having an LLM write non-interactive scripts to calculate complex things from large datasets, glancing at the output is not enough to know if the output is remotely accurate (unless the output is so trivial you could literally do it in your head).
Case in point: I recently asked an LLM to write a pile of code to compile historical baseball stats to test betting success against the results of my hand-written code that evolves genetic algorithms. I marveled for a little while at the unbelievable improvement in EV/ROI that this script was showing could have been achieved from certain small tweaks. I only noticed after pushing a total bet that the push registered on the output as a win - and only because I was carefully staying on top of it. A single stupid recursively operating >= instead of > had caused completely nonsensical results that looked plausible.
Imagine, like, trusting a 10k loc script to give you data for something you were going to build in the physical world, and hoping an LLM hadn't made a mistake like that.
by noduerme
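To make that failure mode concrete, here is a minimal Python sketch. The function names (`settle_over_buggy`, `settle_over`, `profit`), the odds, and the game data are all hypothetical and are not the commenter's code; the sketch only shows how `>=` in place of `>` turns a push into a win on an "over" bet, and how plausible the inflated result still looks.

```python
def settle_over_buggy(total_runs: float, line: float) -> str:
    """Settle an 'over' bet. BUG: >= counts a push (total == line) as a win."""
    return "win" if total_runs >= line else "loss"


def settle_over(total_runs: float, line: float) -> str:
    """Settle an 'over' bet, treating total == line as a push (stake returned)."""
    if total_runs > line:
        return "win"
    if total_runs == line:
        return "push"
    return "loss"


def profit(outcomes: list[str], stake: float = 100.0, odds: float = 0.91) -> float:
    """Net profit over settled bets; a push returns the stake, so it adds 0."""
    payout = {"win": stake * odds, "push": 0.0, "loss": -stake}
    return sum(payout[o] for o in outcomes)


# Hypothetical (total_runs, line) pairs; the middle game lands exactly on the line.
games = [(9, 8.0), (8, 8.0), (7, 8.0)]

buggy = [settle_over_buggy(total, line) for total, line in games]
fixed = [settle_over(total, line) for total, line in games]

print(buggy, profit(buggy))  # the push is silently booked as a win
print(fixed, profit(fixed))  # the push is booked as a push
```

A hand-computed check on a few games where the total equals the line is the sort of test that catches this before the aggregate numbers are trusted.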
5/23/2026 at 9:28:23 AM
Code needs tested. I'm glad that the bar of entry has been lowered but now we just have a huge amount of people that haven't yet learned anything about how to test and verify that the code meets the expected requirements.
by bobjordan
5/23/2026 at 10:13:21 AM
AI codes, AI tests, AI verifies, in a Ralph Loop ( https://github.com/snarktank/ralph ) :)
by bdangubic
5/22/2026 at 8:49:04 PM
Which one-off tasks need 10k lines of code?
by MaKey
5/22/2026 at 9:47:47 PM
Would depend on what AI and prompt you use ultimately. Ask it to add tests (functional, E2E and unit, maybe invent a new type too), packaging, modular code and/or whatever, and you get to 10K relatively quickly with some of the more verbose LLMs out there.
Personally it's probably the biggest struggle, trying to rein in the "spray and pray" approach LLMs typically like to take, and reducing the "patch on top of patch" syndrome too.
by embedding-shape
5/22/2026 at 9:06:19 PM
Calculate the engine power of a 2015 VW Polo when travelling 70 mph on a flat road behind a box truck. Draw a chart of drag vs. follow distance. How significant is humidity on the result?
by londons_explore
5/22/2026 at 10:45:18 PM
European or African Polo?
by Grosvenor
5/23/2026 at 3:10:58 AM
You're not supposed to post that you just like a comment, but this was the best comment on HN in ages.
by noduerme
5/23/2026 at 4:21:12 AM
You can't beat this though:
by Grosvenor
5/22/2026 at 10:54:47 PM
I don't know th- AAAAHHHH
by metrix
5/22/2026 at 11:34:55 PM
One off web app for scrubbing through some data, that, once done, will never be run again?
by hgoel
5/22/2026 at 9:31:40 PM
Java programs
by bossyTeacher
5/22/2026 at 11:17:54 PM
Enterprise programs*
by rq1
5/23/2026 at 1:45:42 PM
This is fine for one off tools and I do the same. But building long-lived "professional grade" production software this fails real quickly.
My team is using AI for most of the code, but the human review layer is crucial and unavoidable if you're interested in things like reliability, uptime, controlled feature rollouts, the integrity of your users' data, etc.
by rco8786
5/23/2026 at 11:51:46 AM
A huge factor I don’t see mentioned often enough is the rapid increase of AI-coding in a language unknown to the dev.
by d4rkp4ttern
5/23/2026 at 3:49:57 PM
Pretty much. For my home IT projects I have been playing around with various means of implementing agents.
I’ve looked at the outputs here and there - and holy hell would it never pass review if I were trying to make something robust and anti-fragile. But since I can just have AI spit out a fix for the horrific “code” when it breaks in a totally predictable manner it’s just not worth my time to try to actually sit down and get it done right. Or even fight with AI by providing a good specification and design guidelines.
I imagine this is how things are going in the real world, given 30 years of working with various levels of humans. So long as the output is “good enough” it is the extreme minority of folks who care about much else. And that’s for mid-level to senior folks who have the experience to know better. Juniors wouldn’t even be able to pick out most of even the most obvious anti-patterns AI tends to spit out such as putting configuration within code, etc.
Refactoring is just in a new world too, that us olds probably have a hard time with. It’s no longer examine the code, identify design gaps, find high leverage places to start fixing, etc. It’s now “this is broken, rewrite from scratch” when it eventually turns into too much spaghetti.
In some ways being entirely focused on the outcomes is freeing. But man, under the hood it is crazy and a whole new world.
by phil21
5/23/2026 at 2:58:08 AM
i admit. agentic coders do not look at the code except by accident. not much point unless you're working on enterprise applications
by jatora
5/22/2026 at 10:43:46 PM
People already barely reviewed code, most of it was imported libraries.
by vasco
5/22/2026 at 10:47:53 PM
The assumption used to be that you respected the library enough and believed it was well reviewed and architected by the maintainer(s). But now even that's unreliable because libraries are being slopified at an unreviewable pace too.
by seanw444
5/23/2026 at 12:57:53 PM
> The assumption used to be that you respected the library enough and believed it was well reviewed and architected by the maintainer
I don't know many serious software engineers who'd take that approach. The convention was always to actually open up the code, evaluate the quality, see if they seem to know what they're doing, then choose the libraries you know work and can be adjusted to fit whatever you want. At least for professional development inside companies, not a single library would be included unless you at least reviewed that the top-level dependency you pull in actually had code worth pulling in in the first place.
And this approach works just as well today as it used to: you literally have to spend like 3-5 minutes browsing the code, evaluate the abstractions they've built and then say "Yes, looks good enough to try to use" or "Clearly these people just hacked this together as fast as they could".
by embedding-shape
5/23/2026 at 3:04:42 AM
It's weird that you think humans weren't slopifying code until LLMs came along. At least now they are implementing tests and CI and far more documentation, updating API versions, etc. OOMs above the amount they did before.
I'd also wager that a far higher % of code gets more coverage of review, via prompting AI to do it, than it did before.
Most PRs pass as long as they A. pass checks, B. don't introduce regressions, C. fix a bug or implement a feature. People talk about this era of humans reviewing code with nostalgia... but that never existed at scale.
by jatora
5/23/2026 at 11:26:23 AM
> The assumption used to be that you respected the library enough and believed it was well reviewed and architected by the maintainer(s).
Let us be honest, for your average dev, the assumption was that the number of GitHub stars or npm/NuGet downloads was a good proxy for quality.
by bossyTeacher
5/22/2026 at 11:14:28 PM
People seem to have rosy glasses about how great and vetted code was before AI coding took off the way it has, it was not great.
by giancarlostoro
5/23/2026 at 12:00:41 AM
I’d say the increased scrutiny has merely exposed the difference in care between the different groups in the industry. Seems to explain pretty well why both sides are equally confounded by the other’s expectations.
by king_geedorah
5/23/2026 at 1:48:27 PM
Which people? I’ve never worked at a place where reviews weren’t taken seriously. For small changes a cursory glance, sure, but anything medium-sized meant checkout+local test. If anything we’d spend too much time on code reviews or pair programming?
by port11
5/23/2026 at 12:06:52 AM
People keep saying this like it’s some meaningful point, but the reality is many people in different projects have a shared need for that code to work correctly, and there is a social proof involved in using open source libraries. That is why people look at downloads and dependent projects as heuristics of stability and correctness. That is not the case with (and cannot be obtained with) code authored by generative AI.
by almostdeadguy
5/23/2026 at 5:04:49 AM
Yes it can, the code will be run and you will have the proof that it ran well. Or it won't run well and you'll re-do it. Same as with some imported library.
by vasco
5/22/2026 at 7:59:16 PM
A lot of that agent activity is combing over what was previously made, forcing constraints upon it so you have a reasonable expectation of what ends up on your desk for review.
For me, strong file structure helps as well. Reviewing a 3,000 line file it just created is abysmal. I wouldn't accept that from human nor machine :) Multiple files in the right places helps reduce cognitive load.
Sometimes I'll also review with the agent interactively. What is the most important file to review first, etc?
I like to stage changes into a "LGTM" pile. Then if I want changes, I'll have the agent "review unstaged changes - I want something different done here."
by lanyard-textile
5/22/2026 at 9:52:13 PM
No one is reviewing the code. Managers don't want us to review code either. It's a bottleneck. If something goes wrong (bugs) they are fixed as they come. It's a very sad era of software engineering. If there ever was some engineering in our trade, now it's mostly gone. We are guessing around, writing "skills" files with "please, do not introduce bugs" or "you are an owner, not a renter" or similar stuff. It's just very low effort, very nondeterministic. Big apps out there are going down constantly because of AI slop (e.g., GitHub), and we are seeing it more often as well in not-so-popular systems (e.g., in my company and other SaaS that we use).
Product managers never cared about the code. Engineering managers don't care about code as much as they did when they were engineers. Directors couldn't care less about code. CTOs don't know what code looks like anymore. We are at the end of the chain, and somehow we always took pride in well written and maintainable code because we knew deep inside that good systems are built based on good code. But now we are jeopardizing ourselves, it's us the engineers who don't care anymore about code and with AI that problem is amplified.
by dakiol
5/24/2026 at 6:38:18 PM
Everything I can batch overnight locally is free.
by gopher_space
5/22/2026 at 10:08:19 PM
I usually aim to have Claude end up with about 500 lines of code after a night of work. Most of what it's doing is experimenting with many different approaches, summarizing them, and then giving me a relatively small diff to review and modify.
by SatvikBeri
5/22/2026 at 11:32:41 PM
This is the way to go. I usually play with relatively stable software where the improvements are either performance or very small niche features that are built on top of already existing ones. Big changes are undesirable by both the others working on it and its users.
by a1o
5/22/2026 at 7:54:44 PM
I wonder the same. The answer I usually get from people who do manage is that they don't look at the code – or at least not in detail.
Personally, I always end up tweaking something the agent produced. I wonder if I should let go of that control...
by fphilipe
5/22/2026 at 8:04:07 PM
Even the newest models, like GPT 5.5, only deliver what I want nine out of ten times. If I didn't catch the remaining 10% of misguided garbage by manually reviewing every change, it would add up really quickly.
by InsideOutSanta
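As a rough illustration of how that adds up: if each change is right with probability 0.9 and the changes are independent (an assumption made for this sketch, not something the comment claims), the chance that n unreviewed changes are all right is 0.9^n. The helper name `p_all_good` is hypothetical.

```python
def p_all_good(p_single: float, n_changes: int) -> float:
    """Probability that n independent changes are all acceptable."""
    return p_single ** n_changes


# Print the chance of a fully clean run for a few batch sizes.
for n in (1, 5, 10, 20):
    print(n, round(p_all_good(0.9, n), 3))
```

Under that independence assumption, ten unreviewed changes are all fine only about 35% of the time.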
5/22/2026 at 8:03:32 PM
yeah
by debabrata_saha
5/22/2026 at 8:46:59 PM
I never look at code. It used to be that it quickly became unmaintainable spaghetti where the agent struggled to make any change at all, but in the past year (and with a three step plan/develop/review workflow), the quality is so good that I basically just don't look at the code any more.
It definitely has fewer bugs than a senior developer, but it really hinges on getting the plan right. 20 minutes of planning and 20 of implementation sounds about right for my workflow as well, just make sure you have GPT as a reviewer. It's very nitpicky and finds lots of bugs.
by stavros
5/23/2026 at 11:53:24 PM
I'm starting to agree with you; I found the plan/develop/review workflow to work quite well, but I'm not at the point of not looking at the code at all yet.
I guess you actually review and actively participate in making the plan, you just don't review the code afterwards?
Could you share some more details on the specifics of your workflow? (What models/harnesses? do you use the same or different context windows? How exactly do you run the review, and how do you pass along and act upon the information from the review?) Also, how big are the changes you usually implement with one plan/develop/review cycle?
by bogdanoff_2
5/24/2026 at 12:13:55 AM
Sure! Here: https://www.stavros.io/posts/how-i-write-software-with-llms/
The changes aren't usually very big, basically what you'd put in one ticket. If I need to make large changes, I do them in self-contained stages, if that's possible, otherwise I will tell the LLM to add specific tests in the plan, and I will test thoroughly after.
by stavros
5/23/2026 at 4:15:39 PM
20 minutes planning, 20 minutes coding, 200 minutes review and refactor (includes going for a walk and thinking about the problem deeply).
I know a lot of engineers who skip the last part. They're overconfident in their original plan. They're overconfident the agent actually fulfilled the plan.
by jappgar
5/23/2026 at 4:50:52 PM
You aren't treating this as a question of ROI. Is it worth spending 5x as much to make sure the plan was OK and implemented well? Or is it actually OK if we discover the bug during testing?
The answer won't be the same for all software, but you're assuming it will be.
by stavros
5/22/2026 at 8:57:44 PM
This brings to mind two thoughts:
First, that this is challenging to scale across large orgs. Even if your plans produce high quality code, that isn’t true for everyone. I’m definitely struggling with slop code being collectively mailed to me for review by our 1,000 engineers that were told to use their AI subscription all at once.
I feel like we should be taking “prompt engineering” more seriously. And when people mail me code to review, it should also include the agentic workflow and plan, so that when code isn’t up to quality, we can have a discussion about the prompts used to generate it.
My second thought is related to your senior engineer comment. This isn’t surprising, because in most engineering orgs, seniority is completely unrelated to code quality. In fact, many orgs incentivize the opposite: “senior” devs that push out buggy code quickly and push accountability downhill to the junior devs.
by materielle
5/23/2026 at 4:18:32 PM
I'm so curious to see how other people prompt but literally no one I work with will share it. They might share plans, but they never show the conversation, which is the most crucial part.
Judging by how they struggle to communicate generally, I can't imagine their prompts are doing much heavy lifting.
by jappgar
5/22/2026 at 9:00:33 PM
Eh, everything is challenging to scale across large orgs. Even before LLMs, the code was a huge ball of spaghetti that barely held together. Now we just get there faster.
About senior engineers, I guess that depends on the org you have experience with. My experience doesn't match yours.
by stavros
5/23/2026 at 1:46:19 AM
You care about code quality. Many don’t. I had someone tell me this week that a 6000 line class was ok because it was easier for the model to understand and that’s more important than human comprehension. And I get his point but that seems like a big risk to take.
by digitaltrees
5/23/2026 at 1:48:58 AM
and it's wrong. a 6000 line class is not easier for a model to understand. the same things that help humans also help agents. I find myself adding linters that must pass, and that the agent must fix, which limit file size, function length, function complexity, and how many files are in a directory. a little more work for the agent, but the codebase is healthier and the agents write fewer bugs.
by mpalczewski
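A minimal sketch of such a size linter, using only Python's standard `ast` module. The limits and the function names (`check_source`, `check_paths`) are hypothetical, this is not the commenter's tooling, and it covers only file length and function length, not complexity or files per directory.

```python
import ast
from pathlib import Path

MAX_FILE_LINES = 500       # hypothetical limits; tune per project
MAX_FUNCTION_LINES = 50


def check_source(source: str, name: str = "<string>") -> list[str]:
    """Return human-readable violations for one Python source text."""
    problems = []
    n_lines = len(source.splitlines())
    if n_lines > MAX_FILE_LINES:
        problems.append(f"{name}: {n_lines} lines (max {MAX_FILE_LINES})")
    tree = ast.parse(source, filename=name)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on AST nodes since Python 3.8.
            length = node.end_lineno - node.lineno + 1
            if length > MAX_FUNCTION_LINES:
                problems.append(
                    f"{name}:{node.lineno}: function {node.name!r} is "
                    f"{length} lines (max {MAX_FUNCTION_LINES})"
                )
    return problems


def check_paths(paths: list[str]) -> int:
    """Check each file and return 1 if anything failed, else 0."""
    problems = []
    for p in paths:
        problems += check_source(Path(p).read_text(encoding="utf-8"), p)
    for line in problems:
        print(line)
    return 1 if problems else 0


# A short function passes, so the list of violations is empty.
print(check_source("def tiny():\n    return 1\n", "demo.py"))
```

Wiring the return value of `check_paths` into a CI step or pre-commit hook as the exit code is one way to make the check something the agent has to satisfy before a change lands.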
5/23/2026 at 1:51:59 AM
I don't think the same things that help humans help agents. Simplicity helps humans, for agents parsing complexity is a breeze.
Not saying code quality isn't important - it is. But I think what is described as quality code will change.
by lukevdp
5/23/2026 at 2:08:18 AM
Agents still pay a penalty for complexity even if it is a smaller one.
by cjbgkagh
5/23/2026 at 4:26:33 AM
Parsing a single file is easier than navigating a file system for an LLM. Until the models have context windows large enough to hold the entire codebase in one shot, single files will beat multiple files every time.
by digitaltrees
5/23/2026 at 11:33:33 AM
This. I suspect the codebases in the future will be made of a small number of gigantic source files. These will be able to be transpiled into a more human-friendly form that produces multiple smaller files per big file in human-debug mode.
by bossyTeacher
5/23/2026 at 2:00:05 PM
As a human who typically uses large files, 10k to 30K lines of code files are pretty common, I find the agents don’t read the whole file after the first time, they almost always do a range select for the bit they are interested in.
by cjbgkagh
5/22/2026 at 11:02:01 PM
Yeah the multi-agent workflow just hasn't been satisfying to me. The more chats I try to run at once, the more I get lost and overwhelmed. I trust Claude to implement a plan correctly after I've reviewed it, but if I don't review all of the plans, I will miss some small detail that it misunderstood and it'll be a pain to fix later.
I'm like a 1-2 chats at a time kind of guy. I just don't see how I could keep my exact vision for the project otherwise.
by BosunoB
5/22/2026 at 11:40:52 PM
Same. On top of that, multi-agent workflows just cost too much for stopping and correcting them to feel worthwhile, compared to one or two manually managed chats.
by hgoel
5/22/2026 at 8:35:20 PM
So I've been in a hobby project for a few weeks -- transforming an old software modem binary to C code.
I gave it the existing modem, and had it build rigging to build test vectors. I had it specify the work in the modem. And to confirm that legacy<>legacy produced the same streams as the new code. I've also recorded test vectors vs. other modems.
I've since launched it on targeted refactoring and code reduction projects.
I am mostly not looking at the code. There's a 100KSLOC lump of code that is much cleaner than a decompilation but a fair bit dirtier than what I would write myself. It is not factored terribly. I have some hope of getting it to trim this down to 70KSLOC that then I can accept in small blocks.
It outperforms the original softmodem, hitting higher RX rates for the same line quality and using less CPU. It also has additional functionality.
So, you know, I would never have written something this large for a hobby myself. And it's cost me $200 and 20-30 minutes per day for a few weeks to get a huge functional surface that I do believe I will be able to trust at the end of the process.
by mlyle
5/22/2026 at 9:22:57 PM
I don't like that there are any good sounding stories, but this sounds pretty good.
by Brian_K_White
5/22/2026 at 8:51:03 PM
That depends. When I'm working on a 1 in a million race condition in some multi-threaded code, the agent needs hours to figure out what is going on. (I would probably need weeks - I don't know as I've given up on some of these before I could point an agent at it)
by bluGill
5/23/2026 at 7:00:41 AM
I never review anything written by Codex in my pet projects. It works or it doesn't and then I prompt again. I can see how it's easy to multiply agents in this case.
Now when using it for my job... that's a totally different story: I review all the changes, so a single chat session with an agent can lead to a whole day of review. And it's great, sometimes the agent uses patterns and functions I don't know, so I learn a lot.
by JodieBenitez
5/22/2026 at 11:39:09 PM
I agree, but for small tasks - <20 lines that I can understand in a minute or two - perfect. Thinking about it - I have hundreds, if not thousands of tasks that I would like to do, improving pipelines, migrating from one tool to another, but never have time. The only question is - if I don't have time to do it, do I have time to prompt it?
by suralind
5/23/2026 at 2:42:01 AM
Cost of generation is low, why review? Regenerate if not working. Rinse and repeat.
Maximize providers' profits. What can go wrong.
by zx8080
5/23/2026 at 11:33:32 AM
the bottleneck moves from generation to review. agents parallelize, humans review sequentially: 8 parallel cards means 8x the diffs to read, none of the timelines overlap
by cold_harbor
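The arithmetic behind that can be sketched as follows, assuming a single human reviewer and hypothetical per-card durations (1 hour to generate, 30 minutes to review); the function name `wall_clock_hours` is made up for illustration.

```python
def wall_clock_hours(gen_hours: list[float], review_hours: list[float]) -> tuple[float, float]:
    """Return (generation, review) wall-clock time for one reviewer.

    Agents run in parallel, so generation takes as long as the slowest card.
    A single human reads one diff at a time, so review time is the sum.
    """
    return max(gen_hours), sum(review_hours)


# Eight parallel cards with hypothetical durations.
gen, review = wall_clock_hours([1.0] * 8, [0.5] * 8)
print(gen, review)
```

With these made-up numbers, adding cards leaves generation time flat while review time grows linearly with the number of cards.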
5/22/2026 at 10:07:01 PM
They don’t, they just have Claude commit and push straight to the main branch. Just like the author of this 100% slop app: https://github.com/leodavinci1/kanbots/commits/main/
by Kwpolska
5/22/2026 at 7:59:41 PM
They most likely don’t review it ;)
by throw03172019
5/22/2026 at 11:02:32 PM
Those people don't review.
by keyle
5/23/2026 at 12:38:45 PM
Without agentic coding the number go up narrative dies. There's your answer.
by jgalt212
5/23/2026 at 10:05:36 AM
No one reads code that results from this. Those who say otherwise either lie or are very bad developers which is essentially the same as not reading that code.
by risyachka
5/23/2026 at 6:16:59 AM
I understand and agree with the feeling but then I also feel AI is too slow and too expensive.
My most successful autonomous runs have been expanding scrapers across a number of similar but different portals. I had examples and targets and it just kept searching for new ones and adding them.
But even doing basic ML auto research I have found it to be surprisingly poor except at trivial but useful augmentation of models. Yes it can implement things but somehow I am needed a lot even though I set up a lot of framework around it.
My mental model is that it's very good at complex deterministic work like reading bad API docs and getting some connectors to work.
But perhaps I care less about being stuck in a local optimum there.
by nobodywillobsrv
5/23/2026 at 4:30:43 AM
Testing
by toobulkeh
5/23/2026 at 1:49:18 AM
whenever i found a guy who uses parallel overnight agents, i asked them how many users they have. Crickets.
They do not have any users. Meanwhile, i've to do code reviews and all otherwise my 12,000+ users will be pissed off if anything in their workflow breaks.
This means i really cannot release more than 1 tiny feature a day. And using parallel agents, well that's good for testing but i don't think i need to add that many features to add anything.
by faangguyindia
5/22/2026 at 8:17:01 PM
Lots of people are working on repetitive simple projects like the Nth website whatever or things like that, boring stuff. This LLM era is already a very big deal for these people.
Personally somehow I am working on stuff that has like 25% not trivial stuff and that is enough to have the same experience as you have.
But also lots of people just don't care about quality and they might be right with their customers/audience. In these cases when someone catches one, an agent is going to iterate on it and make it (seemingly) go away, bandage applied, who cares again. This has a market, I am sure. Lots of programmer folks are just as bad.
by szundi
5/22/2026 at 8:57:15 PM
Yes it is too big to review for you - the human - so you simply don't review code anymore. It isn't that difficult to comprehend, is it?
by siva7