The human cost of 10x: How AI is physically breaking senior engineers

4/14/2026 at 12:38:59 AM

Can definitely attest to this. The frequency of outages at my company has increased drastically over the past year, especially since incorporating agentic development. I’m seeing all of the dev best practices go out the window. We have a few vibe coders that are posting 15-30 PRs per day. It’s way too much for us to review. We’re not a big shop. I think we’re going to have to hire more people just to review code across the industry. And those people will have to know how to actually write software, otherwise what are they even reviewing? Maybe the models will get so good they never make a mistake. Doubt it.

by zthrowaway

4/14/2026 at 12:56:20 AM

The proposed industry solution is to use agents to review PRs, so as not to slow down the velocity of delivery...

My current workplace is going through a major "realignment" exercise to replace as many testers with agents as humanly possible, which proved to be a challenge when the existing process is not well documented.

by PradeetPatel

4/14/2026 at 1:52:28 AM

The fact that anyone in leadership would ever think this is even remotely possible - given my experience in the general state of requirements / contracts / integrations / support - makes me bleed from my earholes just a little bit.

It's starting to just feel a little like an excuse to call everyone on deck for "a few weeks trying 9-9-6". But even then the lack of traction isn't between the eyeballs and the deployment. You'll still be spinning wheels in that slippery stuff between what a customer is thinking and what the iron they bought is doing.

by lopsotronic

4/14/2026 at 1:00:03 AM

So you essentially trust the output of the model from beginning to end? Curious to know what type of application you're building where you can safely do that.

Edit: to clarify, I know these models have gotten significantly better. The output is pretty incredible sometimes, but trusting it end to end like that just seems super risky still.

by ryan_n

4/14/2026 at 4:58:16 AM

I guarantee you it's nothing quantifiable.

LLMs can't be responsible for deciding what code you use because they have no skin in the game. They don't even have skin.

If you type fast, well then it takes just as long to code it yourself as review it. Plus you actually get flow time when you're coding.

For heaven's sake people have the robot write your unit tests and dashboards, not your production code. Otherwise delete yourself.

by jart

4/14/2026 at 1:06:36 AM

"Hey Claude, did Claude do a good job?"

by ricketycricket

4/14/2026 at 1:42:48 AM

I did an experiment today, where I had a new Claude agent review the work of a former Claude agent - both Opus 4.6 - on a large refactor on a 16k LOC project. I had it address all issues it found, then I cleared context, and repeated. Rinse and repeat. It took 4 iterations before it approached nitpicking. The fact that each agent found new, legitimate problems that the last one had missed was concerning to me. Why can’t it find all of them at once?

by sgarland

4/14/2026 at 4:28:36 AM

You're expecting it to be a person. It's not.

It is more like a wiggly search engine. You give it a (wiggly) query and a (wiggly) corpus, and it returns a (wiggly) output.

If you are looking for a wiggly sort of thing 'MAKE Y WITH NO BUGS' or 'THE BUGS IN Y', it can be kinda useful. But thinking of it as a person because it vaguely communicates like a person will get you into problems because it's not.

You can try to paper over it with some agent harness or whatever, but you are really making a slightly more complex wiggly query that handles some of the deficiency space of the more basic wiggly query: "MAKE Y WITH NO ISSUES -> FIND ISSUES -> FIX ISSUE Z IN Y -> ...".

OK well what is an issue? _You_ are a person (presumably) and can judge whether something is a bug or a nitpick or _something you care about_ or not. Ultimately, this is the grounding that the LLM lacks and you do not. You have an idea about what you care about. What you care about has to be part of the wiggly query, or the wiggly search engine will not return the wiggly output you are looking for.

You cannot phrase a wiggly query referencing unavailable information (well, you can, but it's pointless). The following query is not possible to phrase in a way an LLM can satisfy (and this is the exact answer to your question):

- "Make what I want."

What you want is too complicated, and too hard, and too unknown. Getting what you are looking for reduces to: query for an approximation of what I want, repeating until I decide it no longer surfaces what I want. This depends on an accurate conception of what you want, so only you can do it.

If you remove yourself from the critical path, the output will not be what you want. Expressing what you want precisely enough to ground a wiggly search would just be something like code, and obviates the need for wiggly searching in the first place.

by hexaga

4/14/2026 at 9:11:35 AM

[dead]

by acesley180604

4/14/2026 at 12:49:46 AM

I wonder if the PR workflow is just unsustainable in the agentic era. Rather than review every new feature or bug fix, we would depend on good test coverage, and hold developers accountable for what they ship.

The result might be more faulty code getting merged, but if you already have outages and can't review every PR, is there currently a meaningful benefit to the PR workflow?

by bensyverson

4/14/2026 at 12:56:49 AM

This is the "if you're already letting faults through, why not give up trying to stop faults?" approach.

by dwattttt

4/14/2026 at 1:09:42 AM

The alternative might be "what if we could get the genie back into the bottle?"

We know some people are using LLMs to evaluate PRs, the only question is who, and how strong the incentive is for them to give up.

by bensyverson

4/14/2026 at 6:35:58 AM

> I wonder if the PR workflow is just unsustainable in the agentic era. Rather than review every new feature or bug fix, we would depend on good test coverage, and hold developers accountable for what they ship.

I think what you're describing is setting up the human as the fall guy for the machine.

by palmotea

4/14/2026 at 2:54:04 PM

So taking responsibility for the code you generate is being a "fall guy?"

by bensyverson

4/14/2026 at 3:11:50 PM

> So taking responsibility for the code you generate is being a "fall guy?"

Yes, if your boss expects you to use AI agents to generate code faster than you can reasonably understand and review it. You're stuck between a rock and a hard place: you're "responsible," but if you take the time to actually be responsible you'll be reprimanded. The environment pushes you to slack on reviews in the short term to keep your head above water, but when a problem happens because of that you'll be blamed for it.

by palmotea

4/14/2026 at 12:50:48 AM

Diogenes carrying a lamp, looking for good test coverage

by 01HNNWZ0MV43FF

4/14/2026 at 5:50:49 AM

Copy-pasting screenshots of red lines.

by turtleyacht

4/14/2026 at 3:13:39 AM

This reminds me a bit of monoliths vs microservices. People would see microservices as the next new shiny thing and bring it with them to their next job, or read a great blog post that sounds great in theory, but falls apart in practice. People would see it as a purely architectural decision. But the reality was that you had to have the organizational structure to support that development model or you'd find out that it just doesn't scale the way you expect and introduces its own sets of problems. My experience is that most teams that didn't have large orgs got bogged down by the weight of microservices (or things called "microservices"). It required a lot of tooling and orchestration to manage. But there was this promise that you could easily just rewrite that microservice from scratch or change languages and nobody would notice or care.

LLM-generated code feels the same. Reviewing LLM-generated code when it's in the context of a monolith is more taxing than reviewing it in the context of a microservice; the blast radius is larger and the risk is greater, whereas with microservices you can make decisions around how important that service actually is for system-wide stability. You can effectively not care for some services, and can go back and iterate or rewrite them several times over. But more importantly, the organizational structures that are needed to support microservice-like architectures effectively also feel like the organizational structures that are needed to support LLM-generated codebases effectively: more silo-ing, more ownership, more contract and spec-based communication between teams, etc. Teams might become one person and an agent in that org structure. But communication and responsibilities feel like they require something similar to what is needed to support microservices...just that services are probably closer in size to what many companies end up building when they try to build microservices.

And then there are majestic monoliths, very well curated monoliths that feel like a monorepo of services with clear design and architecture. If they've been well managed, these are also likely to work well for agents, but still suffer the same cognitive overhead when reviewing their work because organizationally people working on or reviewing code for these projects are often still responsible for more than just a narrow slice, with a lot of overlap with other devs, requiring more eyes and buy-in for each change as a result.

The organizational structures that we have in place today might be forced to adapt over time, to silo in ways that narrow ownership and responsibility to fit within what we can juggle mentally. Or they'll be forced to slow down and accept the limitations of the organizational structure. Personal projects have been the area where people have had a lot of success with LLMs, which feels closer to smaller siloed teams. Open-source collaboration with LLM PRs feels like it falls apart for the same cognitive overhead reasons as existing team structures that adopt AI.

by dhedlund

4/14/2026 at 1:00:34 AM

Maybe it’s time to have multiple agents and models review the PRs and also provide context for easier human review. That and lots more focus on robust testing.

There’s no way velocity will decrease now that upper management is obsessed with AI.

by sharts

4/14/2026 at 1:04:51 AM

I really think that software in general is getting buggier, with ChatGPT/Claude being some of the buggiest software I use. I constantly run into quality issues there and I've reported at least a dozen bugs to ChatGPT this year. One kicker I found recently was that Codex PR Reviews, once turned on for a repo, cannot be turned off - I got escalated to engineering who confirmed that they forgot to add a feature to disable code reviews.

by pants2

4/14/2026 at 12:58:26 AM

Sounds like people need to speak up to management

by Madmallard

4/14/2026 at 1:25:14 AM

Management doesn’t care. This sort of thing is becoming more common at my workplace too. More outages, more embarrassing bugs, even bugs that leak customer data. The solution is always more AI, and if you’re still shipping bugs and causing outages, it’s because you didn’t use the AI correctly. Leadership makes all the right noises about quality and ownership, but when it comes down to it, the incentive structures clearly prioritize shipping things faster, all else be damned.

by strange_quark

4/14/2026 at 7:57:04 AM

Sounds like a fast track to sinking their company into the ground

by Madmallard

4/14/2026 at 2:30:00 AM

Management wants to get rid of people; they want to have their "wish-machine" that does what they say without any need to deal with nerds or ethical issues.

by storus

4/14/2026 at 1:23:10 AM

Management likes how fast features are getting deployed so they essentially told us to just deal with it.

by zthrowaway

4/14/2026 at 7:56:42 AM

I mean speak up to management in a way where they know it's stupid and they're stupid for pushing it

by Madmallard

4/14/2026 at 12:53:05 AM

People pushing dozens of PRs per day need to learn to prioritize tasks, and balance a bit more towards quality over quantity.

by teaearlgraycold

4/14/2026 at 1:39:06 AM

This is the way. There's nothing inherently wrong with using AI as long as it's used responsibly.

I highly doubt there are any managers or executives who care how AI is precisely used as long as there are positive results. I would argue that this is indeed an engineering problem, not an upper management one.

What's missing is a realistic discussion about this problem online. We instead see insanely reckless people bragging about how fast they drove their pile of shit startup directly into the ground, or people in denial loudly banging drums to resist all forms of AI.

by sublinear

4/14/2026 at 12:55:34 AM

And maybe spend some time doing reviews for other developers. And if they aren't qualified to, then maybe spend that time becoming qualified rather than pumping out more slop.

by morkalork

4/14/2026 at 1:02:53 AM

I love it. I was getting burnt out due to ADHD or autism burnout but with AI tooling I’m able to work a full week without burnout. I think the kind of burnout I get is helped with these tools, but since I’m not neurotypical it’s different from the burnout people are getting from doing too much.

I do see “task expansion” happening often though. If I can do the full feature rather than doing baby steps I’ll often do that now, because wrangling code is easier.

by ok_dad

4/14/2026 at 3:57:24 PM

The “programming is an act of externalizing a mental model” vs “a code review is reverse engineering the model, then verifying its reasoning” really hit home. Even before AI, code reviews required a lot of mental effort for me. AI has made an already difficult process much more prevalent.

by iroddis

4/14/2026 at 12:52:20 AM

Using vibe coding for frequent PRs seems insanely reckless.

In my scientific computing environment, the majority of my vibe coded output goes to one-off scripts, stuff that is not worth committing (correcting outputs, one-off visualizations, consistency checks), and anything worth committing gets further refined to an extent that it pretty much can't be considered vibe coded anymore. It's simply too risky: any bugs would propagate down to decision making for designing new, expensive instruments.

I imagine that the cost and trust risks in enterprise environments are similar, so this seems very reckless.

AI agents have helped boost my productivity, but that's specifically because I can focus on the science, and delegate the auxiliary things to AI. I also believe I get this productivity out of them because my supervisor really drove home how hard I need to go on consistency checks, and because of years of having my visualizations nitpicked (so I am able to do the same to AI and recognize when results are suspicious).

by hgoel

4/14/2026 at 12:54:42 AM

Most people don't care. Leadership is demanding feature, feature, feature. ICs are worried about losing their jobs and outages rarely cost most businesses actual money. So garbage gets shipped, outages rise, everyone is burned out but since they can't find another job, they remain.

by stackskipton

4/14/2026 at 1:31:06 AM

In this situation, you raise the issue with management, with a paper trail that covers your ass, that the pace is unsustainable and bugs will continue to accumulate faster than they can be fixed. Make sure that you are not responsible for it and ensure this is known by all (including management).

You then continue to vibe code as instructed by management. No burnout because you are not responsible anymore.

by chii

4/14/2026 at 2:39:47 AM

People still get burned out, if not from constant pages, then from the late nights and worry about their jobs.

by stackskipton

4/14/2026 at 12:53:55 AM

I love vibe coding for little tools like that. Tools which can have their outputs quickly validated, and then throw them away. Like a jig in woodworking.

by teaearlgraycold

4/14/2026 at 2:44:16 AM

I'm a mostly solo dev, and I'm finding that being purely code-review for an AI is sub-optimal. Too often the AI runs off down bad paths which you only realise later, and unpicking the mess is most likely a productivity loss.

Working more as a pair, or essentially doing code review as you go, in small chunks, is significantly better.

I personally don't have the setup or tokens to spend to say "go build this entire thing" and then review 15k LOC. I also find even Opus is poor at coming up with tests to justify the business logic it's meant to be implementing.

by Incipient

4/14/2026 at 12:21:59 AM

Is there any publication which demonstrates that the improvement is really 10x?

by solomatov

4/14/2026 at 12:29:25 AM

It's like "decimate": you would think 10x had literal force, but it's more figurative. It just means "moar".

(Decimate had a specific literal intent. Now it's just a force modifier like bigly.)

by ggm

4/14/2026 at 12:37:37 AM

The literal meaning was removing 1/10

by peterashford

4/14/2026 at 12:45:48 AM

> Removing 1/10

feels euphemistic for the original “colloquial” usage I have for it.

> The killing of one in ten, chosen by lots, from a rebellious city or a mutinous army was a punishment sometimes used by the Romans. The word has been used (loosely and unetymologically, to the irritation of pedants) since 1660s for "destroy a large but indefinite number of." [0]

[0] https://www.etymonline.com/word/decimate

by nemosaltat

4/14/2026 at 8:26:47 AM

Yup. What amuses me is that people think that to decimate is to massively degrade something. I assume they're thinking "reduce to 1/10th" rather than "reduce to 9/10ths". The effect is markedly different.

by peterashford

4/14/2026 at 12:55:29 AM

A watched pot never boils. A watched vibe coder never 10x-es.

by zetanor

4/14/2026 at 12:39:57 AM

… how are you getting actual usable output at that scale? I have to baby my AI in 1 minute increments or it just doesn’t arrive at the correct solution at all.

Using Codex 5.2

by aetherspawn

4/14/2026 at 1:10:26 AM

Perhaps the prompts you are using could do with some love. We're pretty consistently getting great results up to and beyond the 10 minute mark in a large monorepo.

We tend to use Opus 4.6 High and GPT 5.4 High.

by wfme

4/14/2026 at 12:44:10 AM

I mean, why do you think people are burning out?

by strange_quark

4/14/2026 at 12:56:58 AM

Due to prolonged stress, to which lack of control is the main contributor. E.g. you have expectations, but you cannot control variables x, y, z, which leads to stress, which over a long period causes burnout.

by ed_balls

4/14/2026 at 1:14:11 AM

In case it wasn’t obvious, I was being facetious. You can’t just let the AI rip without putting effort into constructing good input and verifying the output and expect anything good to happen, which is what the gp was asking.

There’s no secret to how people are getting “10x”, or at least claiming to; they’re just working more.

by strange_quark

4/14/2026 at 12:45:12 AM

I can attest to this. Ultimately I don't think it is possible to 10x the output of systems with AI and actually keep the traditional quality controls (yet).

IMHO you just need two stacks -- systems where you can play fast and loose and 10x output. And systems where quality matters where you can perhaps 1.5 or 2x. That is still a lot of output.

by TuringNYC

4/14/2026 at 12:31:35 AM

I feel this is not discussed enough. I can attest to this 100%.

Just the past weekend, I was talking with a very senior engineer (~distinguished engineer at a very large tech co) who basically said he's working 8-8-6 (8 am - 8 pm, 6 days/week), "writing code" (more like supervising 8-15 agents) for a product demo in 2 weeks, which otherwise would have taken at least 1 quarter's worth of time with a small team. He's zonked out, fwiw. There are no junior engineers in the team ¯\_(ツ)_/¯, most having been laid off a few months ago.

The toll it takes, and the expectations of AI-driven productivity, have only increased dramatically. At some point, the reality will hit the remaining engg team. Not sure if the company or its leadership realizes, but so far, it's all-AI, all-the-time, human cost of productivity be damned.

by aanet

4/14/2026 at 12:38:34 AM

If this person really is a distinguished engineer, then they are part of leadership and it's their responsibility to set realistic expectations. Leadership knows this, they just don't care and won't care until the job market improves.

by strange_quark

4/14/2026 at 12:35:13 AM

> more like supervising 8-15 agents

How do they do it? (My own record is 5 agents, but it is not typical). Do they use gastown or something?

by solomatov

4/14/2026 at 12:37:29 AM

I often have 10+ running in parallel. I’m attacking parallel problems that aren’t interdependent. Sometimes adding additional products can bring me up to 15+.

Gotta have really good test harnesses so they can largely fix themselves.

by azinman2

4/14/2026 at 12:38:22 AM

But how do you cope with that amount of multitasking? Could you give an example? I mean, what kind of tasks allow such parallelization?

by solomatov

4/14/2026 at 12:47:04 AM

context switching across the entirety of the feature surface for an app

You could easily have agents to work on login page, messaging feature, database/data model update, recommender system, backend api, etc

by htrp

4/14/2026 at 12:56:27 AM

We have our doubts about this. Can you share your code or product? Anecdotally, my mistakes and lack of understanding exponentiate the more I try to parallelize.

by jondwillis

4/14/2026 at 2:27:39 PM

Who is “we”?

As I said in the neighboring comment, for vibe coding side projects and prototypes for work I just merge and iterate. It works out more than it doesn’t. For anything bigger at work I cannot share as I’m at Apple.

by azinman2

4/14/2026 at 12:50:12 AM

But you have to keep it in your head, and remember all the stuff at the same time. How is it possible to keep track, and do reviews one after another? Or are these pretty long-running agents?

by solomatov

4/14/2026 at 2:25:51 PM

I’m not sure what you mean by keep it in your head? I know all of the parts the agents are working on. It’ll often be a mix between bigger tasks (some large refactor, new feature, etc) and small tasks (little bug fixes).

For prototyping I just merge. I don’t bother to review the code. For anything more important than that, I am reviewing the code and going back and forth. Basically there’s a queue of stuff demanding my attention, and I just serially go through them.

What’s also been really helpful to me is /simplify and similar code review skills (I have my own). That alone takes an agent a while to parse through everything it’s done and self reviews. It catches quite a lot itself this way.

by azinman2

4/14/2026 at 4:42:42 PM

>I’m not sure what you mean by keep it in your head?

If the project I work on is large enough, it takes me some time to get everything I need to understand for review into the short term memory. If it's small enough, it's less of a problem for me.

by solomatov

4/14/2026 at 1:45:59 AM

Honestly, I don't know. I could be mistaken about the exact number of agents, but not about the fact that the AI-driven workflow is heavily automated and goes on for hours.

He's one (small) step from distinguished engineer, with 20+ patents to his name, and is an embedded programmer (largely C/C++) with 30+ years of experience in the field; and I've known him for nearly as long, so I put a lot of credence to his words.

But we don't usually talk work; he's the guitarist in our band :) [I'm the bass] So we mainly chill over music + beer. And lately, it's been less chill ¯\_(ツ)_/¯

by aanet

4/14/2026 at 4:16:09 AM

You can write your own linters for every dumb AI mistake, add them as pre-commit checks, and never see that mistake in committed code ever again. It’s really empowering.

You don’t even have to code the linters yourself. The agent can write a Python script that walks the AST of the code, or uses regex, or tries to run it or compile it. Give it a non-zero exit code and a line number, and the agent will fix the problem, rerun the linter, and loop until it passes.

Lint your architecture - block any commit which directly imports the database from a route handler. Whatever the coding agent thinks - ask it for recommendations for an approach!

Get out of the business of low level code review. That stuff is automatable and codifiable and it’s not where you are best poised to add value, dear human.
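
A minimal sketch of the architecture lint described above, not a definitive implementation. It assumes a hypothetical layout where route handlers live under a `routes/` directory and the database layer is a package named `app.db`; neither name comes from this thread, so swap in your own. It walks the AST of each file, reports the line number of any direct database import, and returns a non-zero code when it finds one.

```python
import ast

FORBIDDEN = "app.db"   # hypothetical database package name (assumption)
ROUTES_DIR = "routes"  # hypothetical directory holding route handlers (assumption)


def find_forbidden_imports(source, forbidden=FORBIDDEN):
    """Return sorted (line, module) pairs for imports of the forbidden package."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Covers both "from app.db import x" and "from app import db".
            candidates = [node.module] + [
                f"{node.module}.{alias.name}" for alias in node.names
            ]
        else:
            continue  # not an import, or a relative import with no module name
        for name in candidates:
            if name == forbidden or name.startswith(forbidden + "."):
                hits.append((node.lineno, name))
                break  # report each import statement at most once
    return sorted(hits)


def main(paths):
    """Check the given files; return 1 if any route handler breaks the rule."""
    failed = False
    for path in paths:
        # Only files somewhere under the routes directory are checked.
        if ROUTES_DIR not in path.replace("\\", "/").split("/"):
            continue
        with open(path, encoding="utf-8") as handle:
            for line, module in find_forbidden_imports(handle.read()):
                print(f"{path}:{line}: route handler imports {module} directly")
                failed = True
    return 1 if failed else 0
```

Wired into a pre-commit hook, the script would finish with `sys.exit(main(sys.argv[1:]))` so that a non-zero exit blocks the commit and the printed `path:line` gives the agent something concrete to fix.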

by cadamsdotcom

4/14/2026 at 12:35:47 AM

> The industry calls this “10x productivity.” I call it what it is: a system that generates output at machine speed and forces humans to process it at biological speed.

The question is can you tolerate the amount of PRs thrown at you per day on top of reviewing the exponentially growing mess of code that continues to double every hour and being paid less for it.

Just learn to say no and leave. Why do you tolerate the increasing comprehension debt that is loaded on to you?

You will never get that time back. Just give it to someone else that thinks it is worth maintaining that slop for less.

by rvz

4/14/2026 at 12:56:12 AM

The job market under our Great Leader has taken away a lot of this agency. Software engineers have gone from having the pick of the market for themselves to becoming (perceived as) next to disposable.

by basilgohar

4/14/2026 at 2:11:09 AM

That's a very American-centric point of view; the job market worldwide for developers is getting tougher and tougher.

by spaqin

4/14/2026 at 12:57:35 PM

I'm willing to have my leader take some of the blame for this as well. I think the decisions of the leader of what's still the largest economy in the world likely have an outsized impact on the rest of the world too. It's getting less and less, for sure, but still significant. I'm not trying to be American-centric; I'm trying to accept that what we do has an impact despite the unequally applied isolationist mentality of some here.

by basilgohar

4/14/2026 at 12:28:38 AM

Somebody doesn't know how to regulate their pace, and then various burnout symptoms happen.

Not everybody pushes themselves like that, nor should they; it's anything but healthy and sustainable. In my experience it takes... rather obsessed people, OCD or similar traits, maybe 2 out of 10 intensity of their disease. Highly functional, smart, yet unbalanced.

LLMs just allow this spiral to go further, while human limits remain the same. Each of us creates our own path; don't mess it up just because you can. Your employer doesn't care much about you in the end, you're just another cog in the machine, but health once damaged may not bounce back, ever.

by kakacik

4/14/2026 at 12:32:40 AM

And sometimes they do build interesting things but also leave a trail of destruction behind them. It reminds me of ‘move fast and break things’.

by onemoresoop

4/14/2026 at 12:33:06 AM

Given the research cited in the article, it seems bigger than an anecdote about one guy who doesn't know how to do work-life balance.

70%+ saying that AI has increased their workload AND that they are burning out because of it.

by sumeno

4/14/2026 at 1:01:48 AM

Yeah, it's all well and good to say somebody doesn't know how to regulate their pace, and it's another thing for your manager to tell your team that you need to be using a squad of agents constantly. To have a weekly stand-up that is specifically and solely for the purpose of talking about your "AI wins" for the week. To be told that you will be evaluated on how much you're using AI for your job.

When your manager and your company regulate your pace for you with the understood threat that not using AI will risk your job, you don't really have much of an option.

by mynameisash

4/14/2026 at 1:10:08 AM

That sounds horrible. I'm so glad I saved enough money to retire early right before this madness started.

by gdulli
