2/25/2026 at 12:26:06 AM
> When surveyed, 30% to 50% of developers told us that they were choosing not to submit some tasks because they did not want to do them without AI. This implies we are systematically missing tasks which have high expected uplift from AI.

In fact, one of the developers in the original study later revealed on Twitter that he had already done exactly that during the study, i.e. filtered out tasks he preferred not to do without AI: https://xcancel.com/ruben_bloom/status/1943536052037390531
While this was only one developer (that we know of), given the N was 16 and he seems to have been one of the more AI-experienced devs, this could have had a non-trivial effect on the results.
The original study gets a lot of air-time from AI naysayers, let's see how much this follow-up gets ;-)
by keeda
2/25/2026 at 12:41:06 AM
> 3. Regarding me specifically, I work on the LessWrong codebase which is technically open-source. I feel like calling myself an "open-source developer" has the wrong connotations, and makes it sound more like I contribute to a highly-used Python library or something as an upper-tier developer, which I'm not

That's very interesting! This kinda matches what I see at work:
- low performers love it. it really does make them output more (which includes bugs, etc. it’s causing some contention that’s yet to be resolved)
- some high performers love it. these were guys who are more into greenfield stuff and ok with 90% good. very smart, but just not interested in anything outside of going fast
- everyone else seems to be finding use out of it, but reviews are painful
by sjaiisba
2/25/2026 at 3:52:52 AM
As one of the naysayers who talked a lot about the original study, I enthusiastically endorse any attempt at all to actually measure AI productivity. An increase from 20% slowdown to 20% speedup over the past year seems broadly consistent with my understanding of how things have gone. I think I remain classified as a "naysayer", though, because the "booster" case has gone from "I'm multiple times more productive" to "I never have to look at code, my AI agents just handle everything" over the same period.
by SpicyLemonZest
2/25/2026 at 7:03:44 AM
I think the issue was with incomplete context. Even before the original METR study came out, there were a number of larger-scale studies that showed a 15-30% boost, starting as far back as 2024. I often mention them, though they require some explanation, so this thread and linked comments may be useful: https://news.ycombinator.com/item?id=46559254

However, those studies never got as much airtime as the METR study, and this has created an imbalanced perspective.
My take is that studies like this are extremely useful, but a lagging indicator of the true extent of AI-assisted coding. Especially since the latest tools are something else entirely.
I am not at the "never look at code again" stage, the old habits are just too ingrained... but I'm starting to look less frequently because I rarely find anything to fix. I can see a path from where I'm at to the outlandish claims people have been making.
by keeda
2/25/2026 at 7:41:44 AM
I tried the "don't look too closely" thing for the first time last week. I got immediately humiliated when a reviewer asked why my commit was trying to replace the correct, elegant usage of an API the class was named after with a 4-line franken-command using a different API with incorrect semantics. It's not like I'm not trying the new stuff; on a subjective level I think AI coding is really neat, but I just can't ever figure out how to map what I get to the stories I hear.
by SpicyLemonZest
2/25/2026 at 7:12:54 PM
Oh yeah, I can see that happening, which is why I still scan the code! However, one thing I'll add is that AI-assisted coding requires adapting your workflow. Fortunately, it largely boils down to coding best practices on steroids: docs, tests, tooling like linters, etc.

I throw tests at everything, even minor functions, preferably integration, maybe even some E2E with Playwright in web apps, at least for the happy paths. I actually pay more attention to the tests. The amazing thing is that the AI writes all of these and uses them as feedback to fix its mistakes.
But these validation guardrails are what has been driving down the issues I encounter. Without these the AI can make mistakes, and hence will require more in-depth manual review.
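To make the "tests as guardrails" idea concrete, here is a minimal sketch. Everything in it is hypothetical and illustrative, not code from the thread: `slugify` stands in for an AI-generated helper, and the happy-path test is the kind of check an agent can run repeatedly, using failures as feedback to fix its own mistakes.

```python
# Hypothetical example: a happy-path test guardrail around an AI-generated helper.
# The function and test are illustrative stand-ins, not from the discussion above.

def slugify(title: str) -> str:
    """Turn a post title into a URL slug (imagine this was AI-generated)."""
    # Keep letters, digits, and spaces; drop punctuation; collapse whitespace.
    cleaned = "".join(c if c.isalnum() or c == " " else "" for c in title.lower())
    return "-".join(cleaned.split())

def test_slugify_happy_path():
    # The test encodes expected behavior; an agent runs it in a loop and
    # treats any assertion failure as a signal to revise the helper.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  AI   Coding 101 ") == "ai-coding-101"

test_slugify_happy_path()
```

The same pattern scales up: linters and type checkers catch structural mistakes, integration tests catch behavioral ones, and the agent iterates against all of them before a human ever reviews the diff.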
by keeda
2/25/2026 at 10:35:43 AM
It depends on what you're measuring.

Don't get me wrong, my experiments with true vibe-coding (i.e. not even looking at the code) match yours: the result is somewhat mediocre*.
For some cases, and I try to push beyond the limits of what LLMs can do in order to find those limits, they suck. I'd describe the output as like that of an overenthusiastic junior who reinvents the wheel badly rather than using standard approaches even when told to.
For other cases, I know that mediocre code is actually good enough: well before LLMs happened, I've seen mediocre code that still resulted in the app itself being given meaningful public accolades.
* Though, as per previous comment of mine, I can't help notice that the mediocrity is doing more and more of my previous career: https://news.ycombinator.com/item?id=46989102
by ben_w
2/25/2026 at 8:14:10 AM
You just have to give up and drink the koolaid...

But for real... my company started tracking commits per hour as a metric, so I just commit as many times as I can. I don't get the luxury of even looking at my work now. They say it's faster, but I've never seen so much tech debt delivered so quickly in my life.
It's going to be an interesting few years...
by pudsbuds
2/25/2026 at 8:34:47 AM
Definitely need to stop squashing commits if that is the case! But no, seriously, tracking git commit counts is absolutely ridiculous. Maybe you can have AI autonomously work on useless documentation that no one will read, with 1 commit per 100 lines of markdown?
by mewpmewp2