2/19/2026 at 8:14:26 PM
This is self-reported productivity, in that devs are saying AI saves them about 4 hours per week. But let’s not forget the METR study that found a 20% increase in self-reported productivity but a 19% decrease in actual measured productivity.
(It used a clever and rigorous technique for measuring productivity differences, BTW, for anyone as skeptical of productivity measures as I am.)
by jdlshore
2/19/2026 at 9:10:04 PM
Let's also not forget the multiple other studies that found significant boosts to productivity using rigorous methods like RCTs.
However, because these threads always go the same way whenever I post this, I'll link to a previous thread in hopes of preempting the same comments and advancing the discussion! https://news.ycombinator.com/item?id=46559254
Also, DX (whose CTO was giving the presentation) actually collects telemetry-based metrics (PRs, etc.) as well: https://getdx.com/uploads/ai-measurement-framework.pdf
It's not clear from TFA if these savings are self-reported or from DX metrics.
by keeda
2/19/2026 at 8:39:45 PM
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
That info is from mid 2025, talking about models released in Oct 2024 and Feb 2025. It predates tools like Claude Code and Codex, Lovable was at 1/3 of its current ARR, etc.
This might still be true but we desperately need new data.
by samuelknight
2/19/2026 at 8:54:57 PM
None of those changes address the issue jdlshore is pointing out: developers' self-assessed productivity increases from LLMs are not a reliable indication of actual productivity increases. It's true that modern LLMs might have less of a negative impact on productivity, or might increase it, but you won't be able to tell by asking developers if they feel more productive.
(Also, Anthropic released Claude Code in February of 2025, which was near the start of the period the study ran.)
by lunar_mycroft
2/19/2026 at 8:56:41 PM
Yeah, new data would be great, but I feel like these tools are not substantively better and this is becoming the new "it's different this time!"
by monkaiju
2/19/2026 at 8:38:30 PM
Has the METR study been replicated?
by williamcotton
2/19/2026 at 8:59:19 PM
Not a scientific study, but someone did replicate the experiment on themselves [0] and found that in their case, any effect from LLM use wasn't detectable in their sample. Notably they almost certainly had more experience with LLMs than most of the METR participants did.
[0] https://mikelovesrobots.substack.com/p/wheres-the-shovelware...
by lunar_mycroft
2/19/2026 at 8:43:57 PM
I haven’t heard about any similar studies, no. I’m planning to conduct one at my workplace but we’re still deciding exactly which uses of AI to test.
by jdlshore