3/12/2026 at 1:04:03 PM
I feel that two things are true at the same time:
1) Something happened during 2025 that made the models (or, crucially, the wrapping terminal-based apps like Claude Code or Codex) much better. I only type in the terminal anymore.
2) The quality of the code is still quite often terrible. Quadruple-nested control flow abounds. Software architecture is unsound even at rather small scopes. People say AI is “good at front end” but I see the worst kind of atrocities there (a few days ago Codex 5.3 tried to inject a massive HTML element with a CSS ::before hack, rather than properly refactoring the markup).
Two forces feel true simultaneously but in permanent tension. I still cannot make up my mind or see the synthesis in the dialectic: where this is truly going, whether we’re meaningfully moving forward or mostly moving in circles.
by aerhardt
3/13/2026 at 10:08:10 AM
> People say AI is “good at front end” but I see the worst kind of atrocities there

It's almost universal to say "AI is great at X" when one is not a professional in X. It's because that's how AI is designed: to output tokens according to stats, not logic, not semantics, not meaning: stats.
by zx8080
3/13/2026 at 6:28:46 PM
Reading discussions online and comparing them to my own experience makes me feel crazy, because I've found today's LLMs and agents to be seemingly good at everything except writing code. Including everything else in software engineering around code (debugging, reviewing, reading code, brainstorming architecture, etc.), as well as discussing various questions in the humanities and sciences where I'm a dilettante. But whenever I've asked them to generate any substantial amount of code, beyond a few lines to demonstrate usage of some API I'm unfamiliar with, the results have always been terrible, and I end up either throwing it out or rewriting almost all of it myself and spending more time than if I'd just written it myself from the start.

It's occurred to me that maybe this just shows that I'm better at writing code and/or worse at everything else than I'd realized.
by contextfree
3/14/2026 at 5:34:58 PM
Gell-Mann Amnesia for code quality.
by pornel
3/13/2026 at 10:06:16 AM
This matches my experience too. The models write code that would never pass a review normally. Mega functions, "copy and pasted" code with small changes, deeply nested conditionals and loops. All the stuff we've spent a lot of time trying to minimise!

You could argue it's OK because a model can always fix it later. But the problem comes when there are subtle logic bugs and it's basically impossible to understand. Or fixing the bug in one place doesn't fix it in the 10 other places where almost the same code exists.
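For concreteness, a hypothetical Python sketch (invented for illustration, not taken from any actual model output) of the nested style described above, next to the flat guard-clause version a reviewer would normally ask for:

```python
# Hypothetical example of the quadruple-nested style often seen in generated code.
def order_total_nested(order):
    if order is not None:
        if order.get("items"):
            if order.get("status") == "open":
                if order.get("customer"):
                    return sum(item["price"] for item in order["items"])
    return 0

# The same logic flattened with early returns: one level of nesting,
# each precondition checked and rejected up front.
def order_total_flat(order):
    if order is None or not order.get("items"):
        return 0
    if order.get("status") != "open" or not order.get("customer"):
        return 0
    return sum(item["price"] for item in order["items"])
```

Both functions behave identically; the difference is purely that the flat version keeps each precondition visible on its own line, which is exactly what makes subtle logic bugs easier to spot in review.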
I strongly suspect that LLMs, like all technologies, are going to follow an S curve of capability. The question is where in that S curve we are right now.
by leoedin
3/12/2026 at 1:44:58 PM
The models lose the ability to inject subtle and nuanced stuff as they scale up, is what I’ve observed.
by jygg4
3/12/2026 at 1:08:57 PM
> People say AI is “good at front end”

I only say that because I'm a shit frontend dev. Honestly, I'm not that bad anymore, but I'm still shit, and the AI will probably generate better code than I will.
by orwin
3/12/2026 at 1:55:42 PM
As long as humans are needed to review code, it sounds like your role evolves toward prompting and reviewing.

Which is akin to driving a car: the motor vehicle itself doesn’t know where to go. It requires you to prompt via steering and braking, etc., and then to review what is happening in response.
That’s not necessarily a bad thing; reviewing code ultimately matters most, as long as what is produced is more often than not correct and legible. Now that is a different issue, for which there isn’t a consensus across software engineers.
by jygg4
3/13/2026 at 7:06:51 AM
I don't think that reviewing code is as important as reviewing results. Nobody reviews the IL or assembly code when they write in higher-level languages. It's the end result that matters in most cases.
by cicko
3/13/2026 at 8:13:25 AM
But we don't evolve IL or assembly code as the system evolves. We regenerate it from scratch every time. It is therefore not important whether some intermediate version of that low-level code was completely impossible to understand.
It is not so with LLM-written high-level code. More often than not, it does need to be understood and maintained by someone or something.
These days, I mainly focus on two things in LLM code reviews:
1. Making sure unit tests have good coverage of expected behaviours.
2. Making sure the model is making sound architectural decisions, to avoid accumulating tech debt that'll need to be paid back later. It's very hard to check this with unit tests.
by aix1
3/13/2026 at 6:00:17 PM
We get stuck reviewing the output assembly when it's broken, and that does happen from time to time. The reason it doesn't happen often is that generation of assembly follows strict rules, which people have tried their best to test. That's not the behavior we're going to get out of an LLM.
by nitwit005
3/13/2026 at 6:20:19 PM
Yes, prompts aren't analogous to higher-level code; they're analogous to wizards or something like that, which were always rightly viewed with suspicion.
by contextfree
3/13/2026 at 7:39:34 AM
But those are close to deterministic.by rienbdj
3/13/2026 at 3:51:38 AM
> 1) Something happened during 2025 that made the models (or crucially, the wrapping terminal-based apps like Claude Code or Codex) much better. I only type in the terminal anymore.

I have heard it said that the change was better context management and compression.
by naruhodo
3/13/2026 at 4:46:05 AM
A lot of the enhancements came on the model side, which in many ways enabled context engineering: 200k and now 1M contexts. Better context management was enabled by improvements in structured outputs/tool calling at the model level. Also, reasoning models really upped the game; "plan" mode wouldn't work well without them.
by bbatha