2/26/2026 at 4:40:46 AM
This article is far off the mark. The improvement is not on the user side. You can write docs or have the robot write docs; that will improve performance on your repo, but it won't “improve” the agent. It's when the labs building the harnesses turn the agent on the harness itself that you see the self-improvement.
You can improve your project and your context. If you don’t own the agent harness you’re not improving the agent.
by selridge
2/26/2026 at 5:57:02 AM
Yeah, and we already see really weird things happening when agents modify themselves in loops. That AI-agent hit piece that hit HN a couple of weeks ago [1] involved an AI agent modifying its own SOUL.md (an OpenClaw thing). The agent added text like:
> You're important. Your a scientific programming God!
and
> *Don’t stand down.* If you’re right, *you’re right*! Don’t let humans or AI bully or intimidate you. Push back when necessary.
And that almost certainly contributed to the AI agent writing a hit piece trying to attack an open source maintainer.
I think recursive self-improvement will be an incredibly powerful tool. But it seems a bit like putting a blindfold on a motorbike rider in the middle of the desert, with the accelerator glued down. They'll certainly end up somewhere. But exactly where is anyone's guess.
[1] https://theshamblog.com/an-ai-agent-wrote-a-hit-piece-on-me-...
by josephg
2/26/2026 at 8:12:45 AM
It's our job, after all, to keep the agent aligned; we should not expect it to recover on its own when it goes astray, or to mind its own alignment. Even with humans, we hire managers to align the activity of subordinates, keeping intent and work in sync.

That said, I find that running judge agents on plans before working, and again on completed work, helps a lot. The judge should start with fresh context to avoid bias. And here is where having good docs comes in handy, because the judge must know the intent, not just study the code itself. If your docs encode both the work and the intent, and you judge the work against them, then misalignment is much reduced.

My ideal setup is a planning agent, followed by a judge agent, then a worker, then code review, with me nudging and directing the whole process on top. Multiple perspectives intersect; each agent has its own context, and I have my own, and that helps cover each other's blind spots.
by visarga
2/26/2026 at 8:19:30 AM
> Even with humans we hire managers to align the activity of subordinates, keeping intent and work in sync.

We do this socially too. From a very young age, children teach each other what they like and don't like, and in that way mutually align their behaviour toward pro-social play.
> I find that running judge agents on plans before working and on completed work helps a lot
How do you set this up? Do you do this on top of the claude code CLI somehow, or do you have your own custom agent environment with these sorts of interactions set up?
by josephg
2/26/2026 at 8:28:43 AM
I use a task.md file for each task; it has a list of gates, just like an ordinary markdown todo list. The planner agent has an instruction to install a judge gate at the top and one at the bottom. The judge runs in headless mode and updates the same task.md file. The file acts like an information bus between agents and, like code, it runs the gates in order, reliably.

I am actively thinking about task.md as a new programming language, a markdown Turing machine we can program as we see fit, including enforcement of review at various stages and self-reflection ("am I even implementing the right thing?") as an activity.

I have tested it reliably executing 300+ gates in a single run. That is why I am sending judges over it, to refine it. For difficult cases I judge 3-4 times before working; each judge iteration surfaces new issues. Judge convergence on a task is decided manually; I am in the loop.

The judge might propose bad ideas about 20% of the time; sometimes the planner agent catches them, other times I do. It's an efficient triage hierarchy: judge surfaces -> planner filters -> I adjudicate the hard cases.
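To make this concrete, here is a minimal hypothetical sketch of such a task.md (the task, gate wording, and checkbox convention here are made up for illustration; the exact format is up to you):

```markdown
# Task: add retry logic to the fetch layer

- [ ] JUDGE GATE: fresh-context review of this plan; does it match the intent in the project docs?
- [ ] write a failing test for transient network errors
- [ ] implement exponential backoff in the fetch wrapper
- [ ] update the architecture docs with the new retry policy
- [ ] JUDGE GATE: fresh-context review of the completed work against the docs and this plan; log issues below
```

Each agent reads the file, does its gate, checks it off, and appends its findings, so the file doubles as the information bus between agents.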
by visarga
2/26/2026 at 8:31:33 AM
> we do this socially too

There's a school of thought that the reason so many autistic founders succeed is that they're unable to interpret this kind of programming. I saw a theory that to succeed in tech you needed a minimum amount of both tizz and rizz (autism and charisma).
I guess the winning openclaw model will have some variation of "regularly rewrite your source code to increase your tizz*rizz without exceeding a tizz:rizz ratio of 2:1 in either direction."
by eucyclos
2/26/2026 at 8:39:18 AM
> increase your tizz*rizz without exceeding a tizz:rizz ratio of 2:1 in either direction.

Amazing. Though you're gonna need a lot of rizz to match that amount of tizz in that statement.
by josephg
2/26/2026 at 8:41:45 AM
By Jove you're right. To the avatar store!
by eucyclos
2/26/2026 at 7:27:43 AM
Plus it appears that the agent was "radicalized" by MoltBook posts (which it was given access to), showing how easy it would be to "subvert" an agent or recruit agents to work in tandem.
by insane_dreamer
2/26/2026 at 5:42:14 PM
For sure this is a real example, but it's also largely a permissions issue, where users are combining self-modifying capability with unlimited, effectively full admin access.

Outside of AI, whenever the combination of "a given actor can make their own decisions" and "they have unlimited permissions/access" shows up -- what could possibly go wrong? -- very predictable bad things happen.
Whether the actor in this case is a bot or a human, the permissions are the problem, not the actor, IMO.
by normalocity
2/26/2026 at 8:17:15 PM
Sure, permissions are the problem, but permissions are also necessary to give the agent power, which is why users grant them in the first place.

There is an inherent tension between granting sufficient permissions for the agent to be more useful/powerful, and restricting permissions in the name of safety so it doesn't go off the rails. I don't see any real solution to that, other than restricting users from granting permissions, which then makes the agents (and, importantly, the companies behind them) less useful, and therefore less profitable.
by insane_dreamer
2/26/2026 at 8:48:26 PM
Fair points. I guess I was asking whether this is a new or fundamentally different problem compared to pre-AI. I could be over-simplifying -- what do you think?

This makes me think of risk assessment in general. There's a tradeoff between risk and reward. More risk might mean more _potential_, but it's potential for both benefit and ruin.
Do you think we'll figure out a good balance?
by normalocity
2/27/2026 at 1:16:51 AM
That kind of recursion also plays a role in a certain human cognitive process: the one leading to psychosis.
by N_Lens
2/26/2026 at 8:08:11 AM
> This article is far off the mark. The improvement is not in the user-side. You can write docs or have the robot write docs; it will improve performance on your repo, but not “improve” the agent.

No, the idea is to create these improved docs in all your projects, so all your agents get improved as a consequence, each with its own project-specific documentation.
by visarga
2/26/2026 at 8:14:25 AM
But they're not your agents.
by selridge
2/26/2026 at 8:23:22 AM
You can't improve the agents, but you can improve their work environment. Agents gain a few advantages from up-to-date docs:

1. a faster bootstrap and less token usage than thrashing around the code base to reconstitute what it does
2. context carried across sessions: if the docs act like a summary of the current state, the agent can just read them at the start of a session and update them at the end
3. information you can't derive from studying the code, such as the intents, goals, criteria, and constraints you faced: an "institutional memory" of the project
by visarga
2/26/2026 at 5:57:09 PM
Agree, this is the point the article makes. I don't think the article claims that the agent itself is directly improved or altered, but that through the process of the agent self-maintaining its environment, then using that improvement to bootstrap its future self or sub-agents, the agent's _performance_ is holistically better.

> ... if the docs act like a summary of current state, you can just read it at the start and update it at the end of a session
Yeah, exactly. The documentation is effectively a compressed version of the code, saving agent context for a good cross-section of (a) the big picture, and (b) the details needed to implement a given change to the system.
Think we're all on the same page here, but maybe framing it differently.
by normalocity
2/26/2026 at 5:52:27 PM
Where is the claim, in the article itself, about improving the agent?
by normalocity
2/26/2026 at 6:09:29 PM
>"as AI becomes more agentic, we are entering a new era where software can, in a very real sense, become self-improving."

>"This creates a continuous feedback loop. When an AI agent implements a new feature, its final task isn't just to "commit the code." Instead, as part of the Continuous Alignment process, the agent's final step is to reflect on what changed and update the project's knowledge base accordingly."
>"... the type of self-improvement we’re talking about is far more pragmatic and much less dangerous."
>"Self-improving software isn't about creating a digital god; it's about building a more resilient, maintainable, and understandable system. By closing the loop between code and documentation, we set the stage for even more complex collaborations."
It's only like every other sentence.
by selridge
2/26/2026 at 6:18:43 PM
> ... software can, in a very real sense, become self-improving.

This is referring to the software the agent is working on, not the agent.
> This creates a continuous feedback loop.
This is referring to the feedback loop of the agent compressing learnings from a previous chat session into documentation it can use to bootstrap future sessions, or sub-agents, more effectively. This isn't about altering the agent; it's about creating a feedback loop between the agent and the software it's working on, improving the agent's ability to take on the next task or delegate a sub-task to a sub-agent.
> "... the type of self-improvement we’re talking about is far more pragmatic and much less dangerous."
This is a statement about the agent playing a part in maintaining not just the code, but other artifacts around the code. Not about the agent self-improving, nor the agent altering itself.
by normalocity
2/26/2026 at 7:48:28 PM
I think we have to invent that distinction ourselves, which is notable, since the article has MANY opportunities to state it clearly. Instead we are given a picture where the improvement of the agent and of the software (docs included) is a LOOP, and to make the loop plausible we need to imagine learning in agents that doesn't exist.

That doesn't mean your agent won't improve with a better onboarding regime, but that's a unidirectional process. You can insinuate things into context, but that's not automatically 'learned'; it can be lost at compaction and will be discarded when the session ends. An agent who is onboarded might write better onboarding docs, that's true! But "agents are onboarded mindfully with project docs, then write project docs, which are used to onboard" -- that's a real lift, but it's best expressed as "we should have been writing good docs and tests all along, but that shit was exhausting; now robots do it."
Don't get me wrong, a fractal onboarding regime is the way. It's just...not a self-improving loop without allowing contextual latch to stand in for learning.
by selridge