alt.hn

5/13/2026 at 7:11:10 PM

Continual Harness: Online Adaptation for Self-Improving Foundation Agents

https://arxiv.org/abs/2605.09998

by milkkarten

5/13/2026 at 7:11:10 PM

Author here. TL;DR:

Long-horizon embodied agency is a harness problem, not a model-scale problem. Coding agents like Claude Code work because of scaffolding (prompt, skills, memory, sub-agents) around the model. Embodied agents haven't had an equivalent.

Gemini Plays Pokémon (GPP) became the first AI to complete Pokémon Blue, Yellow Legacy on hard mode, and Crystal without losing a battle, via iterative harness refinement. Early on, a human edited the harness; by Crystal, the model was doing it itself: naming its own strategies, writing truth tables for puzzles, and wrapping loopholes into reusable primitives.

Continual Harness automates this fully. Starting from a raw interface with no curated knowledge, every F steps a Refiner reads the recent trajectory and edits the prompt, sub-agents, skills, and memory -- no resets. From scratch, it closes most of the gap to a hand-engineered expert harness.
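For readers who want the shape of that loop: a minimal sketch, assuming a harness holding the four editable components named above. All class, function, and parameter names here are illustrative placeholders, not the paper's actual API; a real Refiner would be an LLM call rather than the stub shown.

```python
# Illustrative sketch of a Continual Harness-style refinement loop.
# Names (Harness, act, refine, run) are hypothetical, not from the paper.
from dataclasses import dataclass, field

@dataclass
class Harness:
    prompt: str = "You control the agent via the raw interface."
    skills: dict = field(default_factory=dict)      # name -> reusable primitive
    memory: list = field(default_factory=list)      # persistent notes
    sub_agents: dict = field(default_factory=dict)  # name -> delegated role

def act(harness, observation):
    # Placeholder for the base model choosing an action under the harness.
    return f"action for {observation}"

def refine(harness, recent_trajectory):
    # Placeholder Refiner: reads the recent trajectory and edits the harness
    # in place -- no resets. Here it just appends a note and registers a skill.
    harness.memory.append(f"reviewed {len(recent_trajectory)} steps")
    harness.skills.setdefault("navigate", "loophole wrapped as a primitive")
    return harness

def run(harness, env_steps, F):
    trajectory = []
    for step in range(env_steps):
        trajectory.append(act(harness, f"obs_{step}"))
        if (step + 1) % F == 0:        # every F steps, invoke the Refiner
            harness = refine(harness, trajectory[-F:])
    return harness
```

The point of the sketch is only the control flow: the same harness object persists across refinement rounds, and the Refiner mutates it rather than rebuilding it.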

Our key findings: (1) Iterative harness refinement closes most of the gap to a hand-engineered version. (2) Long-horizon agency requires self-refinement, and self-refinement requires a useful model. (3) The future of agents is model-harness co-learning.

Demos: https://sethkarten.ai/continual-harness

by milkkarten