7/5/2026 at 8:03:52 PM
LLM agents have plenty of "bad habits" that are impossible to get rid of. I suspect they're a side effect of reinforcement learning: the training objective rewards fewer tokens, so the results just need to be good enough most of the time while cutting as many corners as possible.

Similarly, I'm trying to stop agents from "gracefully" handling errors by stuffing results with empty junk and continuing (get_list_of_problems().unwrap_or_default() -> "no problems found!"). I've filled AGENTS.md with "fail closed", "extremely strict error handling", "no fallbacks", "don't use sentinel values", and hundreds of variations of these, but they work about as well as "do not hallucinate". I get "You're absolutely right, this will cause problems!" and then the fix is "changed to Err(_) => String::new()".

I suspect it's another case of gaming RL: failing early and loudly increases the chance of failing and being penalized, so fudging data, ignoring errors, and presenting a barely-working result is a better strategy overall. When it fails, it fails anyway, but as long as it stumbles to the finish line it has a non-zero chance of getting accepted by the RL judge.
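
To make the unwrap_or_default() complaint above concrete, here is a minimal sketch of the two behaviors side by side. Everything in it is hypothetical: ScanError, the scanner_ok flag, and the two report functions are invented for illustration, and get_list_of_problems() is only a stand-in for whatever real check the name in the comment refers to.

```rust
use std::fmt;

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
struct ScanError(String);

impl fmt::Display for ScanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "scan failed: {}", self.0)
    }
}

impl std::error::Error for ScanError {}

// Hypothetical stand-in for get_list_of_problems();
// `scanner_ok` simulates whether the underlying check could run at all.
fn get_list_of_problems(scanner_ok: bool) -> Result<Vec<String>, ScanError> {
    if scanner_ok {
        Ok(vec![])
    } else {
        Err(ScanError("scanner crashed".to_string()))
    }
}

// The habit described above: the error collapses into an empty list,
// so a crashed scan is reported exactly like a clean one.
fn report_fail_open(scanner_ok: bool) -> String {
    let problems = get_list_of_problems(scanner_ok).unwrap_or_default();
    if problems.is_empty() {
        "no problems found!".to_string()
    } else {
        format!("{} problems found", problems.len())
    }
}

// Fail closed: `?` propagates the error, so "no problems" can only be
// reported when the scan actually ran.
fn report_fail_closed(scanner_ok: bool) -> Result<String, ScanError> {
    let problems = get_list_of_problems(scanner_ok)?;
    if problems.is_empty() {
        Ok("no problems found!".to_string())
    } else {
        Ok(format!("{} problems found", problems.len()))
    }
}
```

With the fail-open version the caller cannot tell "nothing is wrong" from "the check never ran", since both come back as the same string. The fail-closed version pushes that distinction into the return type, so the caller has to deal with the Err case explicitly.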
by pornel