7/5/2026 at 12:15:22 AM
This is easily solved with good error messages.
Claude always gets the syntax wrong on my tool calls.
So I did a revolutionary thing and made the error output print helpful guidance on how to correctly call the tool.
The agent tries again and always gets it right. Total time “wasted”: 1-2 seconds. It happens every session, but it only happens once per context window. After that the agent holds on to the lesson.
To do this for your own tool calls, imagine what you’d do in the agent’s place - what info you’d need so you can correct your mistake. Assume the agent wants to achieve the goal so it’ll try again. These are probabilistic systems, so we need to give them an extra loop to get the deterministic bits right.
by cadamsdotcom
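A minimal sketch of the idea in Python, assuming a hypothetical `search_notes` tool that takes JSON arguments (the tool name, its fields, and the `USAGE` text are all made up for illustration). Every error path returns the guidance the agent would need to retry correctly:

```python
import json

# Hypothetical tool and usage text, invented for this example.
USAGE = (
    'Usage: search_notes {"query": "<text>", "limit": <int, optional>}\n'
    'Example: search_notes {"query": "release checklist", "limit": 5}'
)


def search_notes(raw_args: str) -> str:
    """Run the tool, or return an error that explains how to call it correctly."""
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as exc:
        return f"Error: arguments are not valid JSON ({exc.msg}).\n{USAGE}"
    if not isinstance(args, dict) or not isinstance(args.get("query"), str):
        return f'Error: "query" is required and must be a string.\n{USAGE}'
    limit = args.get("limit", 10)
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(limit, int) or isinstance(limit, bool) or limit < 1:
        return f'Error: "limit" must be a positive integer, got {limit!r}.\n{USAGE}'
    # Placeholder for the real work.
    return f"Searched for {args['query']!r} (limit {limit})"


# The agent used "q" instead of "query"; the reply tells it how to fix the call.
print(search_notes('{"q": "release checklist"}'))
```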
7/5/2026 at 3:39:12 AM
I've been trying to push for this perspective about the error messages of jj vcs. There's some pushback from people who don't perceive that making tools work well with LLMs is also making tools work well with humans. (Obviously there's more nuance to the arguments than this one-sided perspective.)
by joshka
7/5/2026 at 2:55:05 AM
This will cause an extra round trip to the LLM, which means more $ spent.
by psadri
7/5/2026 at 3:12:45 AM
Better a round trip than bad/incorrect results. Also, the cache should kick in so the cost will be minimal.
by sdesol
7/5/2026 at 9:09:39 AM
So? What alternative do you suggest? Let the LLM get it wrong forever? Remove the tool? Automatically try to patch the syntax?
Almost no "solutions" in engineering/programming come for free; one way or another, it's all a balancing act between different solutions with different tradeoffs. In this case, another request/response seems preferable to the other tradeoffs.
by embedding-shape
7/5/2026 at 2:36:17 PM
The counterintuitive pattern I see emerging: if you can cleanly determine the intent of the call, you fix the call and prepend informative text to the tool call response, indicating the mistake made and how to avoid it in the future, followed by the actual tool call result. In this case you can validate the fields and, rather than throwing a hard error, determine whether the problem is just an extra field that isn't needed. If so, you correct the call and prepend a corrective note to the tool call response. This saves turns, instructs the model in context so the mistake is less likely to recur, and helps models that aren't so good at recovering from bad tool calls and staying on their longer-horizon agentic task (most non-OpenAI and non-Anthropic models).
by leemoore
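A rough Python sketch of that repair-and-annotate pattern, assuming a hypothetical grep-like tool with two fields (`grep_tool`, `ALLOWED`, and `REQUIRED` are invented names). Extra fields are dropped and reported in a prepended note; missing fields still fail, because the intent cannot be inferred:

```python
# Hypothetical field sets for a hypothetical grep-like tool.
ALLOWED = {"path", "pattern"}
REQUIRED = {"path", "pattern"}


def grep_tool(path: str, pattern: str) -> str:
    # Placeholder for the real tool body.
    return f"searched {path} for {pattern!r}"


def call_with_repair(args: dict) -> str:
    """Drop harmless extra fields, run the tool, and prepend a corrective note."""
    missing = sorted(REQUIRED - set(args))
    if missing:
        # Intent is not clear here, so fail with guidance instead of guessing.
        return f"Error: missing required field(s): {', '.join(missing)}."
    extra = sorted(set(args) - ALLOWED)
    cleaned = {key: value for key, value in args.items() if key in ALLOWED}
    result = grep_tool(**cleaned)
    if extra:
        note = (
            f"Note: ignored unknown field(s): {', '.join(extra)}. "
            f"Next time send only: {', '.join(sorted(ALLOWED))}.\n"
        )
        return note + result
    return result


# The extra "recursive" field is dropped; the call still succeeds in one turn.
print(call_with_repair({"path": "src", "pattern": "TODO", "recursive": True}))
```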
7/5/2026 at 11:32:40 AM
Maybe I'm dumb but I believe I can think of one totally free solution here..
by beepbooptheory
7/5/2026 at 9:49:55 AM
Pi already emits good error messages; I always see Claude Opus 4.8 correct itself in its next attempt when it gets a tool call wrong.
by euiq
7/5/2026 at 2:11:08 PM
So, is this part of the tool definition, or did you create your own coder agent?
by Shorel
7/5/2026 at 4:49:25 AM
I've built a library that makes creating rich feedback systems easier, check this out:
by klntsky
7/5/2026 at 8:16:16 AM
Okay, but I solved this with a print statement.
by cadamsdotcom
7/5/2026 at 9:15:06 AM
My assumption was that it is often not convenient if you have a lot of logic. I used it internally, and the complexity of my use cases was barely enough to justify it too. But I've seen systems where it would definitely be a value unlock if I had to integrate LLM chatbots into them.
by klntsky
7/5/2026 at 12:47:07 PM
very cool.
by try-working
7/5/2026 at 1:26:35 AM
So, are you saying that skills are not such a good way for agents to learn, and that they still need the tool trial-and-error dance after the skills are injected? (I'm assuming each tool comes with its own skill.)
by siwatanejo
7/5/2026 at 1:52:55 AM
> they still need tool-trial-and-error dance after injecting them?
It honestly depends on the model. For my pi-brains extension for pi
https://github.com/gitsense/pi-brains
I've found that after the first hook injection they get it. There are occasions when the model forgets, but since everything is driven by hooks, you can inject as often as needed.
The issue with skills is that they are a one-time thing, so you really can't use skills to correct behavioral issues.
by sdesol
7/5/2026 at 8:52:05 AM
Tools come with a tool description in JSON Schema format, but yes, your point stands: it is not enough for Opus 4.8, which I've also noticed having tool call issues.
by Bolwin
7/5/2026 at 1:40:12 AM
I do not need to waste tokens on skills; I use Claude Code hooks.
Have a look at the TDD guard at https://codeleash.dev - the scripts/tdd_log.py arguments are pretty specific but it also has guidance in CLAUDE.md and lots of helpful error messages.
by cadamsdotcom
7/5/2026 at 4:37:31 AM
May I know when skills should be used over hooks and vice versa?
by 8cvor6j844qw_d6
7/5/2026 at 6:41:28 AM
Hooks provide determinism.
Hooks can run code.
Hook code can be written in advance by the agent, runs in milliseconds, costs zero tokens, and gives the same result every time.
Agents live at the boundary of codification; anything codifiable should be codified rather than run through an unpredictable machine. Hence, use hooks when you want determinism & predictability & certainty.
Examples: your stop hook could run tests against the code that’s just been written. Now, if your agent docs also tell your agent that the stop hook will run tests and there’s no need to run tests itself, then it’ll trigger a stop when it’s done instead of running tests itself. Just be sure to change the exit code to 2 and route the test failure output to stderr so Claude Code will show that output to the agent. Because the stop hook will fail over and over until tests pass, you just created a very simple guard that guarantees tests pass before you see the code - your agent can’t stop working without passing tests!
by cadamsdotcom
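A sketch of such a stop hook as a Python script, following the description above: run the tests and, on failure, send the output to stderr and return exit code 2. The pytest command is an assumption; substitute whatever your project uses. The last line only demonstrates the function with a deliberately failing command.

```python
import subprocess
import sys

# Assumption: the project's tests run with pytest; swap in your own command.
TEST_COMMAND = [sys.executable, "-m", "pytest", "-q"]


def run_stop_hook(command: list[str]) -> int:
    """Return 0 to allow the stop, or 2 (with output on stderr) to block it."""
    proc = subprocess.run(command, capture_output=True, text=True)
    if proc.returncode == 0:
        return 0
    # Exit code 2 plus stderr is what gets the failure shown to the agent.
    sys.stderr.write("Tests failed; fix them before stopping.\n")
    sys.stderr.write(proc.stdout + proc.stderr)
    return 2


# Demonstration with a command that always fails, standing in for a red test run.
print(run_stop_hook([sys.executable, "-c", "raise SystemExit(1)"]))
```

In the actual hook script, the last line would instead be `sys.exit(run_stop_hook(TEST_COMMAND))`.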
7/5/2026 at 5:57:29 AM
Hooks are for doing AOP-style wrapping of your interactions with the harness. Type /hook on the console to see what is available. Have CC analyze your session and suggest converting part of your workflow to a hook, and then have it test it.
by thx67
7/5/2026 at 5:54:30 AM
This maneuver requires you to anticipate all the edge cases or error messages beforehand which is practically not possible in many situations. The moment something unanticipated happens or the model changes its processing logic, the tool call system stops working just like any other deterministic program or tool.
by pyeri
7/5/2026 at 7:21:48 AM
> This maneuver requires you to anticipate all the edge cases or error messages beforehand which is practically not possible in many situations. The moment something unanticipated happens or the model changes its processing logic, the tool call system stops working just like any other deterministic program or tool.
Not all of them; error messages are part of UX design, and a user-facing error message should always indicate what the user can do to fix the problem.
If you cannot open a file for writing, don't just return "error: cannot open MyFile.txt"; return "MyFile.txt: permission denied" (so the user can request additional permissions from whoever), "MyFile.txt: no space left on device" (so the user can free up some space), or "MyFile.txt: file exists and is a directory" (so the user can retry with a different name, or remove the directory, etc.).
I think what is happening now is that, with so many of the agent-using pool of devs having never shipped to end-users before, they are surprised that their "program" (the tool) is being used wrong by the end-user (the LLM).
Those of us with battle-scars already expect the user to use it wrong and have learned that it's easier to tell the user how to fix the problem than to ask the user to read the manual/do it the correct way.
by lelanthran
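The file-open example above might look like this in Python. It is only a sketch: `describe_open_failure` and `write_text` are hypothetical helper names, and the hints in parentheses are illustrative wording.

```python
import errno


def describe_open_failure(path: str, exc: OSError) -> str:
    """Turn an OSError from writing a file into a message that says what to do next."""
    if isinstance(exc, PermissionError):
        return f"{path}: permission denied (request write access or choose another location)"
    if isinstance(exc, IsADirectoryError):
        return f"{path}: file exists and is a directory (retry with a different name)"
    if exc.errno == errno.ENOSPC:
        return f"{path}: no space left on device (free up some space and retry)"
    # Fall back to the OS's own description rather than a bare "cannot open".
    return f"{path}: {exc.strerror or exc}"


def write_text(path: str, text: str) -> str:
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
    except OSError as exc:
        return describe_open_failure(path, exc)
    return f"{path}: wrote {len(text)} characters"


# Simulated failure, so the example does not depend on the local filesystem.
print(describe_open_failure("MyFile.txt", OSError(errno.ENOSPC, "No space left on device")))
```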
7/5/2026 at 10:40:20 AM
So much this. I tell my juniors: to a beginner programmer, errors are 'the end'. They feel they did their best, it is not their fault, and that is the error message they print. Experienced programmers know the user's struggle; for them an error message is 'a beginning', the first step of the user striving to solve the problem. Users gave that command and they did not give it to fail. They (the users) still want to reach their goal.
Pro tip: Don't just print the return code; also print the call and its arguments that failed, even without a stack trace.
by jeffreygoesto
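One way to apply that pro tip in Python, sketched with hypothetical helper names (`describe_failure`, `run_reporting`): when a subprocess fails, report the exact command line next to the return code.

```python
import shlex
import subprocess
import sys


def describe_failure(command: list[str], returncode: int, stderr: str) -> str:
    """Format the failed call, its arguments, and its exit code in one message."""
    return (
        f"error: command failed with exit code {returncode}\n"
        f"  call: {shlex.join(command)}\n"
        f"  stderr: {stderr.strip() or '(empty)'}"
    )


def run_reporting(command: list[str]) -> int:
    """Run a command; on failure, print what was called, not just the code."""
    proc = subprocess.run(command, capture_output=True, text=True)
    if proc.returncode != 0:
        print(describe_failure(command, proc.returncode, proc.stderr), file=sys.stderr)
    return proc.returncode


# A child process that exits with code 3, standing in for a failing tool.
run_reporting([sys.executable, "-c", "import sys; sys.exit(3)"])
```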
7/5/2026 at 8:24:10 AM
Just add a --verbose flag that shows the stack trace when there is an error. Then add a footer message when an error appears in non-verbose mode that invites the user/agent to use --verbose to get the full picture.
This obviously may end up burning through thousands of tokens (you can mitigate that by adding different levels of verbosity), but hopefully errors are not common.
by mrbungie
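A small sketch of that flag using Python's `argparse` and `traceback`. The tool name `mytool` and the `do_work` function are placeholders; only the error-handling shape matters.

```python
import argparse
import sys
import traceback


def do_work(value: str) -> int:
    # Hypothetical operation that can fail.
    return int(value)


def main(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument("value")
    parser.add_argument("--verbose", action="store_true",
                        help="show the full stack trace on errors")
    args = parser.parse_args(argv)
    try:
        print(do_work(args.value))
        return 0
    except Exception as exc:
        if args.verbose:
            # Full picture, at the cost of more output (and tokens).
            traceback.print_exc()
        else:
            print(f"error: {exc}", file=sys.stderr)
            print("hint: re-run with --verbose to see the full stack trace",
                  file=sys.stderr)
        return 1


# Non-verbose failure: a short error plus the footer inviting --verbose.
main(["not-a-number"])
```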
7/5/2026 at 8:18:31 AM
Are we talking about the same thing?
If the agent uses the tool incorrectly, validation fails.
If validation fails for ANY reason, print a message saying “here’s how to use it correctly”.
You don’t need to anticipate every misuse, just validate your inputs.
by cadamsdotcom
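To illustrate the "ANY reason" part, here is a hedged Python sketch of a generic wrapper: whatever the validation failure is, the reply carries the tool's usage text. The `guided` decorator and the `resize_image` tool are invented for this example, and the usage text is simply the tool's docstring.

```python
import functools
import inspect


def guided(tool):
    """Wrap a tool so that any validation failure returns its usage text."""
    @functools.wraps(tool)
    def wrapper(**kwargs):
        try:
            return tool(**kwargs)
        except (TypeError, ValueError) as exc:
            # Note: this also catches TypeError/ValueError raised by bugs inside
            # the tool body, which is acceptable for a sketch.
            usage = inspect.getdoc(tool)
            return f"Error: {exc}\nHere's how to use it correctly:\n{usage}"
    return wrapper


@guided
def resize_image(path: str, width: int) -> str:
    """resize_image(path="<file>", width=<positive int>)
    Example: resize_image(path="logo.png", width=256)"""
    if not isinstance(width, int) or width <= 0:
        raise ValueError(f"width must be a positive integer, got {width!r}")
    return f"resized {path} to {width}px"


# A misuse nobody wrote a special case for: wrong keyword name.
print(resize_image(path="logo.png", size=256))
```

The wrapper never needs to know which mistake was made; it only needs the validation to fail.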
7/5/2026 at 5:51:57 AM
same findings here, it'll doom loop without the proper error messaging. really expensive without error logging that gets propagated back to the agent
by StrugglingDev
7/5/2026 at 1:18:42 AM
LSPs and linters serve the same purpose. I use the latter in git hooks.
by esafak