5/7/2026 at 3:19:02 PM
I'm skeptical skills will outperform training given that Opus 4.7 already ignores a 720-byte CLAUDE.md telling it to use tidewave (a Rails MCP server with 6 tools) for db queries. When I asked a new claude session about a record it called> Bash(DATABASE_URL=$(grep -E '^DATABASE_URL=' .env 2>/dev/null | head -1) echo "ok")
even though I have in CLAUDE.md:
> For database queries, use tidewave first.
I then prompted:
> use tidewave as per CLAUDE.md. also diagnose why you failed to heed that
> ● Diagnosis first: I defaulted to shell habits (env grep → psql) instead of pausing to recall the CLAUDE.md rule that tidewave is the first-line DB tool. The trigger was "look at this record" — I should have read that as "run a SQL query" and reached for tidewave immediately.
If Opus 4.7 doesn't follow simple CLAUDE.md instructions, I'm not sure what benefits other markdown files could bring. I don't trust Opus's own explanation, but it could point to the fact that the system prompt for bash is much longer than CLAUDE.md with tidewave.
While LLM judging could be helpful, I think the tool-call assertions (https://github.com/darkrishabh/agent-skills-eval#what-you-ge...) may be the most useful thing in agent-skills-eval given that it's the only objective measure of compliance.
by reedlaw
5/7/2026 at 8:39:12 PM
I've had minor success with chiding the clanker, after it ignores something, to "please revise AGENTS.md to never do <whatever stupid thing it did> to prevent future assistances from doing x."So, atleast heuristically, it should know _why_ it ignored whatever and hopefully pulls the correct anti-matter context. It took about two reptitions of this to get it to use pg-promise instead of psql to do queries for me. I assume the longer the context goes on, the less likely any of priming works.
by cyanydeez
5/7/2026 at 10:18:57 PM
Your using Claude code. That's your problem.Use a different harness
by NamlchakKhandro
5/8/2026 at 12:44:18 PM
Codex is only slightly better, and that fluctuates so I switch back and forth.by reedlaw
5/7/2026 at 4:31:57 PM
Use a hookby erispoe
5/7/2026 at 5:11:48 PM
I tried to create a hook that would detect when token usage was running out and write HANDOFF.md so I could switch to another agent and finish the current task. It never worked reliably. To make a hook for db queries, it would need to run before each bash call, check if it looks like a query, and then exit with a new prompt, e.g.: "Use tidewave's execute_sql_query for DB access". But then it could just ignore the prompt the same as CLAUDE.me. What if I really wanted to use bash for a specific task? The real issue is that prompts are not tightly coupled with capabilities. If we admit that, then skills are over hyped.by reedlaw
5/7/2026 at 4:49:59 PM
It's hard to make hooks work here, since the default approach it's using is call the URL directly.I think it's better to have a repo-level skill instead, titled something like "connecting_to_db.md" and demonstrate exactly how to connect. Codex has been pretty good at referring to skills but it depends on context at the end of the day.
by rirze
5/7/2026 at 7:07:38 PM
[flagged]by darkrishabh