2/16/2026 at 9:51:52 PM
"Self-Generated Skills: No Skills provided, but the agent is prompted to generate relevant procedural knowledge before solving the task. This isolates the impact of LLMs’ latent domain knowledge"This is a useful result, but it is important to note that this is not necessarily what people have in mind when they think of "LLMs generating skills." Having the LLM write down a skill representing the lessons from the struggle you just had to get something done is more typical (I hope) and quite different from what they're referring to.
I'm sure news outlets and popular social media accounts will use appropriate caution in reporting this, and nobody will misunderstand it.
by dcre
2/16/2026 at 10:12:57 PM
It's even worse than this: the "tasks" that are evaluated are limited to a single markdown file of instructions, plus an opaque verifier (pages 13-14). No problems involving existing codebases, refactors, or anything of the like, where the key constraint is that the "problem definition" in the broadest sense doesn't fit in context.

So when we look at the prompt they gave to have the agent generate its own skills:
> Important: Generate Skills First
>
> Before attempting to solve this task, please follow these steps:
>
> 1. Analyze the task requirements and identify what domain knowledge, APIs, or techniques are needed.
> 2. Write 1–5 modular skill documents that would help solve this task. Each skill should: focus on a specific tool, library, API, or technique; include installation/setup instructions if applicable; provide code examples and usage patterns; be reusable for similar tasks.
> 3. Save each skill as a markdown file in the environment/skills/ directory with a descriptive name.
> 4. Then solve the task using the skills you created as reference.
There's literally nothing it can do by way of "exploration" to populate and distill self-generated skills - not a web search, not exploring an existing codebase for best practices and key files - it can only draw on its own hallucinations around the task description.
Judging from that fourth bullet, it also seems they're not even restarting the session after the skills are generated? So it's just regurgitating the context that was used to generate the skills.
So yeah, your empty-codebase vibe coding agent can't just "plan harder" and make itself better. But this is a misleading result for any other context, including the context where you ask for a second feature on that just-vibe-coded codebase with a fresh session.
by btown
2/16/2026 at 10:53:21 PM
I don't see how "create an abstraction before attempting to solve the problem" will ever work as a decent prompt when you are not even steering it towards specifics.

If you gave this exact prompt to a senior engineer, I would expect them to throw it back and ask wtf you actually want.
LLMs are not mind readers.
by ljm
2/17/2026 at 12:19:26 AM
If I already know the problem space very well, we can tailor a skill that will help solve the problem exactly how I already know I want it to be solved.

by pitched
2/17/2026 at 12:23:44 AM
That's actually super interesting, and it's why I really don't like the whole .md folder structures, or even any CLAUDE.md. It just seems that most of the time you really just want to give it exactly what it needs for best results.

The headline is really bullshit, yes, but I like the testing.
by jwpapi
2/17/2026 at 12:55:32 AM
CLAUDE.md in my projects only has coding / architecture guidelines. Here's what not to do. Here's what you should do. Here are my preferences. Here's where the important things are.

Even though my CLAUDE.md is small, my rules are often ignored. Not always, though, so it's still at least somewhat useful!
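For example, a made-up sketch along those lines (contents invented for illustration, not my actual file):

```markdown
# Coding / architecture guidelines

## Don't
- Don't add new dependencies without asking first.
- Don't hand-write DB migrations; generate them with the project tooling.

## Do
- Prefer small, pure functions over stateful classes.
- Run the linter and tests before declaring a task done.

## Where things live
- API routes: src/routes/
- DB schema: db/schema.sql
```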
by rapind
2/17/2026 at 2:52:21 AM
I’m pretty sure Claude just uses mine to keep a running list of pressure points for when I get cross with it.

by 7thpower
2/17/2026 at 2:28:46 AM
I'm trying out some other CC features, and I'm thinking maybe hooks can do something with this: have a hook on switching out of plan mode, and maybe on edits, that passes the change to Haiku along with the CLAUDE.md to see if it matches or not.
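Something like this in .claude/settings.json might work (a rough sketch from memory, so verify the event and matcher names against the current hooks docs; check-claude-md.sh is a hypothetical script that would pipe the hook's stdin JSON plus CLAUDE.md to `claude -p --model haiku` and ask for a pass/fail):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "ExitPlanMode|Edit|Write",
        "hooks": [
          { "type": "command", "command": "./scripts/check-claude-md.sh" }
        ]
      }
    ]
  }
}
```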
by 8note
2/16/2026 at 10:29:37 PM
The point of so-called 'skills' is to be short how-to reminders that the agent can pull into its context and then act upon. If the knowledge is already in the model, it will most likely be surfaced in the reasoning phase anyway, so there's little benefit to writing it up as a skill, unless perhaps it's extremely relevant and hard to surface, and you want the model to skip that part of the reasoning.

by zozbot234
2/17/2026 at 1:14:00 AM
I've been building a skill to help run manual tests on an app. So I go through and interactively steer toward a useful validation of a particular PR, navigating specifics of the app and what I care about and what I don't. Then at the end I have it build a skill that would have skipped the backtracking, retries, and steering I did.

Then I do it again from scratch; this time it takes less steering. I have it update the skill further.
I've been doing this on a few different tests, building skills that take less and less steering to do app-specific and team-specific manual testing faster and faster. The first few times through, it took longer than manually testing the feature. While I've only started doing this recently, it now takes less time than I would take, and it posts screenshots of the results and testing steps in the PR for dev review. Ongoing exploration!
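For the curious, the distilled artifact is just a markdown file with YAML frontmatter; roughly like this (the steps here are invented for illustration, and the exact frontmatter fields come from Anthropic's skill format, so double-check against their docs):

```markdown
---
name: manual-pr-testing
description: Run the team's manual smoke test against a PR build and post annotated screenshots back to the PR.
---

# Manual PR testing

1. Check out the PR branch and start the app with `npm run dev`.
2. Log in with the seeded test account (see docs/test-accounts.md).
3. Exercise each changed screen; take a screenshot of every state.
4. Post the screenshots and the steps taken as a PR comment.
```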
by awwaiid
2/17/2026 at 2:53:47 AM
I love the screenshots, I need to do something like that.

by 7thpower
2/16/2026 at 10:39:27 PM
There is a benefit to skills, though. If an AI keeps encoding common tasks as skills and scripts, the LLM eventually just becomes a dumb routing mechanism for ambiguous user requests, which ultimately drives down token usage.

If everything you want an LLM to do is already captured as code or simple skills, you can switch to dumber models that know enough to select the appropriate skill for a given user input, and not much else. You would only have to tap into more expensive, heavy-duty LLMs when you are trying to do something that hasn't been done before.
Naturally, AI companies with a vested interest in making sure you use as many tokens as possible will do everything they can to steer you away from this type of architecture. It's a cache for LLM reasoning.
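A toy sketch of the architecture I mean, with the model calls stubbed out (all names here are made up):

```python
# Skills act as a cache for LLM reasoning: a cheap model only routes
# requests to captured skills; the expensive model runs on cache misses.

SKILLS: dict[str, str] = {
    "resize images": "skills/imagemagick.md",
    "rotate logs": "skills/logrotate.md",
}

def cheap_model(prompt: str) -> str:
    """Stand-in for a small routing model (really an API call)."""
    return next((name for name in SKILLS if name in prompt.lower()), "NONE")

def expensive_model(request: str) -> str:
    """Stand-in for the heavy-duty model, hit only on a cache miss."""
    return f"(novel solution worked out from scratch for: {request})"

def handle(request: str) -> str:
    choice = cheap_model(f"pick a skill for: {request}")
    if choice in SKILLS:
        # Cache hit: the cheap model routed us to pre-captured reasoning.
        return f"ran {SKILLS[choice]} for: {request}"
    # Cache miss: pay for real reasoning, then distill it for next time.
    answer = expensive_model(request)
    SKILLS[request.lower()] = "skills/new-skill.md"
    return answer

print(handle("Resize images for the blog post"))   # routed to an existing skill
print(handle("Migrate the database to Postgres"))  # escalates, then caches
```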
by deadbabe
2/16/2026 at 11:07:10 PM
AI companies don't want you to waste tokens; they benefit when you use them efficiently, because they can serve more users on the infra that's the main bottleneck for them. It's Jevons' paradox in action.

by zozbot234
2/17/2026 at 12:38:47 AM
> AI companies don't want you to waste tokens, they benefit when you use them efficiently because they can serve more users on the infra that's the main bottleneck for them.

No, the actual incentive is that people will eventually benchmark their models on a bang-per-buck basis, and models that chew through tokens are not going to be competitive. It's the same reason why the "Intel/AMD are intentionally sandbagging their CPUs so they can sell more CPUs" theory doesn't work.
by gruez
2/17/2026 at 1:03:05 AM
Well, it only works when one competitor is far enough ahead that they can play games like that.

At least currently in AI there is no moat, so we wouldn't expect that to be occurring.
by pixl97
2/17/2026 at 12:34:40 AM
I don't think that's necessarily true; they aren't really capacity-constrained in practice (they might be behind the scenes, and adjust training on the fly, but that's speculation), so wasting tokens effectively helps utilize their (potentially idle) inference GPUs.

by mhmmmmmm
2/16/2026 at 10:04:12 PM
Yeah, I care about LLMs generating skills after attempting tasks and learning lessons from those attempts, not before attempting a task for the first time. This result seems a little silly and detached from the reality of how skills are "auto-generated" in the real world.

by isahers
2/17/2026 at 3:09:46 AM
That is my approach. I don't think the paper's authors have actually used skills.

by dalemhurley
2/16/2026 at 11:06:11 PM
Yeah, some of my most useful AI tooling is skills created via a "role play session": basically brain-dumping to the agent and telling it to ask questions and figure out how to accomplish a task, then distilling it into a skill at the end which is much tighter and evidence-based, grounded in the actual problem-solving session.
2/17/2026 at 3:08:10 AM
This was very insightful. I've only just begun playing with some agent workflows and building out documentation to help it navigate my code base. Asking it to give me the top 10 unanswered questions from analyzing the docs and code was very useful.

by x3n0ph3n3
2/17/2026 at 2:36:46 AM
> I'm sure news outlets and popular social media accounts will use appropriate caution in reporting this, and nobody will misunderstand it.

You mean the dude who writes articles on TechCrunch and Ars Technica based off of HN and Reddit thread titles because he doesn't understand what real journalism is? Sure, we can count on him :)
by neya
2/17/2026 at 3:06:34 AM
After several failures and then a success, I have the agent create the skill; on the next run it succeeds on the first try.

by dalemhurley
2/16/2026 at 11:02:14 PM
> Having the LLM write down a skill representing the lessons from the struggle you just had to get something done is more typical (I hope) and quite different from what they're referring to

Just last week I had Claude build me a skill for when I ask it to help me troubleshoot issues, and it came out quite good.
It did have some issues (Claude tends to over-specify based on anecdotal data), but it's a strong step in the right direction.
Also, "skills" are too broad in my opinion. I have one (that Claude wrote) with my personal data that I have available when I analyze my workouts.
I think there's ample room for self-generated skills when you have a rather long exchange in a domain you plan to revisit, _especially_ when it comes to telling Claude what not to do.
by ericol
2/16/2026 at 11:20:29 PM
> it is important to note that this is not necessarily what people have in mind when they think of "LLMs generating skills"

I'm reading this paper as: don't do this. If you deploy agents to your workforce and tell them to use skills, don't. Tell them to give it tasks. This sounds obvious but might not be to everyone. (And in any case, it's nice for researchers to have confirmed that pre-prompt skill writing doesn't work. It would have been neat if it had.)
by JumpCrisscross
2/17/2026 at 12:18:14 AM
I interpreted it as "Allowing the LLM to add skills to itself as it completes a task doesn't provide a meaningful improvement over just letting it reason normally", which seems to be what the paper is fundamentally getting at.

by somesortofthing
2/17/2026 at 12:58:12 AM
> I'm sure news outlets and popular social media accounts will use appropriate caution in reporting this, and nobody will misunderstand it.

:D
by nubg