3/17/2026 at 9:33:52 PM
I was using this and superpowers but eventually Plan mode became enough, and I prefer to steer Claude Code myself. These frameworks are great for fire-and-forget tasks, especially when there is some research involved, but they burn 10x more tokens, in my experience. I was always hitting the Max plan limits for no discernible benefit in the outcomes I was getting. But this will vary a lot depending on how people prefer to work.
by gtirloni
3/18/2026 at 2:13:08 AM
I ended up grafting the brainstorm, design, and implementation planning skills from Superpowers onto a Ralph-based implementation layer that doesn't ask for my input once the implementation plan is complete. I have to run it in a Docker sandbox because of the dangerously set permissions but that is probably a good idea anyway.
It's working, and I'm enjoying how productive it is, but it feels like a step on a journey rather than the actual destination. I'm looking forward to seeing where this journey ends up.
by marcus_holmes
3/18/2026 at 12:47:50 PM
I find simple Ralph loops with an implementer and a reviewer that repeat until everything passes review and unit tests are 90% of the job.
I would love to do something more sophisticated but it's ironic that when I played both agents in this loop over the past few decades, the loop got faster and faster as computers got faster and faster. Now I'm back to waiting on agentic loops just like I used to wait for compilations on large code bases.
by LogicFailsMe
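A minimal sketch of the implementer/reviewer loop described above. The `implement`, `review`, and `run_tests` callables are hypothetical stand-ins for whatever agent calls and test runner you actually use; nothing here is taken from a real Ralph implementation.

```python
from typing import Callable, Optional, Tuple


def review_loop(
    implement: Callable[[str, Optional[str]], str],
    review: Callable[[str], Optional[str]],
    run_tests: Callable[[str], bool],
    task: str,
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Repeat implement -> review -> test until both pass or rounds run out."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = implement(task, feedback)
        feedback = review(candidate)  # None means the reviewer approved
        if feedback is None and run_tests(candidate):
            return candidate, round_no
        if feedback is None:
            # Review passed but tests did not; feed that back to the implementer.
            feedback = "unit tests failed"
    raise RuntimeError(f"no passing candidate after {max_rounds} rounds")


# Hypothetical stubs: the implementer only gets it right after feedback.
def fake_implement(task: str, feedback: Optional[str]) -> str:
    return f"{task}: draft" if feedback is None else f"{task}: revised"


def fake_review(candidate: str) -> Optional[str]:
    return None if candidate.endswith("revised") else "please revise"


def fake_tests(candidate: str) -> bool:
    return True


result, rounds = review_loop(fake_implement, fake_review, fake_tests, "add login")
```

The `max_rounds` cap is an assumption added here so a loop that never converges fails loudly instead of burning tokens indefinitely.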
3/18/2026 at 4:34:28 PM
Curious what you mean by "played both agents" and "faster and faster"? API calls are API calls, or are you running an open-source model locally?
by hatmanstack
3/18/2026 at 5:41:54 PM
Rephrasing of the post in case it's clearer:
"I would love to do something more sophisticated, but it's ironic that when I performed both of the duties done nowadays by agents, the development loop got faster and faster as computers got faster and faster."
by gavinray
3/19/2026 at 6:51:22 PM
For context and curiosity, are you using local inference? Which models?
by dotancohen
3/18/2026 at 10:05:00 AM
If it is working, why is it just a step on a journey? What is missing?
by auggierose
3/18/2026 at 11:44:28 PM
It's a kludged-together dev process made up of two different systems in a Docker container so potential damage is contained. It's not ideal ;)
Neither of those two systems feels evolved either. Superpowers is very cool, but there are holes still. And Ralph feels like an experiment that worked so they published it.
This is all going somewhere, evolving and moving towards some beautiful system. Or maybe the usual dev ecosystem shit - it'll be a great prototype and then it'll get overthought, overcomplicated and overengineered and end up less usable than what we had before *glares at React*
by marcus_holmes
3/18/2026 at 12:41:53 PM
did you hand modify the superpowers skills or are you managing this some other way?
by jghn
3/18/2026 at 4:38:49 PM
For me, I just created my own prompt pipeline, with a nod towards GANs. All of the necessary permissions get surfaced so I don't need to babysit it, and all are relatively simple. No need for Yolo or Dangerously setting Permissions.
by hatmanstack
3/18/2026 at 11:46:42 PM
yeah, I copied the skills I wanted into a directory, hacked away at them until they did what I wanted, and then added them to the Dockerfile for the sandbox
by marcus_holmes
3/17/2026 at 9:49:20 PM
I've gone the other way recently, shifting from pure plan mode to superpowers. I was reminded of it due to the announcement of the latest version.
It is perhaps confirmation bias on my part but I've been finding it's doing a better job with similar problems than I was getting with base plan mode. I've been attributing this to its multiple layers of cross checks and self-reviews. Yes, I could do that by hand of course, but I find superpowers is automating what I was already trying to accomplish in this regard.
by jghn
3/17/2026 at 10:33:06 PM
Yes, it does help in that way. Maybe I'm still struggling to let go and let AI take the wheel from beginning to end but I enjoy the exploratory part of the whole process (investigating possible solutions, trying theories, doing little spikes, etc, all with CC's assistance). When it's time to actually code, I just let it do its own thing mostly unsupervised. I do spend quite a lot of time on spec writing.
by gtirloni
3/17/2026 at 10:35:54 PM
That’s part of what I’ve liked about it over plan mode. Again not a scientific measurement but I feel it’s better at interactive brainstorming and researching the big picture with me. And its built-in multiple checkpoints also give me more space to pivot or course correct.
by jghn
3/18/2026 at 3:53:09 AM
Just tried GSD and Plan Mode on the same exact task (prompt in an MD file). Plan Mode had a plan and then base implementation in twenty minutes. GSD ran for hours to achieve the same thing.
I reviewed the code from both and the GSD code was definitely written with the rest of the project and possibilities in mind, while the Claude Plan was just enough for the MVP.
I can see both having their pros and cons depending on your workflow and size of the task.
by healsdata
3/18/2026 at 3:04:56 AM
I use GitHub Copilot and unfortunately there has been a weird regression in the bundled Plan mode. When they added the new plan memory, it suddenly started getting both VERY verbose in the plan output and also vague in the details. It's adding a lot of steps that are like "design" and "figure out", and it railroads you into implementation without asking follow-up questions.
by Rapzid
3/18/2026 at 4:13:23 AM
I find that even with Opus 4.6, Copilot feels like it’s handicapped. I’m not sure if it’s related to memory or what, but if I give two tasks to Opus 4.6, one in CC and one in Copilot, CC is substantially better.
I’ve been really enjoying Codex CLI recently though. It seems to do just as well as Opus 4.6, but using the standard GPT 5.4.
by whalesalad
3/18/2026 at 11:38:50 AM
I have the same experience with Antigravity and Gemini CLI, both using Gemini 3 Pro. CLI works on the problem with more effort and time. Meanwhile, Antigravity writes shitty Python scripts for a few seconds and calls it a day. The agent harness matters a lot.
by chaostheory
3/18/2026 at 1:59:18 PM
Copilot feels like being a caveman, Claude code feels like modern times comparatively.
by Atotalnoob
3/18/2026 at 3:07:14 PM
I think this shows that the model alone isn't the complete story and that these "harnesses" (as people seem to be calling them) shape a lot of the experienced behavior of these tools.
by gtirloni
3/19/2026 at 9:15:21 PM
My analogy is that the model is the engine and the harness is the driver and chassis.
You can have the biggest monster of an engine ever, but if you put it in a tricycle and a grandma is driving, you won't get good results.
by theshrike79
3/18/2026 at 7:53:43 PM
Opus 4.6 has a 200k context limit in Copilot. Could be the issue.
by codebolt
3/18/2026 at 6:28:32 AM
As a matter of interest, are you using the copilot cli?
by nfg
3/18/2026 at 1:37:18 PM
yeah. copilot cli using opus 4.6 vs claude code using opus 4.6
by whalesalad
3/18/2026 at 8:50:36 PM
If you could share, I’d be really interested in hearing a concrete example of the two behaving differently. I work at Microsoft (not on copilot - though I’m a heavy user, and use Claude code in a personal capacity) and would be quite happy to repro and report back to the copilot cli team, who are responsive.
by nfg
3/18/2026 at 7:39:52 AM
> VERY verbose in the plan output
Is that an issue? GitHub charges per-request, not per-token, so a verbose output and short output will be the same cost.
What model are you using?
by NSPG911
3/18/2026 at 4:25:57 PM
The problem might be that our brains charge per token, which makes reviewing hard. :)
by jounker
3/18/2026 at 4:11:25 AM
Same experience. Superpowers are a little too overzealous at times. For coding especially I don’t like seeing a comprehensive design spec written (good) and then turning that into effectively the same doc but macro expanded to become a complete implementation with the literal code for the entire thing in a second doc (bad). Even for trivial changes I’d end up with a good and succinct -design.md, then an -implementation.md, then end with a swarm of sub agents getting into races while more or less just grabbing a block from the implementation file and writing it.
A mess. I still enjoy superpowers brainstorming but will pull the chute towards the end and then deliver myself.
by whalesalad
3/18/2026 at 3:16:15 PM
Yes. I sometimes had to specifically ask it to NOT add any code to the specs because that would be done at a later stage.
by gtirloni
3/18/2026 at 2:27:52 PM
Yup yup yup. I burned literally a week's worth of the $20 Claude subscription and then $20 worth of API credits on gsdv2, to get like 500 LOC.
And that was AFTER literally burning a week's worth of Codex and Claude $20 plans and $50 in API credits and getting completely bumfucked - AI was faking out tests etc.
I had better experiences just guiding the thing myself. It definitely was not a set and forget experience (6 hours of constant monitoring) but I was able to get a full research MVP that informed the next iteration with only 75% of a codex weekly plan.
by sigbottle
3/18/2026 at 3:18:15 PM
You spent $25 on 500 LOC?
by FromTheFirstIn
3/18/2026 at 5:30:04 PM
Well, there were milestones and docs and extra scaffolding that the gsd system produces, but yes. And it didn't seem like progress was going to go any faster.
by sigbottle
3/18/2026 at 1:06:25 AM
I've played around a bit with the plugins and as you've said, plan mode really handles things fine for the most part. I've got various workflows I run through in Claude and I've found having CC create custom skills/agents for them gets me 80% of the way there. It's also nice that letting the Claude file refer to them rather than trying to define entire workflows within it goes a long way. It'll still forget things here and there, leading to wasted tokens as it realizes it's being dumb and corrects itself, but nothing too crazy. At least, it's more than enough to let me continue using it naturally rather than memorizing a million slash commands to manually invoke.
by SayThatSh
3/18/2026 at 1:26:05 AM
I have been using superpowers for Gryph development for a while. Love the brainstorming and exploration that it brings in. Haven’t really compared token usage but something in my bucket.
by abhisek
3/18/2026 at 6:44:52 AM
> I was using this and superpowers but eventually, Plan mode became enough and I prefer to steer Claude Code myself.
Plan mode is great, but to me that's just prompting your LLM agent of choice to generate an ad-hoc, imprecise, and incomplete spec.
The downside of specs is that they can consume a lot of context window with things that are not needed for the task. When that is a concern, passing the spec to plan mode tends to mitigate the issue.
by locknitpicker
3/17/2026 at 10:12:17 PM
Why are we using cli wrappers if you're using Claude Code? I get if you need something like Codex but they released sub agents today so maybe not even that, but it's an unnecessary wrapper for Claude Code.
by hatmanstack
3/18/2026 at 12:52:30 AM
Wrappers are useful for some tasks. I use ralph loops for things that are extremely complicated and take days of work. Like reverse engineering projects or large scale migration efforts.
by odie5533
3/18/2026 at 1:07:25 AM
Even with the 1 mil context windows? Can't you just keep the orchestrator going and run sub agents? Maybe the added space is too new? I also haven't tested out the context rot from 300K and up. Would love some color on it from first hand exp.
by hatmanstack
3/18/2026 at 1:24:55 AM
It's not a context issue so much as a focus issue. The agent will complete part of a task and then ask if I want it to continue, even if I told it I want it to keep going until all tasks are complete. Using a wrapper deals with that behavior.
Most projects I do take 20 minutes or less for an agent to complete and those don't need a wrapper. But for longer tasks, like hours or days, it gets distracted.
by odie5533
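One way to picture the wrapper behavior described above, as a rough sketch: the outer loop owns the task list and simply calls the agent again whenever it stops early. `run_agent` is a hypothetical callable that returns the set of tasks it finished in one session; it is not a real Claude Code or Ralph API.

```python
from typing import Callable, List, Sequence, Set, Tuple


def run_until_done(
    tasks: Sequence[str],
    run_agent: Callable[[List[str]], Set[str]],
    max_calls: int = 20,
) -> Tuple[List[str], int]:
    """Keep re-invoking the agent until no tasks remain or the call budget is spent."""
    remaining = list(tasks)
    calls = 0
    while remaining and calls < max_calls:
        finished = run_agent(remaining)  # the agent may stop before finishing everything
        calls += 1
        remaining = [t for t in remaining if t not in finished]
    return remaining, calls


# Hypothetical agent that "gets distracted" after two tasks per session.
def distracted_agent(todo: List[str]) -> Set[str]:
    return set(todo[:2])


left, calls = run_until_done(["a", "b", "c", "d", "e"], distracted_agent)
```

The wrapper never asks whether to continue; it only checks what is left and starts another session.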
3/18/2026 at 2:00:29 PM
Damn, what kind of tasks are you making your agents work on that takes days???
by mrhaugan
3/19/2026 at 9:10:30 AM
Claude Code has been working 24/7 for the past 4 days on creating a private server for a dead video game. It managed to get login, chat, inventory, and a few other features working. I provided it tools like Ghidra and x64dbg and pywinauto. Progress is slow but incremental. Each day new bits work that didn't before.
by odie5533
3/18/2026 at 2:52:11 AM
So that you can have a fresh context for every little thing. These harnesses basically marry LLMs with deterministic software logic. The harness programmatically generates the prompts and stores the output, step by step.
You never want the LLM to do anything that deterministic software does better, because it inflates the context and is not guaranteed to be done accurately. This includes things like tracking progress, figuring out dependency ordering, etc.
by roncesvalles
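To make the split between deterministic logic and LLM calls concrete, here is a small sketch under stated assumptions: dependency ordering comes from Python's standard `graphlib.TopologicalSorter`, progress lives in a plain dict, and each task gets a fresh prompt holding only the outputs of its direct dependencies. `call_agent` and the task names are hypothetical; this is not how GSD or any particular harness is implemented.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_harness(
    descriptions: Dict[str, str],
    deps: Dict[str, Set[str]],
    call_agent: Callable[[str], str],
) -> Dict[str, str]:
    """Run each task in dependency order with a fresh, minimal prompt."""
    # Deterministic part: the ordering comes from a topological sort, not the LLM.
    graph = {task: deps.get(task, set()) for task in descriptions}
    results: Dict[str, str] = {}  # doubles as the progress record
    for task in TopologicalSorter(graph).static_order():
        # Only the outputs of direct dependencies enter the new context.
        inputs = "\n".join(f"[{d}] {results[d]}" for d in sorted(graph[task]))
        prompt = f"Task: {descriptions[task]}\nInputs:\n{inputs}"
        results[task] = call_agent(prompt)
    return results


prompts: List[str] = []


def fake_agent(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; records what it was sent.
    prompts.append(prompt)
    return "ok: " + prompt.splitlines()[0]


descriptions = {"ui": "build pages", "api": "write endpoints", "schema": "define tables"}
deps = {"api": {"schema"}, "ui": {"api"}}
results = run_harness(descriptions, deps, fake_agent)
```

In this toy run the `ui` prompt contains the `api` output but nothing from `schema`, which is the "fresh context for every little thing" idea in miniature.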
3/17/2026 at 10:39:26 PM
GSD and superpowers aren't CLI wrappers?
by gtirloni
3/17/2026 at 10:44:27 PM
It's a cli wrapper. Don't know how you could say it wasn't.
edit: GSD is a cli wrapper, Superpowers not so much. Both are over-engineered for an easy problem IMHO.
by hatmanstack
3/17/2026 at 11:10:37 PM
Both are dramatically over-engineered. And that's okay. I find them to be products of an industry reconciling how to really work with AI as well as optimize workflows around it. Similar to Gastown et al.
Otherwise, if you can own your own thinking, orchestrating, and steering of agents, you're in a more mature place.
by ramoz
3/18/2026 at 12:20:00 AM
I also see it as fleeting: right when you have it figured out, a new model will work differently and may or may not need all their engineering layers.
by mycall
3/17/2026 at 11:40:31 PM
I think that's fair, if they were created today I'm sure the creators would make different decisions, a penalty of getting there first.
by hatmanstack
3/17/2026 at 11:35:56 PM
No it's not. It's using Skills and Agents and runs always inside of Claude Code, Gemini CLI etc...
by hermanzegerman
3/18/2026 at 3:01:33 AM
GSD delegates a lot of the deterministic work to a JavaScript CLI. That might be what the poster is talking about.
by swingboy
3/18/2026 at 3:13:24 PM
That's definitely not a CLI wrapper. But people are calling Claude Code (clearly a TUI) a CLI so :shrug:
GSD is a collection of skills, commands, MCPs(?), helper scripts, etc that you use inside Claude Code (and others). If anything, Claude Code is the wrapper around those things and not the other way around.
Re: helper scripts. Anyone doing extensive work in any AI-assisted platform has experienced the situation where the agent wants to update 10k files individually and it takes ages. CC is often smart enough to code a quick Python script for those changes and the GSD helper scripts help in the same way. It's just trying to save tokens. Hardly a wrapper around Claude Code.
by gtirloni
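For illustration, the kind of quick bulk-edit script mentioned above might look like the sketch below. It is an assumed example, not one of GSD's actual helper scripts; the function name, glob pattern, and file contents are all made up, and the demo runs in a temporary directory.

```python
import tempfile
from pathlib import Path


def replace_in_tree(root: Path, glob: str, old: str, new: str) -> int:
    """Replace `old` with `new` in every matching file under `root`; return files changed."""
    changed = 0
    for path in sorted(root.rglob(glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed += 1
    return changed


# Demo on throwaway files so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("old_name = 1\n", encoding="utf-8")
    (root / "b.txt").write_text("nothing here\n", encoding="utf-8")
    (root / "sub").mkdir()
    (root / "sub" / "c.txt").write_text("print(old_name)\n", encoding="utf-8")
    changed = replace_in_tree(root, "*.txt", "old_name", "new_name")
    updated = (root / "sub" / "c.txt").read_text(encoding="utf-8")
```

One script invocation covers every file, so the agent spends tokens on writing the script once rather than on each individual edit.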
3/18/2026 at 1:22:59 AM
What's happening with the other 90%?
by andai