3/27/2026 at 8:04:25 AM
I’m curious whether anyone has measured this systematically. Right now, most of the evidence for multi-agent setups still feels anecdotal.
by yesensm
3/29/2026 at 2:04:56 PM
I ran a Claude Code pipeline across 98 Rails models, 9,000 tests total. The thing that made it repeatable: each agent can only do one job. The analyzer writes a YAML plan but no Ruby. The writer gets one slice of that plan, not the whole thing. A Ruby script (not the AI) checks the output for 138 known mistakes before anything moves forward. If the check fails, the agent has to fix it.

That Ruby script is where the data comes from. I can see exactly which checks caught which problems across all 98 runs. That is not anecdotal, it is a log file.
by viktorianer
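A minimal sketch of the kind of deterministic gate described above: a Ruby script that runs named checks against an agent's output and reports which ones fail. The check names and predicates here are hypothetical illustrations, not the commenter's actual 138 checks:

```ruby
# Hypothetical validation gate for agent-generated Ruby source.
# Each check is a name plus a predicate over the generated text.
CHECKS = {
  "no_raw_sql"         => ->(src) { !src.match?(/execute\s*\(\s*["']/) },
  "uses_frozen_string" => ->(src) { src.start_with?("# frozen_string_literal: true") },
  "no_focused_specs"   => ->(src) { !src.include?("fit(") && !src.include?("fdescribe(") }
}

# Run every check and return the names of the ones that failed.
# Logging these failures per run is what makes the results auditable
# rather than anecdotal.
def failed_checks(source)
  CHECKS.reject { |_name, check| check.call(source) }.keys
end

failed = failed_checks(%(User.connection.execute("DELETE FROM users")))
# Non-empty => the pipeline blocks until the agent fixes every failure.
```

Because the gate is plain Ruby rather than another model, its verdicts are reproducible and its failure log doubles as the measurement data.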
3/27/2026 at 11:27:40 AM
And expensive, exactly the way a pay-per-use product would push its customers… “It’s not working well enough!” we tell them. They respond with “Have you tried using it more?”
by not_ai
3/27/2026 at 1:31:50 PM
Back in 2024 I read a study saying: "Ask 4 LLMs the same question; if they all give you the same answer, there is some 95-99% chance it's correct."

Soooo... it's not just greed. There is something there.
by 3yr-i-frew-up
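The agreement heuristic in that study can be sketched as a simple consensus check: collect one answer per model and accept only a unanimous result. This is an illustrative sketch, not the study's actual protocol:

```ruby
# Hypothetical consensus check across several model answers.
# Accept the answer only when all models agree after normalization;
# any disagreement returns nil, signalling the answer needs review.
def consensus(answers)
  normalized = answers.map { |a| a.strip.downcase }
  normalized.uniq.size == 1 ? normalized.first : nil
end

consensus(["Paris", "paris", " Paris "])   # unanimous: accepted
consensus(["Paris", "Lyon", "Paris"])      # disagreement: rejected
```

Unanimity is the strong signal; the interesting design question is how loose the normalization can be before independent models only appear to agree.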
3/27/2026 at 2:58:02 PM
Yes, exactly. I talk about this in the article: I found that when Claude and Codex both review the same PR and both flag the same issue, our team fixes it 100% of the time.
by axldelafosse
3/27/2026 at 4:36:57 PM
What's the point of pair programming then, if they both have the same opinions?
by zombot
3/27/2026 at 5:38:44 PM
They don't. And you would be surprised how a good model actually pushes back on some comments. The point was: when they do agree, it is a very strong signal.
by axldelafosse
3/27/2026 at 4:54:54 PM
There are a number of different models out there.
by pixl97
3/27/2026 at 1:22:18 PM
Haha yeah... Wait until they start jacking up the subscription prices.
by shafyy
3/27/2026 at 3:20:55 PM
They don't change the prices; they just modify the amount of compute allocated - slower speeds and fewer tokens. They can set everything in the background to optimize costs and returns, and the user never realizes anything has changed. Sometimes they'll announce the changes, and they'll even try to spin them as improving services or increasing value.
Local AI capabilities are improving at a rapid pace, at some point soon we'll have an RWKV or a 4B LLM that performs at a GPT-5 level, with reasoning and all the bells and whistles, and hopefully that'll shake out most of the deceptive and shady tactics the big platforms are using.
by observationist
3/28/2026 at 2:43:59 PM
> They don't change the prices, they just modify the amount of compute allocated - slower speeds and fewer tokens, they can set everything in the background to optimize costs and returns, and the user never realizes anything has changed.

I can't imagine that this is the way it will go... Tokens haven't been getting cheaper for flagship models, have they? You already see something closer to their real cost if you compare, e.g., the Claude subscriptions to their actual token pricing.
> Local AI capabilities are improving at a rapid pace, at some point soon we'll have an RWKV or a 4B LLM that performs at a GPT-5 level, with reasoning and all the bells and whistles, and hopefully that'll shake out most of the deceptive and shady tactics the big platforms are using.
Maybe, but LLMs are a scale game, and a data center will always be more capable than your local device. So you will always be getting a worse version locally. Or do you think LLMs in data centers will stop getting better and local LLMs will somehow catch up?
by shafyy
3/27/2026 at 9:13:43 AM
Completely with you on this! But then we need to define the criteria for comparison. Might not be that easy, unfortunately.
by stackgrid