6/8/2026 at 4:29:22 PM
Fast AI seems genuinely exciting and somewhat unsettling to me. Right now Claude is faster than me on some tasks but we’re at least close. I have a prompt to clean up a PR that’s been running for 1h now and I expect it to take another few. It’s hard to imagine what the workflow would look like if it was near-instant. On the one hand, it might be easier to focus. Some prompts take so long that I start to multitask and regret it later. On the other, AI that takes a few seconds to a few minutes at most to solve what used to take hours or days? That’s a game changer and I don’t even know where we fit in.
by goyozi
6/8/2026 at 4:43:53 PM
I'm using Deepseek-v4-pro as my main model and this is sometimes pretty annoying: I have some easy boring task to do, think "I'll just leave the agent to do it and go take a nap", but it's already done writing the code before I even walk away from the computer.
by flexagoon
6/8/2026 at 7:18:13 PM
DeepSeek is the fastest model in the benchmarks I've been doing (https://swelljoe.com/post/will-it-mythos/). Followed not so closely by Opus 4.8 and even less closely by Gemini 3.5 Flash and GPT 5.5. I've been really impressed with it, so far. It's also among the best at doing the work, though still trailing the frontier models from Anthropic and OpenAI.by SwellJoe
6/9/2026 at 8:20:43 AM
Nice benchmark, thanks! Which quants did you choose for the self hosted models?by anschl
6/9/2026 at 8:50:06 AM
8-bit on that one (unsloth 8_K_XL). But, the next post compares all common quantizations of Qwen 3.6.
I have another coming in a day or so for Gemma 4 with the 4-bit QAT version, which is very surprising (in a good way, Gemma 4 is impressive for this task).
by SwellJoe
6/8/2026 at 4:49:44 PM
Do you mean Flash and not Pro? I haven't tried it personally, but according to OpenRouter, the fastest DeepSeek V4 Pro providers are only ~50tps. That's slower than Claude Opus.
https://openrouter.ai/deepseek/deepseek-v4-pro?sort=throughp...
by RussianCow
6/8/2026 at 7:23:14 PM
In recent benchmarking I've been doing, DeepSeek V4 Pro was the fastest of 21 models, by a comfortable margin (https://swelljoe.com/html/bench-report-final.html). Faster than Claude Opus 4.8, which was the second fastest (Mistral doesn't count because it seems to have refused to participate). But, it's a limited data set, just a few benchmark runs of a limited set of tasks. It's entirely possible I happened to be calling the API at its least busy time and maybe Claude got hit during a busy time.by SwellJoe
6/8/2026 at 5:25:51 PM
I don't think token speed matters as much when a lot of tokens are needed to achieve a task. E.g. the Artificial Analysis benchmarks, where DeepSeek V4 is one of the biggest token burners.
by sarjann
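To make the trade-off above concrete, here is a minimal sketch of the arithmetic: wall-clock time is roughly output tokens divided by throughput, so a faster model can still finish later if it burns more tokens. The token counts and throughput figures below are hypothetical, chosen only for illustration, and are not measurements of any model mentioned in this thread.

```python
def wall_clock_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough generation time, ignoring latency, tool calls, and queueing."""
    return output_tokens / tokens_per_second


# Hypothetical numbers: a fast model that needs many tokens for the task...
fast_but_verbose = wall_clock_seconds(output_tokens=60_000, tokens_per_second=150.0)

# ...versus a slower model that solves the same task with fewer tokens.
slow_but_terse = wall_clock_seconds(output_tokens=15_000, tokens_per_second=50.0)

print(f"fast but verbose: {fast_but_verbose:.0f} s")
print(f"slow but terse:   {slow_but_terse:.0f} s")
```

With these made-up inputs the model with three times the throughput still takes longer, because it emits four times as many tokens.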
6/9/2026 at 2:53:32 AM
Both matter.by brianwawok
6/8/2026 at 8:13:15 PM
No, I mean Pro. I use it through OpenCode Go so I don't know what provider it uses under the hood, but it's very fast in my experience.by flexagoon
6/9/2026 at 6:50:44 AM
DS through OpenRouter is significantly slower than direct from DS platform in my experienceby thecopy
6/8/2026 at 4:52:43 PM
Yeah, flash is crazy fast, but I've found performance variable.by specproc
6/8/2026 at 6:53:35 PM
Flash is amazing if you know the domain really well.
E.g. occasionally it makes the dumbest mistakes you've ever seen and can't correct them. However it's fairly rare, and if you know the domain really well, occasionally popping in the code and pushing it towards the correct solution takes like 20 seconds or whatever.
So the speed you can move with flash + high domain knowledge beats opus by a mile in my experience.
I tried to switch back to 4.8 for a bit when it came out, feels so bad waiting 20mins for a mediocre solution when I could have had everything complete - with multiple iteration cycles - in flash in like 3-5mins.
by binary0010
6/9/2026 at 4:04:07 AM
Yes, you don't need much domain knowledge to use Opus, but it's just way too expensive.by addozhang
6/9/2026 at 8:52:29 AM
For losers who can't put together a program to save their life, have no real skills and were always not really interested in programming (hence their poor skills), renting a robot buddy to do it for them is a good deal, until the buddy cuts in materially into their salary, and until their bosses realize that they really just have robot operators on staff instead of people who can actually do things.by 59nadir
6/9/2026 at 5:56:46 PM
It's nice when I want to be lazy though.
Or when I'm working two contract gigs. I can spec things out for one and turn it loose and trust it. Then work more closely with deepseek on the other project.
by Induane
6/8/2026 at 7:04:08 PM
[flagged]by flowbarai
6/8/2026 at 5:28:09 PM
Agent mania setting in
It's also pretty funny sometimes how it gives weird future roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months", etc.) and when you tell it to actually do those changes it's pretty much done in half an hour
by throwaway67678
6/8/2026 at 5:51:43 PM
I've long believed those numbers were faked by Anthropic/OpenAI to serve as a form of advertisement. The estimates are impossible to verify and their ability to do "2 days of work" in 10 minutes will presumably make the user go "Wow, I just saved SO much time!" Plus, the unnecessary text eats up the users' tokens so it helps the companies on the backend, as well.by smith7018
6/8/2026 at 9:53:45 PM
I tend to be cynical about AI companies, but I'm guessing the bad estimates more just come from a complete lack of actual data it could use for that so it's more or less a hallucination.by overgard
6/8/2026 at 6:06:21 PM
I agree with you that labs are benefiting from those outputs but I'm skeptical that labs are purposefully training the models to produce those outputs.
Raw pre-training data includes plenty of conversations between professional builders and some of those include estimates.
I believe the outputs are a training coincidence with consequences that are opportunistic for the labs.
by leodavi
6/8/2026 at 6:15:44 PM
All the models have broken estimates. They're trained heavily on jira and GitHub tasks and issues, that's why their estimates are human.by AgentMasterRace
6/8/2026 at 7:29:07 PM
Even for humans the estimates are way off, unless it's based on data that has some serious padding.
That said, it'll often say "2 days of work" and then complete the coding in 30 minutes. That's amusing, but afterwards I'll need to manually test, or send it to other people for review, or realize the agent only actually did half the work and I need to do a second pass (or a third, etc.), and then getting the feature in often does genuinely take two days.
by esperent
6/8/2026 at 6:54:35 PM
> the estimates
It doesn't estimate.
It generates tokens that read like estimates associated with the context in its training material.
What would you expect the generator to output instead?
by Terretta
6/8/2026 at 8:29:53 PM
It generates tokens by estimating what the next token is going to be.
Sure it cannot think like a human, but given its input, it should give a good statistical answer (approximating not how long it actually takes, but what a human would say about how long it takes).
by legulere
6/9/2026 at 12:49:49 AM
The funny thing about this comment is that neural networks are universal function approximators.
The most fundamental essence of what they do is exactly what you say they don't: estimate.
by mediaman
6/9/2026 at 1:38:07 AM
Funny and ironic in a way, but the point still stands that they do not actually estimate the time it will take.by airstrike
6/9/2026 at 3:28:25 AM
> they do not actually estimate the time it will take
You can't prove that )))
by greenavocado
6/9/2026 at 4:31:08 AM
Right, but extraordinary claims require...by airstrike
6/9/2026 at 2:55:55 PM
Instructions unclear, hard drive reformat completed.by greenavocado
6/8/2026 at 10:24:50 PM
Obviously there isn't a hidden corpus of logs of coding chatbot assistants that has been accumulating over the years, but these coding chatbot assistants output tokens that resemble how we all imagined a coding chatbot assistant would have operated had it existed in the first place to end up in a corpus. "Training material" includes supervised fine-tuning, preference training, RLHF, and so on, so that certain outputs (like these timeline estimates) may really have been decided (at some level of conscious awareness) by product teams.by incr_me
6/8/2026 at 8:07:03 PM
You might like the stuff in my work on oh my pi; it's a test bed for my ideas around making these tools more reliable. I'm hoping to maybe have a native UI iteration of the real thing that this is a test bed for this summer.
https://github.com/cartazio/oh-punkin-pi/blob/main/scripts/b...
by carterschonwald
6/9/2026 at 2:19:34 AM
Therein lies the rub, no? To accurately predict the next token produced by a process, it’s necessary to model that process. If the process is a human attempting to estimate the duration of a task, then in some sense the LLM is modeling the estimation process. We’re well past the point where it’s credible to claim that LLMs just regurgitate their training data.by taneq
6/9/2026 at 12:06:11 AM
This is so 2023. The thought process.
At that time the predominant view was that LLMs were nothing but stochastic parrots, that they would plateau, and that hallucinations couldn't be fixed.
At this point I doubt there are any AI sceptics left. That ship has long sailed. The only thing that matters is whether the estimates are accurate, and AI can improve on that too.
Even humans only estimate based on neurons firing in prior patterns.
by InterviewFrog
6/9/2026 at 4:26:49 PM
[dead]by monkpit
6/8/2026 at 11:44:04 PM
Actually in this case they possibly are estimates.
It's been known for some years[1] that LLMs do regression in-context. Frontier models have been trained against many, many issue texts that include task breakdowns and estimates.
by nl
6/9/2026 at 1:33:02 AM
Interesting. So it may have learned how to estimate as a human but doesn’t understand that it doesn’t operate at that speed :D
I wonder if there’s a reasonable way to give an llm parameters that give it a concept of its own execution speed. Seems that could be useful for multiple purposes
by kube-system
6/9/2026 at 6:12:18 AM
Yes, it's entirely possible to do that via RL. It'd be a fun little project you could do for less than $100 on a small LLM actually.by nl
6/8/2026 at 7:39:59 PM
I think people are continuing to view these systems as pure LLMs - when that ship sailed 6+ months ago. Between being able to review memory, using agent harnesses and sub agents and skills to go out and discover information - modern systems (Codex, Claude Code, Cursor) - use LLMs - but the LLM is only a small component of it. Compare what you get from sending a request to a chatbot like ChatGPT - to what you can get from a modern harness. The output is influenced by the LLM, but it's no longer a "model making a token prediction based on training material and RLHF" - that's a very 2025 way of looking at these systems.
Even Gary Marcus is starting to come around and realize that his priors are no longer as relevant as they once were.
by ghshephard
6/8/2026 at 8:17:14 PM
No one is bitter lesson pilled anymore. Everyone is pivoting to neurosymbolic systems. It looks like Gary Marcus was right.by irthomasthomas
6/8/2026 at 11:49:05 PM
> No one is bitter lesson pilled anymore.
Will the 10T parameter Mythos model be released this month or next month?
They'd better do it soon, because it is generally accepted that one of the reasons GPT 5.5 is better at hard tasks than Opus is its parameter size - and that Opus 4.8 remains competitive only by scaling test-time compute (see how many more tokens it uses than GPT 5.5)
https://www.reddit.com/r/LLM/comments/1sz8bjz/parameter_esti...
by nl
6/9/2026 at 8:49:37 AM
Why ask me? Anyway, Mythos is not 10T. Anthropic confirmed the training run was under 10^26 flops. You can't train a 10T model to Chinchilla-optimal and stay under 10^26.
Anthropic also confirmed they will not release Mythos, only a "Mythos-class" model, whatever that means.
by irthomasthomas
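For readers who want to check the arithmetic behind the claim above, here is a rough sketch. It assumes two common rules of thumb that are not stated in the thread: training compute of about 6 FLOPs per parameter per token, and a Chinchilla-style ratio of about 20 training tokens per parameter, for a dense model. The function names are made up for illustration, and real training runs (mixture-of-experts models in particular) can deviate from both assumptions.

```python
import math

# Assumed rules of thumb, not figures from the thread:
TOKENS_PER_PARAM = 20.0       # Chinchilla-style tokens-per-parameter ratio
FLOPS_PER_PARAM_TOKEN = 6.0   # approximate training FLOPs per parameter per token


def chinchilla_training_flops(n_params: float) -> float:
    """Approximate training compute for a dense model at the assumed ratio."""
    n_tokens = TOKENS_PER_PARAM * n_params
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def max_params_under(flop_budget: float) -> float:
    """Largest dense model trainable at the assumed ratio within flop_budget."""
    return math.sqrt(flop_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))


flops_10t = chinchilla_training_flops(1e13)   # 10T parameters
params_at_1e26 = max_params_under(1e26)       # the 10^26 FLOP threshold

print(f"10T params at 20 tokens/param: {flops_10t:.2e} FLOPs")
print(f"largest model under 1e26 FLOPs: {params_at_1e26:.2e} params")
```

Under these assumptions a dense 10T-parameter model would need on the order of 10^28 FLOPs, and a 10^26 budget corresponds to a bit under 1T parameters. This only checks the arithmetic; it says nothing about what any lab actually trained.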
6/9/2026 at 11:23:07 AM
> Anthropic confirmed the training run was under 10^26 flops. You can't train 10T to Chinchilla and stay under 10^26.
I don't think Anthropic have said anything of the sort.
Microsoft published it as 6.1*10^27 FLOPs[1]
Elon has claimed they are also training a 10T model because "Some catching up to do"[2]
by nl
6/9/2026 at 2:13:17 PM
I must have confused mythos with opus 4.7. One of their recent model cards confirmed that training flops was under the EO reporting requirement of 10^26 flops.by irthomasthomas
6/9/2026 at 12:06:50 AM
How is neurosymbolic not aligned with the bitter lesson? The bitter lesson is completely agnostic to architecture.by wild_egg
6/9/2026 at 8:40:35 AM
I should have stressed the symbolic part. Everyone has pivoted to symbolic systems like claude code and codex. They would not invest so heavily in such systems if they thought llms would deliver agi soon.
by irthomasthomas
6/9/2026 at 3:20:29 PM
That's not what symbolic means.by jubilanti
6/8/2026 at 7:55:20 PM
You think someone is, or even should, special case things like estimates? What else deserves that level of intervention so they look less dumb?
Logistics for getting to the car wash next door?
In the mean time, alas, no, we can see from actual prompts sent directly or through sub-agents, and actual replies, estimates remain LLM generated.
Though, this discussion here could change that, because indeed there is a lot of special casing and context stuffing going on, one of the oldest being today's date for example.
• • •
I did read the Claude Code leak, and use pi, etc. So I disagree with your premise rather strongly. Today's "systems" remain, roughly, piles of markdown and context engineering wrapped in UI affordances, and behave very similarly today to how they did in 2024 for those already engineering context and delegating.
by Terretta
6/8/2026 at 10:22:03 PM
I do a lot of code bisecting with Claude Code - and it spends hours running experiments - looking at experiment results, making guesses as to what to try next for an experiment - until it eventually comes around to a working code pattern. I mean - maybe this is as much a reflection on me as anything else - but its pattern of logic isn't that much different from what I would do. It knows, in general, what tools and APIs it can call - it tries something - observes the result, and then comes back and tries different experiments based on success/failure - mostly efficiently bisecting to a solution.
I'm still lower down on the capability scale - as I'm still manually directing agents to do these wiggins loops - obviously the next step up is to direct the code-loops which control the agents. I just haven't got my tooling nailed in place to the point where I find that's more productive.
I actually might agree with you that this is mostly just "next token prediction" - if I can concede that's really all I do as well.
by ghshephard
6/9/2026 at 12:52:50 AM
> I actually might agree with you that this is mostly just "next token prediction" - if I can concede that's really all I do as well.
Yep. Pretty sure I've got an LLM inside too.
The other replies complaining that my thinking is so 2023 -- on the contrary, what's evolved is my own apprehension of how LLM-like most "responses" from humans prove as well.
To be sure, there are other mechanisms at play as well, significant differentiation in our... Volume of training material? Quantizations/compression? Model architecture? Just-ahead-of-time forward branching with back propagation? Double loop adaptive learning? You know, harnessing the LLM. :-) Dare we call it executive function?
LLM mode becomes particularly apparent when conversing with Alzheimer's patients in the stage where short term memories do not form but they retain access to long term memory up to, say, 5 years ago or so. Fifty years of who they are, and one can trigger nearly identical responses with nearly identical prompts.
But that same person may be able to debate 1950s politics while being unable to complete making a sandwich.
If they didn't know of new shortcuts for a task, they would almost certainly not "estimate" but "intuit", or "instinctively" respond (apply heuristics), largely based on their "priors" aka training material.
If you sit with them and chat a while, you'll even get the kind of looping you get from Qwen trying to think when context is too full.
And if we believe this at all, then ... we should stop scrolling tik tok. Time to read a book. Have an experience. Fine tune. :-)
by Terretta
6/8/2026 at 11:09:37 PM
rather than special casing, make real data based on chat logs for how long things took both in calendar and chat timeby 8note
6/8/2026 at 6:26:14 PM
All models do it. It's their training. They didn't have "a person does this in a week but an LLM could in a minute" in their training yet. They also don't have the concept of elapsed time unless you ask them how long something has taken.by dizhn
6/8/2026 at 9:10:43 PM
Nah it’s all from the pretraining databy Narciss
6/9/2026 at 4:25:00 AM
That’s right up there with Scotty in the classic Star Trek always multiplying time estimates by 4 so he looks like a “miracle worker”by BobbyTables2
6/8/2026 at 8:56:30 PM
I mean in general I'd rather take slightly inflated estimates than the odd sprint poker stuff where other devs and PMs negotiate hours down and before you know it you're also stuck fixing nitpicky reviewer comments on code that is already good enough and have to send a release at like 7 PM, ofc also without enough tests or even enough manual checks and testing, cause people repeatedly act against their self-interest and try to compress timelines, thinking that that's somehow good for them.
At least with AI that actually does things more quickly, there is a bit more breathing room (introducing AI is easier than changing a given environment).
Aside from that, I wonder how much variety there is in practice: between "Oh yeah, I added that new button while we were in the meeting" and "The new button feature will be ready in Q3 according to the roadmap, once we have sign-off from all the stakeholders."
by KronisLV
6/8/2026 at 11:57:38 PM
I heard an anecdote. Guy spent several days trying to convince his AI agent to build a feature. Kept saying it was crazy complicated, would take weeks.
Finally he convinced it to try. It one shotted it in 30 seconds.
Turns out the agents' idea of what is hard and easy also comes from Common Crawl.
by andai
6/9/2026 at 12:04:25 AM
Why on earth would you spend any time at all convincing an agent of anything? You say "just do it" and off it goes.by wild_egg
6/9/2026 at 12:17:51 AM
Ya, but “doit” is 2x more efficientby dr_dshiv
6/9/2026 at 3:00:25 AM
Uh Claude tries real hard to dodge work. Talks about how it’s really hard 10 PRs. Finally convince it to do as 1. It stops 10% through and says ok done with PR 1, we can work on the last 9 tomorrow. Ugh.by brianwawok
6/9/2026 at 4:31:50 PM
Maybe we shouldn't have AI mimic humans too closely?by handfuloflight
6/9/2026 at 3:20:18 PM
You need to assert dominance.by g8oz
6/8/2026 at 6:38:58 PM
It repeats what it has seen in the training data. Expecting it to reason about the complexity of a task is a pipe dream. The best is to tell it not to come back with estimates, and when it does, remove them anyway.by throw1234567891
6/9/2026 at 3:42:27 AM
I added "you can do anything, believe in yourself" to system prompt, and task completion increased significantly.by andai
6/9/2026 at 3:50:25 PM
Well how else could I keep my reputation as a miracle worker Captain?by jimbokun
6/9/2026 at 7:49:33 AM
> It's also pretty funny sometimes how it gives weird future roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months", etc.)
those estimates are based on previous human estimates (the datasets it's been trained on).
unironically, when your comments will become part of a dataset, LLMs will likely get much better at estimating.
now that i think about it, all these writings about LLMs will give LLMs something much like meta-cognition.
by znpy
6/8/2026 at 6:48:49 PM
I exclusively use deepseek v4 flash now, completely stopped using slow models like Claude.
Basically I never have to wait - yes I have to tell it little corrections occasionally (but I know the domain really well so that's not an issue), but it's so much faster than anything else it's kinda crazy. I love the super fast speeds with high involvement development cycle.
I actually enjoy using agentic development flows for the first time now - whereas with Claude I absolutely hated it. That 5 to 20 min wait after every prompt absolutely killed my desire to even want to work at all.
by binary0010
6/9/2026 at 1:32:05 PM
Take the nap anyway, just say it took all afternoon :)by abustamam
6/8/2026 at 10:23:23 PM
FWIW, for me just today it got itself into silly rabbit holes twice, and both times I had to fix things myself. Scarily, this is something I catch myself doing as well.by throw-the-towel
6/8/2026 at 4:53:19 PM
This reminds me of the Peter / Boris comments on writing loops to keep the agents busy.by tmaly
6/8/2026 at 11:56:48 PM
With Flash it's basically instant for smaller tasks, yeah.by andai
6/9/2026 at 7:47:24 AM
> I have to do some easy boring task, think "I'll just leave the agent to do it and go take a nap", but it's already done writing the code before I even walk away from the computer
the way software engineering works these days reminds me a lot of factory workers on production lines that just sit in front of a production line all day and take out faulty items and/or perform a single step in the production of goods.
by znpy
6/8/2026 at 6:19:03 PM
Same. How can DeepSeek serve the V4-Pro at such high speeds despite the sanctions?
by behnamoh
6/8/2026 at 8:22:32 PM
The sanctions only “prevent” them from directly buying NVidia’s latest and greatest in the sense that NVidia can’t sell directly to them. Essentially, there are companies now who are in a country without the sanctions, they buy from NVidia (or a partner), and then ship them off to China. For the orgs in China doing this, there’s zero legal risk besides having a foreign customs service intercept the shipment and losing the goods. For NVidia there is zero incentive to care, as long as they look like they do, because sales are sales. You can bet Jensen ain’t losing sleep over it.
GamersNexus had a really good investigative piece (~3hrs long) on this where they went to China and met with grey market sellers. That piece absolutely pissed off NVidia and resulted in a fight with Bloomberg too.
Deepseek may also be running inference on oodles of Chinese hardware but it wouldn’t surprise me for a second if they just acquired Blackwell chips through the grey market. The original Deepseek models were all trained using NVidia chips if I remember right.
by rubyn00bie
6/9/2026 at 12:39:23 AM
That wouldn't explain why Deepseek is fast relative to other Chinese providers, especially considering that they're reportedly ahead of the curve among Chinese companies in moving off Nvidia. I think their quant fund background has more to do with it. Their models are clearly designed with performant inference in mind.
by seewhydee
6/9/2026 at 5:56:21 AM
Yes, it's performant, and especially performant at non-trivial context depths. DeepSeek-V4 DS4 (and Flash - DS4F) drop tok/s speed much less than the rest. On my M2 Max it took context depths of 768K to drop tok/s to ~10 tok/s.
https://x.com/ljupc0/status/2062457314414587996
Other local models I've checked drop to unusable speeds way sooner. The only other model with a similarly favourable curve I've tried is nemotron-cascade-2-30b-a3b. But it's a small model, way dumber than DS4F.
Coding agents use cases have large context depths. The rate of decline is as important as the headline number.
by ljosifov
6/8/2026 at 7:12:38 PM
Now the next bottleneck is the compiler - which we can model in an LLM! It's only wrong 15% of the time :)
But truly, using Cerebras at ~2k tokens/s, with very low latency is like a vision into the future. You start to rework your workflow around things that can happen without onerous manual review - stating the conditions for success, etc. It's rare that I have a problem that maps well to that, but I expect this is where things are headed.
Of course the fast models tend to not be the SOTA ones, but if that was the case - high quality and near-instant thinking, that's a game changer that I don't think we're really prepared for. The things that get unlocked with higher-than-reasonable speed become very interesting.
by switchbak
6/9/2026 at 5:59:49 AM
Have you tried https://chatjimmy.ai/ it’s only a demo but it blew my mind. I had the sudden feeling that this is the future.by lhoff
6/9/2026 at 7:59:37 AM
What do you mean "demo"? Seems to work... Who is behind this?by colordrops
6/9/2026 at 2:03:47 PM
These guys: https://taalas.com/products/by alfiopuglisi
6/8/2026 at 6:10:34 PM
If we get low enough latency, there's no reason to multitask. You can ask it to do one thing at a time and immediately see what it did. That's a nice way to work!
This is normal interactive UI for tasks that aren't compute-intensive. Programs spend most of their time idle, waiting for us to click a button. We shouldn't be waiting for them or spinning more plates to keep them busy.
However, a faster llm isn't enough. You also need fast compiles and fast tests.
by skybrian
6/8/2026 at 10:22:55 PM
I’ve been playing around with groq and GPT OSS which they run at 1000 TPS (20B) or 800 TPS (120B) and the speed feels quite magical.
I haven’t tried cerebras’ 3000 TPS yet but I did try the demo of that 15,000 TPS model whose name escapes me right now.
I’m not sure if it makes a meaningful difference for my actual work, but it sure is amazing to watch it generate a screen full of text in the blink of an eye.
I do think it’s super useful for running little validation checks like showing it a diff to ensure that the changes are on task, and being able to do those quicker really helps because it means you can do many focused checks without them getting in the way.
by dkersten
6/8/2026 at 10:28:49 PM
https://chatjimmy.ai/ ?by robberth
6/8/2026 at 10:39:27 PM
AFAIK Taalas, the company behind this demo, still only have their initially "hardwarized" model available to test in ChatJimmy, which IIRC is a rather stupid Llama 3ish 8b.
Don't get me wrong though, that demo is still incredibly impressive & makes me very much excited for the hardware-based model era (potentially) ahead.
Once you've experienced those speeds, you really start to think about the whole class of things that becomes possible; massively parallel decode paths, extensive reasoning loops, etc…
by msdz
6/8/2026 at 11:03:50 PM
For scale though if three or four chips that size can replicate a Qwen 27B experience that'll be quite useful.by hedgehog
6/9/2026 at 7:11:57 AM
That’s the one.
The speed is incredible and fun to see, but the model is rather weak to the point where I’m not sure it’s particularly useful for most people.
by dkersten
6/8/2026 at 11:18:37 PM
> I haven’t tried cerebras’ 3000 TPS yet but I did try the demo of that 15,000 TPS model whose name escapes me right now.
You were likely thinking of AI accelerator startup Taalas.
Previous HN discussion: https://news.ycombinator.com/item?id=47086181
by ayewo
6/8/2026 at 7:37:39 PM
It cuts both ways. Sometimes I ask Gemini 3.5 Flash to do something for me and it kicks it out almost instantly and it works great, and it's a bit scary how quickly it can do that.
Then I ask it to do something else and it goes off-road, and where I used to be able to interject with a "whoa, whoa, whoa, that's not right", by the time I see the text on screen and react it's already made massive changes. Short of making it commit between every edit it's hard to prevent it from going wrong as quickly as it goes right (and even then, it can make a boo-boo on a remote API too depending on how much privilege it has).
by coderbants
6/8/2026 at 8:27:09 PM
I use planning mode in opencode. It has a prompt to tell it to plan it out etc. Then I execute with a smaller model. It works well.
by bendangelo
6/8/2026 at 4:32:08 PM
Asking for curiosity's sake: what kind of PR loop are you running that takes a few hours?
by ipkstef
6/8/2026 at 4:39:01 PM
not OP but usually for me this means long verification loop; waiting 10min on CI checks, that kind of thing, rather than actual 1hr wall clock of token generationby ketzo
6/8/2026 at 4:58:12 PM
But those things won't be sped up by a faster LLM, so I feel like that's not what the OP is talking about.by RussianCow
6/8/2026 at 5:12:15 PM
Well, I used an extreme example. OTOH, I’ve done quite a few of those „fix CI” or „migrate X” prompts recently and while there is a fixed component like running CI / builds, I’d say the LLM time is still around or above 50%, especially at the beginning of the project. Then there’s also regular tasks that now take minutes per message which completely get me out of the zone. I imagine iterating on those in near real time would be a big change.by goyozi
6/8/2026 at 4:49:55 PM
Or slow MCP servers that are waiting on HTTP calls from APIs, playwright/other UI instrumentation, etc.by devmor
6/8/2026 at 5:03:05 PM
I’m rewriting our integration test suite to run tests in parallel. I have the changes split across 7 branches, and each needs to be fixed to have no flaky tests. I told it I want 3 consecutive CI runs with no flakes and no artificial fixes / assert removals etc. We’ll see what comes out; it’s almost a side project so there’s not much to lose other than some of my weekly limit that resets soon.by goyozi
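The acceptance criterion described above (a fixed number of consecutive green CI runs, with any failure resetting the count) can be sketched roughly as follows. This is only an illustration: `run_once` is a hypothetical callable standing in for whatever triggers a CI run and reports pass or fail, and the `max_runs` cap is an added assumption so the loop cannot spin forever.

```python
from typing import Callable


def has_consecutive_passes(
    run_once: Callable[[], bool],
    required: int = 3,
    max_runs: int = 10,
) -> bool:
    """Return True once `required` runs in a row pass, within `max_runs` attempts."""
    streak = 0
    for _ in range(max_runs):
        if run_once():
            streak += 1
            if streak >= required:
                return True
        else:
            # A single flaky failure resets the streak to zero.
            streak = 0
    return False


# Usage with scripted results instead of a real CI system.
scripted = iter([True, False, True, True, True])
stable = has_consecutive_passes(lambda: next(scripted))
print(stable)

# Two greens followed by a failure, repeated, never reaches three in a row.
flaky_results = iter([True, True, False] * 3 + [True])
flaky = has_consecutive_passes(lambda: next(flaky_results))
print(flaky)
```

A check like this only counts green runs; it cannot tell whether a pass came from a real fix or from a removed assertion, which is why the comment above rules those out separately.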
6/8/2026 at 6:45:09 PM
> a side project so there’s not much to lose other than some of my weekly limit that resets soon
Basically the entire token-maxxing AI hype train in a nutshell. Lovely!
by yunohn
6/8/2026 at 8:22:25 PM
wdym? Nobody's paying me or rewarding me for using these tokens. I had some spare in my subscription limit (we're not on token pricing), so I decided to try an ambitious task that may reduce our CI times and improve our DX significantly. That's hardly "the entire token-maxxing AI hype train in a nutshell".by goyozi
6/8/2026 at 7:38:06 PM
I’m curious when folks will tire of lighting money on fire. Companies are already starting to scale back a bit, but the AI companies are still nowhere near profitability.by drob518
6/8/2026 at 4:40:39 PM
We fit in for the things that are not artificial.
So long as AI lives in server farms, humans will be needed for tasks in the physical world.
It's only if we combine AI with robots that things get really dicey.
by pianopatrick
6/8/2026 at 4:43:59 PM
This is very dystopian in my opinion. I'm not the arms, legs, sensors and actuators for a machine super intelligence. I wouldn't treat another human as my slave because they aren't as intelligent as I am any more than I would expect to become a slave for a machine. This is our world (for now) and that is why we fit in. Not because we can serve.by fartfeatures
6/8/2026 at 4:56:17 PM
Agree
https://en.wikipedia.org/wiki/I_Have_No_Mouth,_and_I_Must_Sc...
by davedx
6/8/2026 at 6:21:10 PM
"It seeks revenge on humanity for its own creation."
This is brilliant as it reminded me of a famous Hitchhiker's quote:
"In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move. — From The Restaurant at the End of the Universe (Book 2)"
Maybe we are stuck in an eternal loop
by ionwake
6/8/2026 at 5:47:57 PM
Sounds like snuff porn, not my sort of thing but thanks though.by fartfeatures
6/8/2026 at 5:10:12 PM
"This is our world" sounds a bit exclusive towards other living and sentient beings on this planet.by cicko
6/8/2026 at 11:24:08 PM
It depends on what’s included in “our”.by nativeit
6/8/2026 at 5:29:08 PM
Never read Asimov's Multivac novels? Admittedly not all of them are stellar examples of a future to follow.
by throwaway67678
6/8/2026 at 7:28:31 PM
You don't need AI superintelligence; plain capitalism is enough.
by Muromec
6/8/2026 at 5:32:09 PM
I'd be very curious about the bottleneck breakdown in most current software dev - I suspect inference is far from the bottleneck in most things I do, though driving it to 0 would still be nice. I do agree that if it was 0 we'd probably change development approaches to reduce the new bottlenecks more, but it'll take full-process innovation to really get something near-instant.
(I should go measure this now, I'm curious)
by efromvt
6/9/2026 at 3:55:19 AM
The first wave was just getting half decent answers. The second wave was being able to choose between actually getting reasonably ok coding results OR getting not so great results very fast. The third wave would be getting good results fast.
We need to really worry when we get amazing results very fast.
by noisy_boy
6/9/2026 at 2:48:46 AM
Reminds me of the Doherty threshold. When will AI respond in less than 400 milliseconds?
by cman1444
6/9/2026 at 7:35:06 AM
"I don’t even know where we fit in."
Giving directions and verifying its output? But my mental capacity is still limited. I can write far more prompts than I can read code.
by lukan
6/8/2026 at 5:12:08 PM
I don't see many companies being willing to pay 3x more for faster code generation. Cloud-based AI code generation is already extremely fast, and hardly the bottleneck for most software product development.
There can't be many normal use cases where there'd be any cost benefit.
by HarHarVeryFunny
6/8/2026 at 5:26:55 PM
The "traditional" way we vibe code is human software developer prompts AI -> AI generates code -> (human checks code) -> code gets compiled/deployed/etc. -> users use "binary". At the speed of 1000 tok/sec, the flow becomes user prompts obliquely -> AI vets generated code -> code deployed -> user gets response from deployed code.
It's a cute toy right now, but you can tell an LLM that it's an HTTP server, and have it respond directly to a web browser hitting it. It generates headers in response, as well as page contents. As 1000 tok/sec becomes the new normal, we will come up with newer ways to use it outside of toy fiction encyclopedias.
by fragmede
6/8/2026 at 5:45:20 PM
1000 tokens per sec is still massively slower than serving a normal web page - if something doesn't respond in a few seconds many people give up.
I'm not saying there aren't any use cases for super-fast (and super-expensive) generation, but it does seem a bit niche. If it was free then sure faster is better, but what are the mainstream use cases where people might pay 3x more for a faster version of something that is already fast?
I think it would have to be an application where it paid for itself - where the 10x faster response was actually worth more than 3x the cost to you - where the extra speed was worth the extra cost.
by HarHarVeryFunny
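A rough sketch of the arithmetic in the exchange above, not anyone's measured numbers. The 2000-token page size is a made-up assumption, the 50 tok/sec rate is borrowed from the OpenRouter figure quoted earlier in the thread, and both function names are hypothetical. It ignores time to first token and network latency.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream `tokens` at a steady decode rate."""
    return tokens / tokens_per_second


def faster_tier_pays_off(value_multiplier: float, cost_multiplier: float) -> bool:
    """True when the extra value from the speedup exceeds the extra cost."""
    return value_multiplier > cost_multiplier


PAGE_TOKENS = 2000  # assumption: tokens needed for headers plus one page

# Time to generate one page at the two rates discussed in the thread.
print(generation_seconds(PAGE_TOKENS, 1000))  # 2.0
print(generation_seconds(PAGE_TOKENS, 50))    # 40.0

# The break-even test from the comment above: a 10x faster response only
# justifies 3x the price if that speed is worth more than 3x to you.
print(faster_tier_pays_off(value_multiplier=10, cost_multiplier=3))  # True
print(faster_tier_pays_off(value_multiplier=2, cost_multiplier=3))   # False
```

Under these assumptions even 1000 tok/sec leaves a generated page at around two seconds, which is the "few seconds" range where the comment above says many people give up.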
6/8/2026 at 6:23:38 PM
> Right now Claude is faster than me on some tasks but we’re at least close.
I don't doubt it, but I don't think you can spawn 10 copies of yourself working simultaneously.
by binyu
6/8/2026 at 6:28:46 PM
No, but nor can you keep track of what 10 agents are doing simultaneously. Hence the multitasking regret.
by AlecSchueler
6/8/2026 at 6:32:23 PM
An agent can. You don't need to watch tasks; you can have a live digest with another tool.
by pixel_popping
6/8/2026 at 7:57:25 PM
Do you have any recommendations for a live digest tool?
by logankeenan
6/9/2026 at 8:38:25 AM
Who watches the watchers?
by AlecSchueler
6/8/2026 at 5:39:23 PM
Use Claude fast mode and turn off thinking. Tell it to just explain what its plan is to you at a high level.
It will go much faster.
by ilaksh
6/8/2026 at 6:26:43 PM
Have you tried Gemini 3.5 Flash? It's quite fast. Amazing how fast it finishes tasks. Much faster than Claude.
by UncleOxidant
6/9/2026 at 2:12:57 AM
You can run Claude in "fast" mode. It costs you more of your compute usage, but it's reasonably fast. I'm not sure I care to go "faster" than where things are now, otherwise you start losing on manual review and testing time. I would argue that Claude can poop out weeks (if not months) of coding effort in a few hours, and get you insanely close to a good product if you define the tech stack and the business rules. Can it goof here and there? Sure. You can also make it refactor all the code on a whim faster than any intern could. I think it's good enough to spare you mundane, stupid bugs in most cases. I don't know what people who hate it are doing. Maybe they're not even trying at all, or are dismissing it from the first output (as though everyone writes perfect code in one shot, right?), or maybe it's just pride getting in the way of them using a decent tool to its true potential.
by giancarlostoro
6/8/2026 at 4:45:44 PM
Woah - what’s the prompt and what’s the PR?
by recroad
6/8/2026 at 5:04:54 PM
I replied in more detail under another comment. TLDR: fixing flaky CI across multiple branches
by goyozi
6/9/2026 at 3:45:20 AM
I’ve used codex code optimized for a few projects and it’s unsettling how fast it is. It’s hard to think fast enough to keep up with it. Mental fatigue was a real challenge because the decisions that required my input were rapid fire and legitimate ambiguities that were appropriate escalations. I am too much of a geezer for the intensity of it. But I’ll take it!
by fnordpiglet
6/8/2026 at 11:24:32 PM
> That’s a game changer and I don’t even know where we fit in.
Doing non-trivial work.
by OtomotO
6/8/2026 at 8:35:56 PM
Living on the street or in a cave lol
by Bombthecat
6/8/2026 at 10:57:31 PM
[flagged]
by joshcreates