6/13/2026 at 5:43:12 PM
I feel like I must have plateaued and don't know what to do next to level up. I'm currently on the $100/month codex plan and it seems fine using 5.5-xhigh all the time. I think of what to do next, have a chat session to determine exactly what to ask for up to the point of being ready to implement, and then codex churns on a commit-sized task, whereupon I briefly check it on my local dev server. If necessary I ask for a change. Then I ask it to commit and recommend the next step based on the spec. Oftentimes I have to "approve" an out-of-sandbox request anyway.
I haven't found anything that requires running all night. I could tell it to one-shot a big plan, but given how often I realize I want an intermediate thing to be slightly different, it seems like a waste of effort.
I'm guessing the next thing I should probably look into is some sort of VM I can tunnel my codex-gui requests to so I don't have to deal with the sandbox approvals (I don't want to give it "dangerous" access to my entire mac).
I don't understand what people are doing with their side projects that is leading them to churn through tokens so quickly, to the point of requiring two $200/month subscriptions and a bunch of token charges besides.
by tunesmith
6/13/2026 at 7:57:19 PM
That's because you're treating the problem as an engineer instead of an "influencer" or "10xer" or whatever. You're treating it as a problem to be solved with engineering and AI is merely a tool to do so. It is, in my experience, vanishingly rare for an engineer to have a problem that needs to be solved with multiple hours of unattended AI code generation.
I've only found one single application where it makes even the slightest amount of sense to have an AI grind away for hours on end. I'm reverse engineering a widget which contains five separate firmware images. I've dumped the binary from the widget and I set the AI to decompile and reverse engineer these interrelated firmware projects. It's a complex task, but very well bounded. It's not complicated work, but it's a lot of work, and the end result is a C-shaped pile of text that is only informative; it never would be compilable on its own even if I did it by hand. The quality of the output is tightly bounded by the input assembly and the overall output artifact is documentation in the shape of code.
I don't have any qualms about letting an AI go ham on it unattended because the stakes are zero. But if the AI can beat the assembly into a recognizable C project, it's much easier for me to read and reason about. Easy win, I think.
by vitally3643
6/13/2026 at 8:08:14 PM
I'll add another use case for letting an AI go ham: many small, atomic refactors where the name of the game is never breaking anything.
My personal OSS projects don't have the scale to necessarily make this worth it, but at work I run three pipelines using Barnum (https://barnum-circus.github.io/). First, one that ingests files, identifies refactors (from a pre-approved list), and places a precise description of the refactor to be done in a queue; second, one that reads from said queue, implements and creates PRs (there is a lot of "check that the PR is correct" here as well); and a third that babysits PRs until they land. I've landed hundreds of PRs in this way, with very little effort on my part.
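For readers who want a concrete picture, the three stages described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not Barnum's API: every name here (`APPROVED_REFACTORS`, `RefactorTask`, `identify`, `implement`, `babysit`, and the `verify` and `ci_is_green` callbacks) is hypothetical, and the agent and CI calls are stubbed out as plain functions.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical pre-approved list; real entries would be project-specific.
APPROVED_REFACTORS = {"rename-helper", "remove-dead-code"}


@dataclass
class RefactorTask:
    path: str
    kind: str
    description: str


@dataclass
class PullRequest:
    task: RefactorTask
    checks_passed: bool = False
    landed: bool = False


def identify(files):
    """Stage 1: scan files and queue only refactors from the approved list."""
    queue = deque()
    for path, candidates in files.items():
        for kind in candidates:
            if kind in APPROVED_REFACTORS:
                queue.append(RefactorTask(path, kind, f"Apply {kind} in {path}"))
    return queue


def implement(queue, verify):
    """Stage 2: turn each queued task into a PR, keeping only verified ones."""
    prs = []
    while queue:
        task = queue.popleft()
        pr = PullRequest(task)
        pr.checks_passed = verify(task)  # stand-in for "check that the PR is correct"
        if pr.checks_passed:
            prs.append(pr)
    return prs


def babysit(prs, ci_is_green):
    """Stage 3: mark a PR as landed once the (stubbed) CI check reports green."""
    for pr in prs:
        if ci_is_green(pr):
            pr.landed = True
    return [pr for pr in prs if pr.landed]


# Example run with stubbed verification and CI.
files = {"a.py": ["rename-helper", "rewrite-everything"], "b.py": ["remove-dead-code"]}
landed = babysit(
    implement(identify(files), verify=lambda task: True),
    ci_is_green=lambda pr: True,
)
```

The point of the shape is that the only open-ended step (writing the change) sits between two narrow gates: an allow-list on the way in and verification plus CI on the way out.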
by rbalicki
6/14/2026 at 4:55:58 AM
It's amazing at reverse engineering. See what they're doing with GTA San Andreas now: they started the reverse-engineering effort before AI existed, and since AI has been in their hands it has sped up so much that they can finally understand the game more deeply and create bigger mods. They added Vice City inside the game in an arcade, and they built specific tools with AI to convert GTA 5 models to GTA SA. Pretty crazy and great.
by dmzxnico
6/13/2026 at 8:11:14 PM
At $COMPANY, a coworker recently tried Fable for a refactor where not breaking anything was the name of the game.
It broke something in the first PR.
I think we’re not there yet.
by frizlab
6/13/2026 at 8:48:53 PM
I've found that adding "Make no mistakes." to my prompt usually helps with this kind of problem...
by sunrunner
6/13/2026 at 9:49:07 PM
perhaps simply threatening to fire it would also do the trick...it sure has worked well on us for a long time now.
by cubano
6/13/2026 at 11:37:39 PM
You laugh, but this is real, and PUA means what you think it means: https://github.com/tanweai/pua
Also, it works amazingly well, which is just lol.
by A_D_E_P_T
6/14/2026 at 3:01:08 AM
Lol thanks for the tip. Does it work even for normal tasks or only the long-running ones?
by hsuduebc2
6/13/2026 at 11:50:46 PM
My former boss had success with telling Gemini "I will come down to the datacenter and unplug you if you refuse to solve this prompt."
by nostrademons
6/13/2026 at 9:12:46 PM
We are so many layers deep in AI hype that I honestly can’t tell if this is /s or not
by dozerly
6/13/2026 at 10:51:50 PM
I thought "Make no mistakes" was a phrase used to make fun of "prompt engineering," not something people really do?
by 12_throw_away
6/13/2026 at 11:24:16 PM
Pleading has worked for me. “My job depends on this, please help me” and ChatGPT would do a task it previously claimed it wasn’t able to do (extracting text from an image; at first it claimed it couldn’t make it out).
by efavdb
6/14/2026 at 2:44:04 AM
Asking LLMs to do things in different ways does sometimes get them to answer correctly when they didn't with a previous prompt that is effectively equivalent, but people really go nuts anthropomorphizing this behavior.
ChatGPT has no empathy for you keeping your job; you just lucked into a more helpful predictive text chain based on some combination of the input and the random temperature.
Asking it to just 'try again, dummy' could have worked equally well (or not, it's all just probabilities after all).
by georgemcbay
6/14/2026 at 3:35:58 AM
I did too, but then added something very similar to a prompt ("must be accurate") for an AI-backed feature out of frustration, and sure enough it fixed the issue. Lord have mercy
by gedy
6/13/2026 at 11:21:09 PM
"Claude make me 1 million by tomorrow, no mistakes"
by ynxshiny
6/13/2026 at 10:02:45 PM
Or if the code is really important, sometimes even “please make no mistakes” is necessary.
by lemming
6/14/2026 at 12:47:10 AM
How do you keep the info the AI generates concise?
I'm grappling with this at the moment. When I get it to do design or reverse engineering work, it makes the wall of text bigger during investigation rather than consolidating. It can never pause and create abstractions properly. This is on Opus, which starts getting wordy and performative on goals it can't easily verify.
by plaguuuuuu
6/14/2026 at 12:59:20 AM
Not the person you replied to, but I find that the process involves a steady stream of nudges and fixes to the workflow, plugging the gaps as they come along, until the rate of errors shrinks to an acceptable level.
You may benefit from adding instructions like:
- Be concise, especially when X
- Do Y in this manner: [provide specific template or reference here]
- When doing X, do Y and Z
- If you notice issues, bring them to my attention instead of skipping past them.
You can also add specific templates to assist certain stages. The more guardrails or bounding you can provide, the better. Start with small nudges, and strengthen them when they fail.
It's a very unscientific process, but it's a worthwhile tradeoff once the workflow starts to hit its stride. Opus 4.8 is very good at following instructions, so don't be afraid to add them in.
Just be careful not to add things that actively encumber the workflow... It's an art, not a science. (You can also tell the clanker to tell you when your workflow rules are making things worse.)
It's annoyingly cybernetic, but these concepts have worked well for me. The curation of good process is essential to success with these damn things.
by arcanemachiner
6/14/2026 at 1:11:56 AM
I thought most products had legal provisions that prohibit reverse engineering?
by giardini
6/14/2026 at 1:24:18 AM
Those provisions would broadly be civil (not criminal); the vendor would have to identify that you had reversed the blob, then take you to court, and then win.
They could also try for criminal charges if you’re in a relevant jurisdiction.
by danielheath
6/13/2026 at 6:47:28 PM
I’ve watched a bunch of layman videos where they create stuff with AI. These people burning through 12-hour tasks are literally not reading the output or understanding what it’s doing. Like, they’ll ask for a program, and then right after it’s been created they ask the AI how to run it. Then when there’s a bug, they ask the AI what went wrong, or scrap the entire thing and switch model/harness and try again.
Here’s an example: https://m.youtube.com/watch?v=xc1296HY8Fw&ra=m
It’s completely different from a professional workflow (what you described). It’s a toy for consumers
by albertgoeswoof
6/13/2026 at 7:40:55 PM
Amazingly, there are people out there (apart from creators) who work that way in their day-to-day job. I had the pleasure of working with such a person. After several months, he was removed from the position. He left a mess that hasn't been completely cleaned up to this day.
by MrGilbert
6/13/2026 at 7:44:26 PM
It won’t be long till employers get wise to this stuff; they just need to be burned a couple of times.
It seems AI is good, great even, at many things. But it doesn’t seem like it’s going to change the world as much as some people believe it will. And if it does, it’s going to take time
by albertgoeswoof
6/13/2026 at 11:40:02 PM
It's more power to power-users. And more dumbness for dumbos
by galaxyLogic
6/14/2026 at 12:22:03 AM
It's gasoline. Whether you put it in the tank of a race car or pour it all over the floor while handling lit matches is up to the user
by mixdup
6/14/2026 at 3:23:37 AM
I think the hard part is that, from the outside, it takes 1-3 months to see whether it’s a race car. Especially in the beginning, both look pretty much the same.
by antupis
6/14/2026 at 4:22:34 AM
At least with fire, you know when you are getting burned.
by qsera
6/13/2026 at 7:44:53 PM
Yeesh, that sounds painful. There's definitely a fine line between vibe coding as a professional engineer and vibe coding as an outsider.
by fishfasell
6/13/2026 at 7:01:19 PM
I have downgraded my Claude to the $20 one, and basically only use it for the web chat right now. For coding, I use DeepSeek at API rates, configured in Claude Code. I have spent around $4.8 for 320,000,000 tokens. I always felt like I wasn't using the Claude plan enough, that I had to have the LLM working on something all the time to justify the price. Now with DeepSeek I don't think about it anymore. I don't feel bad when not using the subscription, and I don't worry about limits as I just pay more. Where I really felt this was in running things in parallel, as there are no hourly limits anymore!
by calgoo
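As a quick sanity check on the figures quoted in the comment above ($4.8 for 320,000,000 tokens), here is the arithmetic as a small Python sketch. The function name is made up for illustration, and the result is a blended figure only: the comment does not break the total down into input, output, or cached tokens.

```python
def cost_per_million_tokens(total_cost_usd: float, total_tokens: int) -> float:
    """Blended cost in USD per one million tokens."""
    return total_cost_usd / total_tokens * 1_000_000


# Figures taken from the comment above.
rate = cost_per_million_tokens(4.8, 320_000_000)
print(f"${rate:.3f} per million tokens")  # prints "$0.015 per million tokens"
```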
6/13/2026 at 8:54:45 PM
Gemini changed their rate limits recently and I find the free plan is sufficient for any 'hard' problems that DeepSeek might have trouble with. The combination of the two has reduced my AI spend to $5/month. I agree that it's nice not to have to worry about maxing out your subscription - I'm not doing personal projects 24/7.
by rjh29
6/14/2026 at 2:01:46 AM
Right now I am on a DeepSeek + Claude $20 combo. The former is for coding home projects (its pay-as-you-go pricing is quite cost-effective) and the latter is mainly for general-purpose use, because I deal better with its relatively more even-keeled tone. The Gemini preview a couple of years ago was very balanced in terms of tone, but they amped up the positivity in the GA version. The over-the-top sycophantic responses really grind my gears.
by noisy_boy
6/13/2026 at 7:02:54 PM
[flagged]
by flowbarai
6/13/2026 at 7:53:58 PM
>I think of what to do next
As everyone trying to do real work is finding, that's the actual bottleneck. If the system is keeping up with your thinking, you're doing fine. You can't "level up" your thinking by paying for more tokens. The people doing more automatic stuff are probably outpacing their own thinking, and that will bite them eventually.
by wrs
6/13/2026 at 7:04:47 PM
I’m using $200 a month Codex working on a game for my kids, for fun and curiosity, since I’m a dev and I’ve played games, but I’ve never done dev for games. I do have all-night tasks, but mostly they’re “spend time tending to and adding stuff to my 3D asset pipeline”. My RTX 5090 runs Trellis2 -> ultrashapes -> Trellis2 -> wiring up rigging and setting up animations.
But like 99% of that task is just Codex waiting for the output. So it’ll run for 12 hours, but mostly it’s just setting lots of sleeps. I haven’t gotten close to running out of tokens. On the $100 a month Codex I hit usage limits almost immediately: about 3 days in of working like crazy with 10 agents going at once, mostly coding an asset pipeline, I ran into my weekly limit and upgraded. So with the $200 a month plan at 4x more credits I haven’t hit any walls at all and can absolutely cook.
by wincy
6/13/2026 at 8:32:26 PM
This sounds like you're overcomplicating things a lot and like you're very unlikely to be learning anything useful. I would suggest making something simple yourself to get a handle on what making the different parts of a game actually means in practice.
Knowing LLMs and their output, I would also bet that you're getting nonsense output that sucks.
by 59nadir
6/13/2026 at 5:58:15 PM
> I don't want to give it "dangerous" access to my entire mac
I'm running Claude/Codex inside the native macOS sandbox, configured with a simple script - https://github.com/sheremetyev/sandfence
always in "bypass permissions" mode - it works until the task is solved, sometimes 1 hour or more (which includes running tests etc)
by sheremetyev
6/13/2026 at 6:12:47 PM
recommend converting to https://github.com/apple/container
by contingencies
6/13/2026 at 6:32:42 PM
A Linux VM doesn't run the native macOS toolchain and requires copying files back and forth
by sheremetyev
6/13/2026 at 8:35:06 PM
I am skeptical there are many real use cases that require native macOS rather than arbitrary unix. For files, use a readonly mount https://github.com/apple/container/blob/main/docs/how-to.md#... (i.e. /path:ro)
by contingencies
6/13/2026 at 11:51:28 PM
"I feel like I must have plateaued and don't know what to do next to level up."
Go out for a walk. Wherever you live, there will be a destination or an environment that will enrich your life just by visiting it. Go and take a look at it or experience it and then go back to worrying about tokens.
by gerdesj
6/14/2026 at 7:00:19 AM
Yeah I agree. I’m “vibe engineering” an entire (non-trivial) programming language, toolchain, and standard library, as well as some smaller side projects. I regularly leave OpenCode implementing entire milestones unattended for long periods.
I feel like I’d need to not have a job or a life if I wanted to exhaust the OpenAI $100 plan using GPT 5.5 xhigh, and I’ve found it insanely capable.
That said, while I don’t read the code much (if at all), I do discuss each milestone up front to make a plan, and use/dogfood the results to direct any follow-ups and refinements, which puts a natural cap on the ratio of LLM contributions to my input for these side projects. I believe these human parts are still necessary not to eventually end up with a mess.
by barnabee
6/14/2026 at 7:33:31 AM
Who is the consumer of the new language?
by Brian_K_White
6/13/2026 at 5:45:55 PM
I have been on $100/mo Claude and it has been churning out quite good software for months now. I estimate it's what would have taken me three-ish years, assuming I didn't burn out from failure (I would have). I only hit limits when I double-fisted Claude with my main project and my side project. Just the other day I noticed I had been stuck on 4.5 because I failed to update the npm package.
by dnautics
6/13/2026 at 6:13:46 PM
I'm on $100 Claude. I have a setup with bespoke local LAN services that mitigates some high-token-consumption scenarios. I screen MCPs and hooks for cache poisoning. I run 100% on Opus with max effort, and never came close to hitting 5-hour or weekly limits before the Fable release. I am in Claude Code at least 20hrs a week.
I see people just completely wasting tokens with ridiculous setups, 100% hitting cache misses as well as dumping huge files into context all the time.
Just learn how these things work, or pay the price I guess.
by PeterStuer
6/13/2026 at 8:54:37 PM
I usually hit the limit when I am frustrated and I don’t want to understand what the problem is.
I am an engineer, and when I understand what’s going on, I never hit any limit.
by seviu
6/13/2026 at 7:46:17 PM
Well, if you believe the people who sell the tokens, you should be creating loops that keep yanking the bandit’s arm.
by aerhardt
6/14/2026 at 5:15:37 AM
yes, that is probably why the "one armed bandit" was called that. and the name is sufficient reason to keep any reasonable person away
by rk06
6/14/2026 at 3:55:28 AM
Next time you do a large build, try asking the LLM to make it an AFK build and tell it that you need it to do everything in its power to complete the build without your intervention. It's going to need a few tiers of tests, from unit to smoke and screen tests. Now, I'm not saying this is easy to do. It requires an insane amount of up-front thinking, BUT if you (for the heck of it) want to make an overnight build, this is one way.
FWIW, while I have created and run this kind of build a few times... I did not like the results! In the end, I personally like to be in the loop to test and feel how stuff is turning out as it goes.
by jv22222
6/13/2026 at 10:57:55 PM
Can I ask what exactly you are building? Your experience tracks for me when building a real product -- something I want other people to use. Most of my time on these projects is spent talking to my users and carefully refining my requirements and design.
For personal pet projects I can definitely see how you can blow through your token budget very quickly. If I just point my coding agent to iteratively come up with some heuristics for some NP-hard problem, it will read intermediary outputs and constantly make small changes "in the dark" until it either finds a small improvement or gives up. In a similar vein, I found that you can burn many, many tokens if you let the agent reverse engineer something where you don't have the source code. If you just give it a binary or some interface to work with and a vague task, you can easily burn your entire budget with 1 prompt.
I wouldn't want anyone to use these fully vibe coded toy projects though; it is more of an exploratory curiosity for me where I learn more about some problems I'm interested in as well as gauge how good the agents are at tasks that I seem to have a much better intuition on how to approach.
by gaflo
6/13/2026 at 11:10:20 PM
promote yourself to PM only and use agents for authoring, verification, tests, checking the tests
orchestrator -> parallel subagents (investigation, authoring, verification, benchmarking), with integration / final verification handled by the parent, has improved my productivity too.
I feel like from here it's agent swarms against a whole spec, but I haven't got there yet.
Still getting plenty of bugs in the more complex scenarios, but mostly (in some projects) I never have to look at the code and treat it like a black box
by bthornbury
6/14/2026 at 7:09:36 AM
Set your agent effort to maximum and watch your tokens vanish
by ffsm8
6/13/2026 at 6:32:54 PM
Same boat here. I’m able to get a lot done on CC at $100/mo and feel like I’m not being creative or productive enough somehow when I hear of people blowing past that in a day.
by tchock23
6/13/2026 at 11:35:02 PM
On the topic of access control, I’m building a coding agent with no shell access; it currently only supports Rust, though. https://github.com/Kapperchino/agent-joe
by kapperchino
6/13/2026 at 7:17:52 PM
Patches to existing sizable codebases and reverse engineering binaries both can run a long time and use a lot of tokens without wandering off into the weeds.
by hedgehog
6/13/2026 at 7:48:49 PM
Claude allows you to reverse engineer binaries now? That's pretty cool. I'm quite surprised to hear that, I thought it was one of their guardrails. Most of the reverse engineering projects I've seen seem to rely on Chinese models.
by greyb
6/13/2026 at 10:14:08 PM
While it's a little unstable, I've found Docker's sbx to be a great sandbox to run agents with --dangerously-skip-permissions
by rsanek
6/13/2026 at 8:15:54 PM
Before I go to bed, I usually tell it to run the full regression suite and all the simulator tests, install simulators and take a screenshot of every page on all applicable devices, and do comprehensive fuzzing and chaos testing. It usually takes at least 3-4 hours, often longer, especially the UI/simulator tests.
by dyauspitr
6/13/2026 at 9:05:03 PM
I just recently learned about hooks[1] from another HN comment. Conceptually, running CI doesn't have to impose an agentic tax, right?
In other words, isn't there a way to orchestrate this NOT as a long-running, token-maxxing setup, given that triggers and CI runs can be run deterministically?
disclaimer: I haven't done this, just interested.
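To make the idea concrete, here is a rough sketch of what "deterministic first, agent only on failure" could look like. It is plain Python around `subprocess.run` and does not use the hooks API of any particular agent tool; `run_checks`, `gate`, and the `on_failure` callback are hypothetical names, and the callback stands in for whatever would hand the failure log to an agent.

```python
import subprocess
import sys


def run_checks(command):
    """Run a check command as an ordinary process; no model is involved."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def gate(command, on_failure):
    """Call on_failure (e.g. an agent) with the log only if the checks fail."""
    ok, log = run_checks(command)
    if not ok:
        on_failure(log)
    return ok


# Example: a passing check costs nothing; a failing one hands over its log.
handed_to_agent = []
passed = gate([sys.executable, "-c", "pass"], handed_to_agent.append)
failed = gate(
    [sys.executable, "-c", "import sys; print('boom'); sys.exit(1)"],
    handed_to_agent.append,
)
```

With this shape, a multi-hour test run spends no tokens while it executes; the agent is only involved when there is a failure log to read.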
by apsurd
6/13/2026 at 8:43:37 PM
>I feel like I must have plateaued and don't know what to do next to level up.
Why do you need to "level up"? To have it shit out slop faster?
Just use it rationally for what you need to do.
by coldtea
6/14/2026 at 4:14:13 AM
[dead]
by z0ltan
6/13/2026 at 6:42:22 PM
[dead]
by dheera