Ask HN: Is anyone experimenting with different ways of using LLMs for coding?

7/3/2026 at 8:48:29 PM

I'm using what I call "hermetic agents", where completely sandboxed agents write code and tests from the same specification, where the code writer can't see the test and the test writer can't see the code. The idea is that we can get better quality this way (by avoiding confirmation bias between code and tests). It is more painful to set up however, since you have to distill a spec and guides that the agent would normally hook into using RAG.

It is more like people (agent?) management than coding though. I'm setting up and debugging processes, rather than writing code. I spend a lot of time cursing at and arguing with the agents I'm using to set up hermetic agents (who I can't argue with obviously, but I can have conventional agents go over their logs to figure out how to improve their sandboxed-context).

by seanmcdirmid
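
A minimal sketch of the isolation rule described in the comment above, assuming a simple role-to-visibility table. The role names, the `VISIBLE` policy, and the `Artifact` type are hypothetical and are not the commenter's actual setup; the point is only that the coder and the tester are handed the spec and nothing else, while a QA role may see everything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    kind: str      # "spec", "code", or "tests"
    content: str


# Hypothetical visibility policy: which artifact kinds each role may read.
VISIBLE = {
    "coder": {"spec"},
    "tester": {"spec"},
    "qa": {"spec", "code", "tests"},
}


def build_context(role: str, artifacts: list[Artifact]) -> list[Artifact]:
    """Return only the artifacts this role is allowed to see."""
    allowed = VISIBLE[role]
    return [a for a in artifacts if a.kind in allowed]


artifacts = [
    Artifact("spec.md", "spec", "add(a, b) returns the sum of a and b"),
    Artifact("add.py", "code", "def add(a, b): return a + b"),
    Artifact("test_add.py", "tests", "assert add(1, 2) == 3"),
]

coder_view = [a.name for a in build_context("coder", artifacts)]
tester_view = [a.name for a in build_context("tester", artifacts)]
qa_view = [a.name for a in build_context("qa", artifacts)]
```

In a real sandbox the filtering would have to be enforced at the filesystem and tool level rather than by a helper function; this only illustrates the policy.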

7/3/2026 at 11:51:57 PM

This is similar to how our college CS problem sets were graded. We were given a spec, and we had to implement a program that conformed to it. We had access to 70% of the test suite during development, and another 30% was hidden and only evaluated after submission. We were graded out of 100.

It was effective at making you think about the problem and anticipate what tests might be missing. I can see how this would be effective for coding agents, which tend to get progressively lazier at writing tests as session context grows.

by chatmasta

7/4/2026 at 5:40:31 AM

The interesting twist here is that the test writer can also be wrong: the QA agent assigns blame when the implementation disagrees with a test case, and it’s not always going to be the implementation’s fault.

Makes me wonder if there should be problems for CS students where given a spec and an interface, but not an actual implementation, they write the tests instead.

by seanmcdirmid

7/4/2026 at 5:31:23 PM

Had this issue at uni back in '08 (physics), my code did what was asked of it, but the auto grader didn't like it. Nagged the professor until he had a closer look and fixed the issue, and asked me to resubmit.

by exe34

7/3/2026 at 8:54:57 PM

I did a thing sorta like this between app and infra. My app agent sends messages to the infra agent for what it needs and they go back and forth to sort things out. They invented their own working process on top of it and wrote their own tool. It's interesting watching all this work. Feels like the future more than a lot of things I've seen in the past few years. Shame we're doing this to the detriment of the environment and the economy.

  """
  Claude Code mesh: gossip-based multi-instance coordination.
  
  Usage:
    mesh.py register --id ID --repo PATH --keyword KEYWORD
    mesh.py list
    mesh.py send --from ID --to ID MESSAGE
    mesh.py broadcast --from ID MESSAGE
    mesh.py watch --id ID
    mesh.py forward --id ID MESSAGE_JSON
    mesh.py peers --id ID [--n N]
    mesh.py clean
  """

by leetrout
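
The usage text in the comment above maps fairly directly onto `argparse` subcommands. The following is a sketch of how such a command line could be parsed, not the tool the agents actually wrote: the destination names (`sender`, `message_json`) and the default for `--n` are assumptions, since the thread shows only the usage string.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mesh.py")
    sub = parser.add_subparsers(dest="command", required=True)

    register = sub.add_parser("register")
    register.add_argument("--id", required=True)
    register.add_argument("--repo", required=True)
    register.add_argument("--keyword", required=True)

    sub.add_parser("list")

    send = sub.add_parser("send")
    # "from" is a Python keyword, so store it under a different name.
    send.add_argument("--from", dest="sender", required=True)
    send.add_argument("--to", required=True)
    send.add_argument("message")

    broadcast = sub.add_parser("broadcast")
    broadcast.add_argument("--from", dest="sender", required=True)
    broadcast.add_argument("message")

    watch = sub.add_parser("watch")
    watch.add_argument("--id", required=True)

    forward = sub.add_parser("forward")
    forward.add_argument("--id", required=True)
    forward.add_argument("message_json")

    peers = sub.add_parser("peers")
    peers.add_argument("--id", required=True)
    peers.add_argument("--n", type=int, default=3)  # default value is an assumption

    sub.add_parser("clean")
    return parser


args = build_parser().parse_args(
    ["send", "--from", "app", "--to", "infra", "need a queue"]
)
```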

7/3/2026 at 8:59:19 PM

That's sort of crazy.

My agents are completely locked down and their communication channels are limited. For example, a QA agent can give feedback to the coder or tester agents, but it can't reveal information to them that they shouldn't know about.

It also prevents them from going off the rails, since they can't really do anything outside of their specified tasks (tools are locked down as well).

by seanmcdirmid

7/3/2026 at 9:34:34 PM

Have a look at GitHub.com/Strapchay/protag. I have a similar concept implemented. It’s still a work in progress and I haven’t worked on it for a while but the context is that they can only make modification through a cli tool created for the project which also restricts the context of the files they can look through and each domain agent is assigned specific files and can’t modify beyond it. Still a lot of work to be done though

by strapchay

7/3/2026 at 11:41:21 PM

Literally converged to this same pattern over this month.

by pkoird

7/4/2026 at 12:15:58 AM

do you mind sharing the more specific setup or agent framework? Hermes? LangChain? DIY?

by ernsheong

7/4/2026 at 5:44:09 AM

I work at Google, so it’s antigravity agents using Blaze (like Bazel) to configure their context, including DAG-style dependencies on other hermetic agent outputs, what specific tool calls they can make, and their communication channels (they can send feedback to agents backwards on the DAG). Otherwise this is completely DIY (antigravity does a lot of the heavy lifting, so nothing fancy except for custom sandboxing).

by seanmcdirmid

7/3/2026 at 9:24:29 PM

Doing similar stuff. I curse a lot. The processes mostly transfer between different models, which somewhat validates the approach.

by kosolam

7/4/2026 at 7:37:19 AM

[flagged]

by verify-ai

7/3/2026 at 9:22:15 PM

I, like many nerds of the same stripe, have a dragon's hoard of every PC component I've owned in the last 20 years. I've attached as much of it to my homelab as is practical, but there's still a pile of GPUs from the last decade plus.

So I decided to load up everything with more than 3GB of VRAM into various machines on the network. Anything that could conceivably run an LLM of any utility. I've been experimenting with driving a swarm of heterogeneous LLMs into coding tasks. I have models as small as llama3.2:3b up to Qwen3.6:27b dense. Over 10 unique models in the swarm.

So far, the results are... interesting. Coding isn't great, but what has worked shockingly well is polling the swarm for opinions. Getting ten unique perspectives synthesized into a single summary has been astonishingly useful. When I gave the swarm the ability to debate with itself, the results got even more interesting.

The end goal here is an autonomous routing network that learns which models excel at which tasks, which machines can fit which models, and intelligently routes requests and models to where they're most effective.

I can't afford an RTX 6000, but I can run smaller models on the pile of GPUs I do have. So far it hasn't worked out the way I'd hoped, but it did turn out to be very useful in other ways. Hopefully soon I can get coding worked out and the swarm can drive itself into self-improvement

by vitally3643
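
The "routing network that learns which models excel at which tasks" in the comment above could start as something as simple as a per-task success table with occasional exploration. This is a sketch of that idea (an epsilon-greedy router), not the commenter's code; the `ModelRouter` class, the task-type labels, and the success signal are all hypothetical, and machine placement is left out entirely.

```python
import random
from collections import defaultdict


class ModelRouter:
    """Pick the model with the best observed success rate per task type."""

    def __init__(self, models, epsilon=0.1, seed=0):
        self.models = list(models)
        self.epsilon = epsilon              # probability of trying a random model
        self.rng = random.Random(seed)
        # (task_type, model) -> [successes, attempts]
        self.stats = defaultdict(lambda: [0, 0])

    def score(self, task_type, model):
        wins, tries = self.stats[(task_type, model)]
        return wins / tries if tries else 0.0

    def pick(self, task_type):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.models)          # explore
        return max(self.models, key=lambda m: self.score(task_type, m))  # exploit

    def record(self, task_type, model, success):
        entry = self.stats[(task_type, model)]
        entry[0] += int(success)
        entry[1] += 1


# epsilon=0.0 turns exploration off so the example is deterministic.
router = ModelRouter(["llama3.2:3b", "Qwen3.6:27b"], epsilon=0.0)
router.record("summarize", "llama3.2:3b", True)
router.record("summarize", "Qwen3.6:27b", False)
router.record("code", "llama3.2:3b", False)
router.record("code", "Qwen3.6:27b", True)

best_for_summaries = router.pick("summarize")
best_for_code = router.pick("code")
```

The outcomes recorded here are made up for illustration and say nothing about how these models actually compare.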

7/4/2026 at 10:55:09 AM

> every PC component I've owned in the last 20 years. I've attached as much of it to my homelab as is practical

this sounds like a disease. one of the early signs for me was developing a particular interest in the breaker box.

by hashmap

7/3/2026 at 10:58:41 PM

Even though they aren't Nvidia GPUs I wonder if setting up a swarm of Intel GPUs would be worthwhile for a similar setup, more VRAM, more diverse models.

by giancarlostoro

7/4/2026 at 3:58:03 AM

Using OpenVINO and a supported model can get you surprisingly far...

by 7speter

7/4/2026 at 3:03:16 PM

Nice, not sure why I was downvoted for suggesting the cheapest GPU which comes with quite a lot of VRAM... but I guess some people are fatigued by AI / GPU talk?

by giancarlostoro

7/3/2026 at 9:38:38 PM

This sounds cool, do you have a more detailed write-up?

by fellerts

7/3/2026 at 9:46:55 PM

Nope, honestly it's about three weekends worth of vibe coded python slop. Far, far below my personal standards for publication. Maybe one day when I get a useful result I'll publish a blog.

by vitally3643

7/3/2026 at 9:55:35 PM

One of the cooler things I've read on here as it's a next level "homebrew" AI setup! I'd be curious what some examples of debate outputs are. Please consider writing about it!

by fblp

7/3/2026 at 10:38:10 PM

Does this unit have a soul?

by ninjis

7/4/2026 at 2:12:05 AM

I've been experimenting with making AI generate literate code. The goal is to have the AI produce a bunch of prose alongside the code. A lot of context for the purpose and design of what you are doing usually gets thrown away, but with literate programming you can save it. And this way I can have some hope of understanding the code being created. I've liked it so far.

I am using a literate programming project I built (https://github.com/adam-ard/organic-markdown) and I have an AGENTS.md file that looks like this:

"All the work we do on this project should utilize the organic-markdown literate style that you see explained/demonstrated in the `organic-markdown` project: https://github.com/adam-ard/organic-markdown

Some guidelines for how you should utilize organic-markdown literate programming..."

Then I list a bunch of conventions for making literate code the way I like it.

by aard

7/3/2026 at 7:31:21 PM

"I haven't been able to enter flow state like I can when I hand write code." new flow state is having 10 terminal tabs in diff worktrees and trying to remember what each bit is

by tombot

7/3/2026 at 11:56:27 PM

It hits the exact same endorphin system as “one more turn” style games like Civ. You can manage a few cities and keep them healthy, but the rest of your empire eventually regresses into a set of chores. Any time you encounter one of your zero growth cities you just queue some thoughtless production automation to keep it out of your “next turn” cycle as long as possible.

by chatmasta

7/4/2026 at 2:08:02 AM

Yeah, I see this as a design flaw in Civ. Better to limit the player's moves per turn to keep things moving. There's a game called "Ozymandias" that I quite like, or for a simple online game try Compact Conflict:

https://wasyl.eu/games/compact-conflict/play.html

Hopefully the LLMs will get fast enough so we won't need to multitask, or at least we can juggle fewer tasks.

by skybrian

7/3/2026 at 7:48:34 PM

It sounds silly but lately I've been able to hit flow states doing exactly this.

by codybontecou

7/3/2026 at 10:09:24 PM

While it can be productive, I never feel like I'm in a flow state doing this. The general context switching can actually be a bit draining to me.

by chris_fullcycle

7/3/2026 at 10:24:21 PM

Quite draining, I hope this is not what the future holds.

by gabriel-uribe

7/4/2026 at 1:49:40 AM

It’s a good reflection of my energy level.

Sometimes I’m working within 5 projects all with 1-3 agents running. Other times I’m maxed out on one agent running in a single project.

by codybontecou

7/4/2026 at 12:26:35 PM

Here's hoping someone starts to pull these things together into an app or Web frontend so the rest of us can configure these setups more easily.

(I was wondering yesterday, for example, if LMStudio might roll in speech-to-text capability without everyone instead having to jury-rig some custom install/config.)

by JKCalhoun

7/4/2026 at 3:56:34 PM

one option I've liked for entering flow state for this is running a small 3b param model and using it as an alternative to stackoverflow or websearch while I code.

I don't often do this, mostly bc I don't care too much about most of the code I write and just want it to work, and then I skip over it and see what's missing and tell the LLM to fix it

by sometimelurker

7/3/2026 at 7:38:25 PM

exactly, it's like Bobby Fischer playing 10 games of chess simultaneously

by npollock

7/3/2026 at 8:41:18 PM

so Step 1: Be Bobby Fischer? super helpful for us mortals...

by skeeter2020

7/4/2026 at 12:28:28 AM

Try intimidating the LLM as it writes your code.

That Claude Code regex for curse words? Turns out Anthropic was just looking for the next coding Bobby Fischer ...

by epihelix

7/3/2026 at 10:02:04 PM

Give yourself more credit. Bobby Fischer didn't wake up one morning the chess wiz that he became; it took years of practice and failing. And we're not talking about chess either. Start prompting one agent on one task in one worktree. Spin up a second agent on a second task in a second worktree. Repeat until you hit flow state for you. It might be at three agents or thirteen. Who cares how many other people are on, get to however many is just under too many for you to manage.

by fragmede

7/3/2026 at 10:54:36 PM

i hate that there are so many worktrees now that it's a real pain to find the actual file and modify it by hand.

there are so many stupid little changes (e.g., rewording or deleting their numerous comments) that would be done better and faster by me but it's usually just less cumbersome to keep asking the machine to tweak it instead.

by parpfish

7/3/2026 at 11:10:29 PM

But it feels more exhausting and stressful.

by UltraSane

7/3/2026 at 8:12:30 PM

I had this exact problem, tried opening multiple agents in different terminals but that just frays your flow state even more. There is one great workaround I’ve found.

Walk coding. Walkoding, if you like.

Use a harness, create a harness if you like, then load it up in telegram and off you go. I’ve been on solo hiking trips and shipped numerous features. It means you can stay concentrated on your task, while not sitting there being bored.

It’s truly liberating, highly recommend.

by anthonyfrisby

7/4/2026 at 10:29:50 AM

Are you pausing your hike to review code, check tests, and so on? Or do you just keep prompting blind until you get done with the hike and then batch review all the work?

Sorry if this is a dumb question, I haven't tried any harness-based development yet so perhaps I'm imagining a deprecated workflow in the first place

by pcthrowaway

7/3/2026 at 8:26:36 PM

I've also tried this. Very interesting, but must be used sparingly in my experience. Hiking is a place where before I used to go to think, or not think at all. Prompting an AI while doing so can feel very bad if really what I'm looking for is some disconnection or quiet thinking time.

by tony_cannistra

7/3/2026 at 8:45:54 PM

yeah, makes perfect sense to go for a hike through nature while checking a phone every 3-8 minutes...

by y0eswddl

7/3/2026 at 10:05:59 PM

Well no, you don't check it, you have notifications set up so it notifies you when it's waiting for you. If you don't want to hikevibecode, don't. No one's forcing you, who are you to yuck someone else's yum?

by fragmede

7/4/2026 at 3:34:54 PM

Isn’t yucking the yum of others the foundation of human society?

by etdznots

7/3/2026 at 7:18:38 AM

Not being able to enter flow state is a very interesting observation. I've felt it too to the extent that I went down a whole new rabbit hole of what it means to be in flow state. Let me know if anybody here wants to know more, happy to post some links.

To answer your question - I discuss the approach with Claude Code (e.g., should I implement my own ACT model in JAX or PyTorch, Python or Rust or Julia, etc.). Then write the initial part of the code myself. Opening up a blank vscode is a simple joy of life I refuse to give up :-) I'll ask Claude for advice if I get stuck, it will helpfully offer to write that code for me, I obstinately decline. Eventually, I'll get bored of some minutiae or other, at which point I'll ask Claude to complete just that part of it.

by avilay

7/3/2026 at 7:57:31 AM

Ok here are the flow related links. This was about 1.5 years ago when I was trying to figure out burnout and it turned out flow (or lack thereof) was closely related.

  * https://youtu.be/VbUFMYs0kXQ?si=xiNw4ZFlla8k-p7w  The person who gives this talk (Rian Doris) has a good newsletter that I still read. I just checked their website and it has gone in full commercial mode, so YMMV.

  * https://www.ted.com/talks/elizabeth_gilbert_your_elusive_creative_genius

  * https://www.amazon.com/dp/0465074871

  * https://www.betterup.com/blog/meaning-of-personal-values

by avilay

7/4/2026 at 12:37:53 PM

https://en.wikipedia.org/wiki/Mihaly_Csikszentmihalyi is considered the originator of the flow concept: "a highly focused mental state conducive to creativity", it's applied to other fields such as music. Definitely worth going down the rabbit hole.

https://en.wikipedia.org/wiki/Flow_(psychology)

https://www.sciencedirect.com/science/article/pii/S002839322...

by telesilla

7/3/2026 at 7:25:37 AM

I'd be interested in the rabbit hole of flow state. Also with regards to the dopamine rewards of solving a bug as motivation.

Sometimes using an LLM can assist these and sometimes it can feel like cheating myself out of a good thing and I'm not entirely sure where the borders are. It could also be related to a sense of ownership or pride in one's work and seeing the value in doing quality work.

by thinkingemote

7/3/2026 at 9:52:24 PM

Debugging can be super fun as we eliminate possibilities and it feels like we are converging to a solution. There have been instances where Claude (Opus family) was not able to effectively debug and I had to step in and do it. But debugging an over-engineered library for example, can become very wearisome. This is when I am really thankful for having Claude Code, it is able to figure out the bug and its fix/workaround pretty fast. I can then get back to doing my main task instead of spending an indefinite amount of time stepping through sloppily written code.

by avilay

7/3/2026 at 7:27:49 AM

I'd love to have some links please :)

by tolg

7/3/2026 at 7:40:43 PM

>I've felt it too to the extent that I went down a whole new rabbit hole of what it means to be in flow state. Let me know if anybody here wants to know more, happy to post some links.

I'm not a programmer, but I very much enter a flow state working on tickets, or playing a video game on higher difficulties when everything "clicks"

by Scroll_Swe

7/3/2026 at 10:03:19 PM

For sure, any task or activity that is hard enough and just outside our reach can get us into flow state. The trick is in ensuring that it is the right kind of hard, it is not too hard, and we time box the activity/task. If you think of how to beat the boss fight in a video game even when you are not playing it, it is the "right kind of hard". For me, beating the boss fights in Elden Ring was too hard, never got into flow state in that game :-)

by avilay

7/3/2026 at 7:55:18 PM

I miss feeling like I was "in the zone", but I haven't been able to achieve it in years.

Between having kids and a work situation a few years back, it is like my brain expects to be interrupted at any moment, so won't get there.

by doubled112

7/4/2026 at 9:54:07 AM

Ride a bike.

by chickensong

7/3/2026 at 10:08:05 PM

Teaching your kids to have a calendar and focus blocks (once they're old enough) is as good a habit to teach them as it is for you.

by fragmede

7/3/2026 at 10:22:52 PM

Agree! Negotiating focus blocks both at home *and* work can be super helpful. Of course, this is not always possible. Without knowing anything about your situation, it might be useful to rule out burnout as a possible reason for loss of flow.

by avilay

7/3/2026 at 10:34:56 PM

Yeah, flow state is not the same, and I miss it.

I've stumbled into a couple of different ways to work with AI, each with their advantages and disadvantages:

* Ask it to solve the issue and trust the results. You're outsourcing your thinking, and lose understanding of the code. The result might work, or it might not. Chances are it will work, but your code slowly grows messier.

* Ask it to solve the issue and review the results. This should help you understand the resulting code, and give you a better chance of setting the AI straight when it messes up. But you're still outsourcing your thinking, and not thinking of the solution yourself still means you lose touch with the code. But more importantly, reviewing is the boring part of software development.

* You write the code, and let the AI review it. In a way, I think this should be the sweet spot. It doesn't make you faster, but your quality should go up. AI is very good at reviewing, and often finds issues that humans skip over. This is the quality over quantity solution. More than code, I think this is particularly important for writing high-stakes non-code documents, like financial reports for customers. Quality is really important there.

* You tell the AI how to solve the issue. The AI still writes the code, but it receives tighter guidance from you. This is what I usually end up doing. I like to think this results in better code. It certainly gives me the impression I understand better what the code does. And I do tackle much larger amounts of code, but I check what the AI does, and often push back on its suggestions and assumptions. I think this is a nice middle ground between speed and quality.

* Full agent mode. Let the AI do everything. Let multiple agents work simultaneously doing everything. You lose control and your mental model. You're going to have to trust whatever the AI is doing. Sometimes it will be correct, sometimes not. Let's hope you never have to personally touch that code anymore. But it sure is fast.

by mcv

7/3/2026 at 12:15:12 PM

My absolute favorite modality is one I don't use all that much at the moment: Zed's edit completion.

If you're unfamiliar, it's like tab-completion, but it has a context that includes the edits you've made in the last few seconds, and it can predict around the cursor.

The model isn't advanced enough to understand complex tasks, but it has more the feel of the "crafting gun" in Subnautica or other survival crafting games, if that analogy makes any sense.

Personally I hate working with a chatbot - it's low-bandwidth and rage-inducing. If I could imagine a perfect workflow, it would be something like me whispering my train of thought as I program, and then pointing a very fancy "autocomplete gun" at the code.

by colinmarc

7/4/2026 at 7:43:09 AM

That's what VSCode next edit does too. It gathers context from the surroundings and recent actions to suggest blocks of code.

It often feels magical. But I find agentic plan->review->implement->review workflow to be a net positive in cohesion, documentation and throughput, relatively speaking.

by bel8

7/3/2026 at 7:20:04 AM

I'm in the same boat and I'm not a fan of the current way of working of agents, but I think tooling is what needs to catch up.

So, I actually decided to try to tackle it myself and worked some months (full time) on it.

https://beolis.com is the result of that, it's a local cli in a kanban board style with a remote server to keep the team on track (I've been using it myself for some time and actually started to ask some friends to use it just yesterday -- feedback very welcome, I still wanted to do some additional things before asking more people to use it, but oh well, I'm a fan of building in public anyways and it's probably better to have feedback sooner rather than later).

The main point there is that you work mostly in the ticket description (your own spec) and the plan (the spec as the agent sees it, generated with a custom workflow) and then having another custom workflow to implement it (you can choose how you want it -- https://beolis.com/blog/post/custom-coding-workflows has some info on what I'm using myself).

As a result, at least for me, I do spend more time immersed in a flow state (although I'm in that state writing the specs and reviewing code -- although in some cases it's more work to write the spec in a way the agent can work when things get more complicated vs just diving into the code, so, going into "code" mode is something I still have to do, agents are definitely not perfect).

I guess I'm lacking in docs on how to effectively use it. I have plans to create a video next week and post it in the blog, so, if you're interested, keep track of it ;)

by fabioz

7/3/2026 at 8:04:01 PM

I'm working on an agentic graph-based workflow execution engine/framework. The concept of the harness is completely abstracted away/generified - a 'node/agent' is a harness (cc, codex, open code, pi, etc) + model (I test different model and harness combinations). I have a set of tasks from trivial to complex - a set of workflows (a workflow is a set of initial nodes and their behaviour) is defined and each one is asked to perform each task (multiplied by each harness/model combination roughly). The workflow can include agents/nodes which are able to modify the workflow graph and create nodes. Other nodes can break down tasks and send subtasks to other nodes. Mostly experimental stage at this point. I'm exploring/tracking metrics such as total wall clock time to complete a task, total cost in tokens and $, among others. This gives me a decent amount of data/insight into the abilities/performance of different harness/agents/models for different tasks, and gives me a great testing/dogfooding of my own harness (which is one of the harnesses being tested, and as of now the most efficient one).

The main bottleneck at this point is the cost of all of the tokens in the fairly large test matrix of tasks, harnesses, models.

I hope to release/open source all of this stuff eventually.

by aleqs
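
To make the size of the test matrix in the comment above concrete, here is a small sketch that enumerates task × workflow × harness × model combinations and totals the tracked metrics per harness/model pair. The harness names come from the comment; the task tiers, workflow names, model names, and all measurements are placeholders invented for illustration.

```python
from itertools import product

tasks = ["trivial", "medium", "complex"]        # hypothetical task tiers
workflows = ["single-agent", "planner-worker"]  # hypothetical workflow names
harnesses = ["cc", "codex", "opencode", "pi"]   # harnesses named in the comment
models = ["model-a", "model-b"]                 # placeholder model names

# One cell per combination that has to be run.
matrix = [
    {"task": t, "workflow": w, "harness": h, "model": m}
    for t, w, h, m in product(tasks, workflows, harnesses, models)
]


def summarize(results):
    """Total wall-clock seconds, tokens, and dollars per (harness, model)."""
    totals = {}
    for r in results:
        key = (r["harness"], r["model"])
        agg = totals.setdefault(key, {"seconds": 0.0, "tokens": 0, "usd": 0.0})
        agg["seconds"] += r["seconds"]
        agg["tokens"] += r["tokens"]
        agg["usd"] += r["usd"]
    return totals


# Fake measurements, only to show the shape of the aggregation.
results = [dict(cell, seconds=1.0, tokens=1000, usd=0.25) for cell in matrix]
totals = summarize(results)
```

Even these small lists give 3 × 2 × 4 × 2 = 48 runs per sweep, which is why token cost grows quickly as any one dimension is extended.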

7/3/2026 at 9:11:54 PM

My company tried to build something like this pre-TUI as a tool-AI-IO dag dispatcher. The biggest mistake I made was thinking that people would have no problem figuring out how they could translate their work or define multi-step automations, and focusing on the orchestration and sandboxing thinking that was the core, when it was really figuring out how to get the onboarding UX/complexity to not feel daunting or more trouble than it was worth.

Eventually for my own work, I discovered that the context management and runtime was more like a stream or active service mesh than a dispatching / one-off processing problem, most others' were too. Then all my prompts would degrade across model versions or providers, and I realized that actually setting the context for the tasks and keeping track of it all was a ton of work and something I had to do every time as an actual user, but never when I was testing or demoing it on existing data.

Curious how you're testing your work and if you've managed to avoid the problems I ran into. I need to permute across the same set of workloads/configs you mention (and maybe more) for my next set of work so I'd be very interested in sharing or collaborating on the test infrastructure! At Google I did a lot of permutation testing using https://github.com/cloudprober/cloudprober and was going to start using it sometime in the next couple weeks. It exists basically one layer above the workload content/targets so it's probably compatible with everything except the test client/driver you're using.

by weitendorf

7/3/2026 at 10:08:16 PM

In my case a workflow is basically an active/living graph of nodes/sub-tasks. One node can process a task (with all relevant context) and create multiple fan-out tasks, or it can add additional context/requirements and pass it along to another node. The message/task passing is all implemented as a queue - nodes subscribe to messages/tasks addressed to them and execute them, producing more tasks (or zero new tasks). For each task there is a context and a parent task/context, as well as a key/value store of all tasks and their context. Each agent/node gets instructions injected into their prompts that tell them how to look up parent tasks/context as well as how to output new tasks.

There is also a feedback loop - a node can fail to process a task, and pass the reasoning/context for that back to the parent or another node - this might result in a new adjusted task replacing the failed task, or it might require human intervention.

by aleqs
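
The queue-and-subscription model in the comment above can be sketched in a few lines. This is an illustration under assumptions, not the commenter's engine: `Task`, `Engine`, and the handler signature are hypothetical, handlers are plain functions standing in for agents, and the failure/feedback path is omitted. It shows fan-out, the key/value store of tasks, and looking up parent context.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_ids = count(1)  # simple task id generator


@dataclass
class Task:
    to: str                          # name of the node this task is addressed to
    payload: str
    parent_id: Optional[int] = None
    id: int = field(default_factory=lambda: next(_ids))


class Engine:
    def __init__(self):
        self.queue = deque()
        self.store = {}      # task id -> Task (the key/value store)
        self.handlers = {}   # node name -> handler(engine, task) -> list of new tasks
        self.log = []

    def subscribe(self, node, handler):
        self.handlers[node] = handler

    def submit(self, task):
        self.store[task.id] = task
        self.queue.append(task)

    def lineage(self, task):
        """Payloads from the root task down to this task."""
        chain = []
        current = task
        while current is not None:
            chain.append(current.payload)
            current = self.store.get(current.parent_id)
        return list(reversed(chain))

    def run(self):
        while self.queue:
            task = self.queue.popleft()
            self.log.append((task.to, task.payload))
            for child in self.handlers[task.to](self, task):
                self.submit(child)


def planner(engine, task):
    # Fan out into two subtasks that remember their parent.
    return [Task("worker", f"{task.payload}: part {i}", parent_id=task.id) for i in (1, 2)]


def worker(engine, task):
    return []  # a leaf node produces zero new tasks


engine = Engine()
engine.subscribe("planner", planner)
engine.subscribe("worker", worker)
engine.submit(Task("planner", "build feature"))
engine.run()

last_task = max(engine.store.values(), key=lambda t: t.id)
last_lineage = engine.lineage(last_task)
```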

7/3/2026 at 11:21:36 PM

How do you test it across different workloads and are you running it in a datacenter or cloud provider?

I forgot to mention it but the other major problem I underestimated was giving the permission to potentially spend lots of money to AI calling each other in ways I didn't have a good way to monitor, and didn't want to actively watch. So I wanted to set budgets and have them get passed to children, and realized that meant I had to build a pretty complicated billing/scheduling system with a way to keep the part of it with all the permissions and money safe from the AI doing AI stuff on its own, and set up NAT and firewalls and all this other stuff.

If every child can loop back up to its parent, and everything can run stuff from the Internet, and make expensive resource decisions, and get restarted if it fails, then it might not ever converge on being done, or get infected or just mess up and spend a lot of money. I ask about the testing matrix/driver you're using because that's where I realized there was a lot of work and cost involved in getting that part working well enough to run real workloads.

by weitendorf

7/4/2026 at 1:01:00 AM

I have a 'node/container' abstraction at the infra/engine layer which is essentially either a cloud VM or a local podman container. The engine/infra layer can spin up more of these as needed. I have a relatively beefy dedicated machine for working with AI, which is where I do most of the testing.

I aggressively try to keep costs down so the workflow DSL I have supports configurable limits which can be set at the $, token, or time dimension, at task, workflow and agent/node levels, with some sane defaults. I have a pipeline which keeps LLM API pricing data up-to-date, and I use AI to estimate total costs before runs and manually approve those.

by aleqs
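
A rough sketch of limits in the dollar, token, and time dimensions, combined with the idea raised earlier in this subthread of passing budgets down to children. The `Budget` class and its `carve` method are hypothetical names; neither commenter's DSL or billing system is shown in the thread. The key property is that a child can never spend more than the slice its parent gave up.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    pass


@dataclass
class Budget:
    usd: float
    tokens: int
    seconds: float

    def charge(self, usd: float = 0.0, tokens: int = 0, seconds: float = 0.0) -> None:
        """Deduct usage, refusing anything that would exceed a remaining limit."""
        if usd > self.usd or tokens > self.tokens or seconds > self.seconds:
            raise BudgetExceeded("request exceeds the remaining budget")
        self.usd -= usd
        self.tokens -= tokens
        self.seconds -= seconds

    def carve(self, usd: float, tokens: int, seconds: float) -> "Budget":
        """Reserve a slice for a child; the parent loses what the child receives."""
        self.charge(usd, tokens, seconds)
        return Budget(usd, tokens, seconds)


workflow = Budget(usd=10.0, tokens=100_000, seconds=600.0)
task = workflow.carve(usd=2.5, tokens=20_000, seconds=120.0)
task.charge(usd=1.0, tokens=5_000, seconds=30.0)

try:
    task.charge(usd=5.0)       # more than the task has left
    overspent = True
except BudgetExceeded:
    overspent = False
```

For this to mean anything against a misbehaving agent, the accounting would need to live outside the agent's sandbox, which is the hard part the earlier comment describes.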

7/3/2026 at 9:33:24 PM

I rolled my own simple execution DAG program.

It’s shockingly effective due to rooting sub-DAGs into Planner nodes which are the only mutators of the DAG. The deepest topological leaf nodes become the blockers to the next Planner node.

The only other special node is a Human node; structurally impossible for agents to close (I rolled my own harness) and block on my attention.

by fractorial
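
One way to read the two structural rules in the comment above (only Planner nodes mutate the DAG, and agents cannot close a Human node) is as permission checks on the graph API. The sketch below is an interpretation with hypothetical names (`Dag`, `add_node`, `close`, `ready`), not the commenter's program.

```python
class PermissionDenied(Exception):
    pass


class Dag:
    def __init__(self):
        self.kind = {}       # node name -> "planner" | "worker" | "human"
        self.deps = {}       # node name -> set of node names it waits on
        self.closed = set()

    def add_node(self, actor_kind, name, kind, deps=()):
        if actor_kind != "planner":
            raise PermissionDenied("only planner nodes may mutate the DAG")
        self.kind[name] = kind
        self.deps[name] = set(deps)

    def close(self, actor_kind, name):
        if self.kind[name] == "human" and actor_kind != "human":
            raise PermissionDenied("agents cannot close a human node")
        self.closed.add(name)

    def ready(self):
        """Open nodes whose dependencies are all closed."""
        return sorted(
            n for n, d in self.deps.items()
            if n not in self.closed and d <= self.closed
        )


dag = Dag()
dag.add_node("planner", "plan", "planner")
dag.add_node("planner", "impl", "worker", deps=["plan"])
dag.add_node("planner", "approve", "human", deps=["impl"])
dag.add_node("planner", "replan", "planner", deps=["approve"])

try:
    dag.add_node("worker", "sneaky", "worker")   # a worker tries to mutate the graph
    worker_mutated = True
except PermissionDenied:
    worker_mutated = False

dag.close("planner", "plan")
dag.close("worker", "impl")

try:
    dag.close("worker", "approve")               # an agent tries to close the human node
    agent_closed_human = True
except PermissionDenied:
    agent_closed_human = False

blocked_on = dag.ready()
dag.close("human", "approve")
after_approval = dag.ready()
```

Here the next planner node (`replan`) stays blocked until the human node is closed by a human.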

7/3/2026 at 9:54:45 PM

Nice, yeah also I have planner nodes, review nodes and organizer nodes (organizers can mutate the graph/workflow, create new node types, etc.) Trying to automate the node type/role definitions and overall workflow definitions as much as possible.

I split my project into 3 layers - the engine/infra layer (handles task dispatch/queuing, spinning up node/agent containers, etc.), the 'brain' - basically a collection of different workflow models and related stuff (multiple different models for testing/exploration purposes), and the harness.

by aleqs

7/4/2026 at 1:32:15 AM

I’d be curious what performance / behavior changes you’d observe with two changes:

- planner nodes and topo deepest for rescheduling, no inflight modifications. (repair protocol = redispatch root planner for rare cases where required; ~9% of the time for me)

- no review nodes; strongly enforced on orchestrator to always do adversarial reviews post-codegen and fix out of band

I found that putting myself in the graph is critical to ongoing fidelity, even if subpar to if I had written it all myself.

by fractorial

7/3/2026 at 8:45:15 PM

> The main bottleneck at this point is the cost of all of the token

Are you using Chinese models? Quite a bit cheaper, but maybe still too expensive?

by worik

7/3/2026 at 9:47:19 PM

Yeah mainly deepseek, it performs near top pretty consistently in terms of price/output (with a basic quality measure). I would love to test with more models but that's not cost-realistic for me at the moment.

by aleqs

7/3/2026 at 7:33:58 AM

I'm building "workboxes" to work on my startup. It helps me develop features insanely fast. A workbox is a simple worktree-in-a-sandbox per feature. I have a simple front end where I can launch new workboxes: I input a prompt (a documented grilling session) and it creates a branch, a PR, and starts an opencode coding session on an e2b sandbox based on a custom template with the app's monorepo. Each workbox has a public https endpoint so I can manually test the web app after the coding session is complete. At any point I can either approve the PR, send a follow-up prompt, or connect to the opencode session for more control.

I think my next step is to perform the grilling session inside the front end, currently I perform it in my terminal and then paste in the front end.

by just-tom

7/3/2026 at 8:10:12 AM

Is it similar to how Claude Code Web works? It generates a cloud container and clones your repo, and works on whatever you want (preferably something specific), and then it generates a branch and a PR.

by LikelyLiar

7/3/2026 at 8:41:22 AM

I guess. I never worked with Claude Code Web, but it sounds like it serves the same purpose.

The challenge I dealt with is actually running all services on the same machine - 4 services, each needs their own port, 2 of them need docker. Then injecting their sandbox urls into env vars for communication. All to have a fully working app with all services running - I just go to the public web app url and test.
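A minimal sketch of the port and env var wiring described above, in Python. The service names, the base port, and the `https://<port>-<host>` URL shape are all assumptions for illustration, not what e2b or this setup actually uses.

```python
from typing import Dict, List


def build_service_env(sandbox_host: str, services: List[str], base_port: int = 3000) -> Dict[str, str]:
    """Give each service its own port and expose its public URL as an env var.

    The URL shape below is hypothetical; substitute whatever your sandbox
    provider uses for per-port public endpoints.
    """
    env: Dict[str, str] = {}
    for offset, name in enumerate(services):
        port = base_port + offset
        key = name.upper().replace("-", "_")
        env[f"{key}_PORT"] = str(port)
        env[f"{key}_URL"] = f"https://{port}-{sandbox_host}"
    return env


if __name__ == "__main__":
    # Hypothetical service names; every service sees every other service's URL.
    for key, value in build_service_env("sandbox.example.test", ["web", "api"]).items():
        print(f"{key}={value}")
```

Every service gets the same env block, so each one can find the others without hard-coded addresses.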

Nonetheless, I'll look into CC Web, thanks for the mention.

by just-tom

7/3/2026 at 7:31:35 AM

YES!

It's still very wip, I spent a couple of weekends on it so far, but I'm working on a harness that eschews autonomy and instead aims to work as a pair programming partner. Key to that are distinct "driver" and "navigator" modes, with the capacity to flip between them rapidly.

https://gitlab.com/philbooth/opair

(not really usable yet, but after tomorrow's session I expect to be developing opair in opair, which is mildly exciting)

by philbo

7/3/2026 at 8:32:23 AM

Love this idea and will be following closely! I've wanted a pair programmer style interaction for a while now. Something closer to VSCode's Copilot inline conversations and FIM, but where it's continuously watching what I'm doing and ruminating on suggestions.

by jesse_ash

7/3/2026 at 8:07:15 AM

Yes, like many others I've been experimenting a lot. What I've got so far is a harness-of-harnesses - ie, a harness which sits on top of Claude Code, Codex or OpenCode. I still use Claude Code or Codex directly for the initial planning of features, to investigate issues, and for small fixes, but whenever there's something even just a bit complex to do, I use my second-level harness.

Summarizing it a lot, what it does is:

* help you make better plans

* split plans into iterations, in a module-aware way for projects which have strict modularity (for now I'm doing this specifically with TypeScript and dependency cruiser) - this helps a lot when a project becomes complex

* ask an agent to implement an iteration, and then programmatically run a lot of checks after each iteration - not just regression tests, but also checks against project principles and conventions

* when possible, automatically fix deviations; when not possible, raise them to myself for an end-of-plan review
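The check-then-fix-or-escalate step in the list above could be sketched roughly like this. It is only an illustration: `Check`, `IterationReport` and `run_checks` are made-up names, and the real tool presumably does much more.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Check:
    name: str
    run: Callable[[], bool]                   # returns True when the check passes
    fix: Optional[Callable[[], None]] = None  # optional automatic fix (hypothetical)


@dataclass
class IterationReport:
    passed: List[str] = field(default_factory=list)
    auto_fixed: List[str] = field(default_factory=list)
    needs_review: List[str] = field(default_factory=list)


def run_checks(checks: List[Check]) -> IterationReport:
    """Run every check; try the fix once on failure; otherwise queue for review."""
    report = IterationReport()
    for check in checks:
        if check.run():
            report.passed.append(check.name)
            continue
        if check.fix is not None:
            check.fix()
            if check.run():  # re-run to confirm the fix actually worked
                report.auto_fixed.append(check.name)
                continue
        report.needs_review.append(check.name)
    return report
```

Anything that lands in `needs_review` is what would be raised at the end-of-plan review instead of interrupting the run.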

In this way, instead of having to constantly be engaged with the chat interface, with all the shorter or longer wait times which break my flow, I spend a lot of highly focused time during initial planning and final review. A plan implementation can go on for hours, and the various anchoring mechanisms added to the tool keep drift to a minimum.

At some point I'm planning to release this tool as open source. As this is the result of months of trial and error, dogfooding, and vibecoding on the tool itself, the codebase is chaotic and the UI is still full of experiments I've mostly abandoned, and I'm not used to releasing stuff in this state. But perhaps, in this brave new world, I should just do it and see what happens?

by danmaz74

7/3/2026 at 8:38:59 PM

I think this is a really good insight. It's definitely the back and forth in Claude Code or whatever harness you use that breaks flow and leads to frustration. This seems to be what Fable is solving since it is capable of doing much larger chunks of work without needing input constantly. Having a much more involved planning session than just turning on plan mode and then handing that off to be implemented is a much better approach. It's sort of what I've arrived at by accident where I often start planning changes in the web app and spend a long time going into detail about my vision and then working through the details without having Claude write any code other than perhaps a few small exploratory artifacts to visualize certain things and then taking all of that and moving over to Claude Code for implementation.

by pigpop

7/3/2026 at 8:53:04 PM

I've read several comments in the last few months citing this kind of approach. What I'm trying to do is to make the implementation phase more reliable.

By the way, if you're not doing that yet, something that can really help when doing UI/UX work is to have the agent create some mockups, and then tests based on those - I'm using Cucumber with some extra sauce for this. It's a very nice way to guide the agent in a falsifiable way.

by danmaz74

7/3/2026 at 8:58:47 PM

I have experimented with that a bit but not in a rigorous way, it's good to know that there is value in doing it so I'll try to integrate it into my process. Thanks for the tip!

by pigpop

7/3/2026 at 7:52:23 PM

what is of really good use is a super factory. The super factory drives the factory which builds the harness of harnesses.

by cuttothechase

7/4/2026 at 9:59:42 AM

the factory must grow

by chickensong

7/4/2026 at 10:00:44 AM

is this a mindustry reference haha

by kank0de

7/3/2026 at 8:50:06 PM

Wait until you hear about the giga-factories...

But seriously, if you care, this is just like using an existing library to do the lower level development work which I think is already pretty well done by the existing agents. It's not a design decision.

by danmaz74

7/3/2026 at 8:06:55 PM

Ha I’m doing the same. 2 months in or so, lost track of time a bit. Started with running all the usual suspects pi, claude code, codex, goose, but mostly migrated to own agent (ha!) for greater level of control. Also doing it in python for C lang as first target (ha!). Anyhow post a link if you’d release yours I’d be interested to take a look at adjacent work.

by kosolam

7/3/2026 at 8:46:45 PM

Interesting. From time to time I'm thinking about completely replacing the agent layer with my own, so that I would be able to enforce some rules at a lower level, but I'm wondering if it's worth it. What's been your experience with that?

by danmaz74

7/3/2026 at 9:18:34 PM

I never looked back. Full control, and it’s really basic stuff you can have in an hour or so. And then you stop fighting with the external dependency and can move fast with your ideas.

by kosolam

7/3/2026 at 9:32:49 PM

Venkatesh Rao had this idea about matching the type of work you choose at a given time to the mood state you are currently in (or finding cues to shift yourself into the mood state that best matches the kind of work you need to be doing).

Pick a few personas to develop that match different aspects of your personality and map to different mood states you might be in. Not so much defining roles, but different work styles that you work in. Write prompts for each of them.

Keep a task list your agent has access to. Feed it your personas as well.

Have an agent pick one task and to frame the work through the lens of each of your personas and to ask it to ask you to choose which one you would like to pick.

Schedule it to run each morning and trust your gut with whichever one you pick.

You have now put yourself on the opposite end of the prompt response loop and the agent is prompting you for a response.

You'll be in a flow state in no time.

by redmattred

7/3/2026 at 7:11:43 AM

One of the things I've been talking about with my senior developers is how the bottleneck has shifted even more dramatically to human code understanding vs code generation. AI is still not suitable for generating production grade code without a human checking it (yet), but it can produce a huge amount of code for humans to check. We've been experimenting with having AI find better ways of communicating what is in a change at different abstraction levels, for example by always generating diagrams showing what it did, the idea being that anything that speeds up human understanding of changes addresses the core bottleneck of the whole process.

by kybernetikos

7/3/2026 at 6:27:12 AM

I think the value right now is to focus less on external orchestration if at all. trust the (current best) model to do it better than anything you bolt on to the harness. focus your energy on providing clearer specs. I think the optimal spec is a disambiguated (through liberal use of the AskUserQuestion tool) set of (1) intent, (2) input/output contracts, (3) constraints and (4) preconditions. focus on that and get out of the model's way. I think of it like this: imagine a person who was not as smart as you was trying to tell you how to do a task. would you want more verbosity and step by step instructions, or would you want them to just cut to the chase (ie, what are you trying to do, what are the obstacles, I'll let you know if I have questions)?

also let the model verify itself. don't give it an objective that is vague, give it clear exit criteria for goals and let it loop until it gets there. so much of the orchestration scaffolding seems like massive technical debt

oddly, I do the opposite of a lot of conventional advice when it comes to models. I use no memory, I think there is something similar to context rot when everything is stored. I like creating markdown files as memory that the model can grep if needed. I also haven't found a real use for hooks yet, I have tried but they always seem to get in the way. skills on the other hand are very undervalued. they are so much more powerful than many realize. I used to think agents were where the power was. I think it's actually skills. agents are really for context preservation. skills are what increase capabilities

I'm not even talking about quantity of items in memory, I mean dilution of intent. I really love a model with a clean slate and only the items it needs. I fear the memory guides the model in areas that might not be what I want with the current prompt

progressive disclosure is a big one. you can make context available but it is only loaded when needed. like lazy loading for prompt engineering. skills are to be used to instruct the model how to do something specific that is not in its training data. like how to access my proprietary system, how to interface with a custom program. you can embed templates in skills, you can embed code that executes in skills and only the output is loaded into context. skills expand capabilities, agents constrain context

(constraining context is a very good thing btw, don't mean to imply that agents are somehow inferior to skills)
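to make the "lazy loading for prompt engineering" idea concrete, here is a rough sketch. it is not how any real harness implements skills; `SkillRegistry` and its methods are invented for illustration. only the one-line summaries are always in context, and a skill body is read the first time it is actually needed.

```python
from typing import Callable, Dict


class SkillRegistry:
    """Toy registry: cheap summaries up front, full bodies loaded on demand."""

    def __init__(self) -> None:
        self._summaries: Dict[str, str] = {}
        self._loaders: Dict[str, Callable[[], str]] = {}
        self._bodies: Dict[str, str] = {}

    def register(self, name: str, summary: str, loader: Callable[[], str]) -> None:
        # Registering stores only the summary and a loader; nothing is read yet.
        self._summaries[name] = summary
        self._loaders[name] = loader

    def index(self) -> str:
        # The short listing that would always sit in the prompt.
        return "\n".join(f"{name}: {summary}" for name, summary in sorted(self._summaries.items()))

    def load(self, name: str) -> str:
        # The full instructions enter context only when requested, and are cached.
        if name not in self._bodies:
            self._bodies[name] = self._loaders[name]()
        return self._bodies[name]
```

the loader could read a markdown file or run a script and return only its output, which matches the "only the output is loaded into context" point.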

by Jimmc414

7/3/2026 at 6:54:46 AM

It feels like everyone and their grandma is building an agent orchestrator at the moment, but I'm not hearing a lot of success stories. The fact that Anthropic and OpenAI haven't laid off all their software engineers already is probably a sign that orchestration breaks down somewhere. I suspect it's just a more elaborate way of burning tokens. I'm still interested in experimenting though.

by cedws

7/4/2026 at 12:29:47 AM

What I've been doing recently:

1. Vibe code a codebase that does what I want at a high level.

2. Iterate with LLM on this codebase to add features, fix bugs, improve performance, and address issues until it basically does what I want, but the code behind it is often a toxic waste dump.

3. Take lessons learned from vibe coded version and implement by hand. For challenging areas (writing a complicated algorithm) that would require a lot of thought or brainpower, I'll sometimes ask the LLM for a reference implementation and then modify it to suit.

This is a big speedup on manual code because you've figured out all the question-marks ahead of time and have a functioning blueprint you can refer to.

It's not as fast as having LLMs do all the code of course, but I find it to be a considerable improvement over doing everything by hand, while still letting me write code I'm comfortable with and understand deeply.

The other angle is to be very specific with the prompts and then dive deep into the code output and keep asking the LLM to change the code structure in various ways to ensure you get code you like. I found that to be frustrating and painful. Maybe in the future I'll write some really good prompts and future models will be better at following direction, but I haven't been happy with the results of that approach to date.

by missingcolours

7/3/2026 at 7:35:38 AM

Well, it always depends on your environment. In my case, nothing forces me to heavily use AI, so my workflow is kind of the old way, but with less hassle.

- Do your thinking alone. (AI part: search, understanding)

- Specing. (AI part: search, understanding, completing some text)

- Coding like the old days. (AI part: search, understanding, code examples)

- Okay, now I have a good idea of how my feature is going to work

- Look for fluff code and delegate it to AI to write/review it.

- Focus on the part of the code I want to have fun doing.

- Review.

- Repeat.

It’s slower than the approach of doing specs and letting AI do the rest, while focusing your role only on code review. However, I’m more in control of what I build, I can explain what I built better than everyone else, and I build up my knowledge. (also I have less problems, because less code haha)

Will I go for the full Agentic way ? Maybe but I will find a way to slow it down so I can be in control

by magicmadrid00

7/3/2026 at 8:15:01 PM

I like this, and it mirrors my experience.

I felt that, by using the "full agentic way" I am implicitly accepting the fact that all the knowledge I have right now is all the knowledge I will ever need or want to have (with the exception of new knowledge on how to ask AI to do things, I guess).

This seems like a nice way to enable yourself with AI, but not replace your brain completely.

by jwardbond

7/3/2026 at 9:17:39 PM

Awesome, the first comment that agrees with me, haha.

Yes, this is slow, but still fast compared to the old ways. It was liberating for me because I’m really enjoying this AI era again, while also improving.

The time I have won, I’m investing in reading more complex books about CS, discovering new engineering feats, etc.

Regarding the fully agentic way, I think the learning curve to get a system like that working is minimal, so there’s no need to spend a lot of time learning it.

It’s better to invest that time elsewhere.

by magicmadrid00

7/3/2026 at 8:57:11 PM

I've had the same experience and it has really put me off working on personal projects using AI quite a few times, though I keep coming back. My recent experiences with Fable and the latest Sonnet have actually been very positive though. They seem to be capable of working for much longer stretches without constantly stopping and requiring further prompting to finish up large features. The place where I feel flow the most now is when I'm planning large feature sets and I use the Claude web app as a sounding board while doing this and prompt it to not write any code but focus only on asking questions and clarifying aspects that I haven't fully thought out. This results in a very detailed plan for implementation which seems to work very well with Fable or even Sonnet 5. I can then actually leave my computer and go do other things without the constant nagging feeling that I need to check if it's stopped and needs nudging. After it's done I review the changes and do QA and testing and then formulate a plan for the next set of tasks. It feels like this problem is getting solved, which is wonderful.

by pigpop

7/3/2026 at 10:16:51 PM

I got really annoyed at how slow the LLMs are so I found myself doing more and more prompts like "in file X do Y". If you can piecemeal your task into very limited prompts and periodically reset the context on each go, the LLMs do the work MUCH faster. Since I reset the context all the time I often do manual changes in-between prompts.

But at the same time if you work like this you can't do that insane multitasking I see a lot of devs doing where they juggle multiple agents doing separate tasks (maybe even completely separate tasks on different git branches). I _really_ hate working like this and only do it if I know one of my prompts will take 10+ min. Usually that's an initial prompt for a large task where I will move to my usual 1-file-per-prompt style later.

Of note that I am working on a ~10 year old codebase with a lot of custom instructions for agents and a lot of code to dig through to get things done. I feel a lot of people are conflating using LLMs to start hobby projects and extrapolating the workflow to real world large codebases.

by DanielHB

7/3/2026 at 11:10:18 AM

I feel that flow state[1] is possible as long as you don't feel distracted into doing other things and you're needed to guide the LLM along every few seconds/minutes (someone mentioned a pair-programming type tool in this thread). For me that works if you have a good spec + workflow tool (assuming you're doing interactive coding and not kicking off long running coding jobs). I feel that a good test of a workflow tool is that it should offload all bookkeeping from you, leaving you to just read the generated code and think about design/architecture.

I built one such tool for myself: https://www.shipsmooth.net. You can use it to spec/plan out a piece of work, and then easily keep updating the spec/plan as you churn through its implementation. The tool assumes that you will pretty much end up changing the spec/plan during implementation, based on how it's going. In general, I don't see how it's possible to one-shot high quality code for custom use cases.

[1] Going by the definition of flow state here: https://en.wikipedia.org/wiki/Flow_(psychology): "fully immersed in a feeling of energized focus, full involvement, and enjoyment in the process of the activity. In essence, flow is characterized by the complete absorption in what one does, and a resulting transformation in one's sense of time."

by pramodbiligiri

7/3/2026 at 7:12:36 AM

I have a custom harness that runs in a macOS VM. It has e-mail and its own accounts. I assign it tasks in Linear, it does them and spins up PRs for me to review. This works pretty well, generally. I have to spend time writing stories and doing code review, but I don’t have to follow its (their — I have 3 of them) every move.

by Bogdanp

7/3/2026 at 7:32:13 AM

I created a small PI extension that always watches relevant directories and answers me in place, without switching context, or using a chat interface. Still experimenting but I like it.

https://github.com/piqoni/pi-piqo

by lexoj

7/3/2026 at 7:18:42 AM

If you like videos, I saw an interesting video yesterday about systems thinking, software as ecosystem particularly with AI. More of an overview but gives an insight into seeing where we might be able to experiment with different ways. It's more focused on teams and companies than individual developers but I think it could be applied to the single dev.

"Software engineering at the tipping point" https://www.youtube.com/watch?v=2n41YjR5QfU

by thinkingemote

7/4/2026 at 5:05:49 PM

I'm using agent in GNU Emacs to debug the same instance of Emacs/agent. I also extend the agent by the same agent while Emacs is running.

by jcubic

7/3/2026 at 1:13:42 PM

That's how I code nowadays:

1. Start a session.

2. Grill my requirements.

3. Write an ADR, then either start implementing or separate into pieces.

4. Review the code on pyor.review. Compared to GitHub, Pyor lets me categorize the files and changes, then review the important stuff and skim the noise it identifies.

5. Since I can do local reviews with Pyor, I can do that with Claude and feed back my comments to be addressed without it going to Github first.

6. Create a PR then merge it.

by othmanosx

7/3/2026 at 11:02:38 PM

Based on your comment, I assume you use Matt Pocock’s skills, is that correct?

If so, have you compared it to obra’s superpowers?

by gessha

7/4/2026 at 5:04:11 AM

YES! and I already love them. haven't heard of obra’s superpowers, will check them out. Thanks.

by othmanosx

7/3/2026 at 7:38:13 AM

Just want to add:

I'm trying to do the same amount of work faster, not do work in parallel or agent orchestration. I'm not against letting the model go off and do things on its own, that has its time and place.

But if I can do something in 15 minutes instead of 1 hour without the annoying prompt response loop, without the feeling that there could be blind spots, and while keeping all of the context (or at least most) in my head, that's a bigger win than spinning up 5 agents to do different things.

by yehiaabdelm

7/3/2026 at 10:07:16 PM

Try this prompt: While working on the main task, launch a parallel sub-agent with the task context so far. The sub agent should think of high quality questions and put them to the user using a dialogue tool like zenity. Customize the inputs to the question, taking full advantage of the dialogue tool's features to create a progressive interactive user experience. Ask only a few questions per turn so that you can adapt the questions to the answers.

This will keep you busy while the main agent runs. Customize it further to integrate the sub-agent answers to the main thread.

by irthomasthomas

7/3/2026 at 9:01:19 AM

The tab model was a lot of fun, you felt like you're getting a speed boost while coding. I think vibe coding (or agentic engineering) is a different paradigm altogether.

I have tried out some of the popular tools and I'm using opencode on desktop and I use pi via termux on android for when I'm on the go. I think the current direction of PRD -> review -> execute -> debug is in many cases the right mindset.

Working with a team of fresh graduates, I see that working with any vibe coding tool is like being a manager, not a developer. I think that's what you miss, you miss being a developer but the vibe coding tools make you a manager which isn't something that you might enjoy.

Nonetheless, I do think that there are some interesting things to do with pi. I'm just getting started, if anyone has an interesting workflow in pi, I would be interested in trying it out!

by notahan

7/3/2026 at 9:43:05 AM

Nice insight. I never understood this “accept being a manager” thing, because in reality, that’s not the case. Being a manager is much more complex than handling code generated by agents.

What still puzzles me a lot is how you can accept that AI just writes the code for you, without you being the one making decisions about how the code — not the spec — will be written. How are you able to get things done while still keeping a good understanding of everything you did?

Maybe I’m wrong, or maybe it’s because I haven’t pushed the agentic way to its limits, but I really haven’t found it to be a good way to produce good work in general.

by magicmadrid00

7/3/2026 at 8:13:51 PM

I think trying to iterate on a "spectral decomposition of your intent" - slowly working on increasingly refined breakdowns of what the different aspects of your project are - both on the domain and the technical level; aka requirements and architecture. And then don't directly iterate on the code but rather regenerate/update the codebase based on the new intent and the old codebase... And a decomposition of the whole thing in terms of optics (open lenses, etc) where the decomposition respects the "spectral decomposition".

by Garlef

7/3/2026 at 7:47:23 PM

Graph based code generation where code doesn’t reside in files in the typical sense. On insertion, modification, and deletion, constraints are checked / ran to see if the change is valid and can be done or not.
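A toy sketch of what "constraints are checked on insertion and deletion" could look like. This is only one possible reading: `CodeGraph` and the two constraints (dependencies must exist, nothing may be deleted while something depends on it) are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set


class ConstraintError(Exception):
    """Raised when a change would leave the graph in an invalid state."""


class CodeGraph:
    def __init__(self) -> None:
        # Maps each code unit (node) to the set of units it depends on.
        self._deps: Dict[str, Set[str]] = {}

    def insert(self, name: str, deps: Iterable[str] = ()) -> None:
        deps = set(deps)
        if name in self._deps:
            raise ConstraintError(f"{name} already exists")
        missing = sorted(d for d in deps if d not in self._deps)
        if missing:
            raise ConstraintError(f"{name} depends on unknown nodes: {missing}")
        self._deps[name] = deps

    def delete(self, name: str) -> None:
        if name not in self._deps:
            raise ConstraintError(f"{name} does not exist")
        dependents = sorted(n for n, d in self._deps.items() if name in d)
        if dependents:
            raise ConstraintError(f"{name} is still used by: {dependents}")
        del self._deps[name]

    def names(self) -> List[str]:
        return sorted(self._deps)
```

An invalid change is rejected before it lands, so the graph never holds a dangling reference.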

by _boffin_

7/3/2026 at 7:35:43 AM

Something I'm thinking about and doing a bit of experimentation with is using LLMs to write specialist higher level code.

Rather than asking them to write web-apps in webby languages with open source frameworks etc, I'd provide a very fixed, on-rails development process where everything is abstracted away. Accept that it'll be less powerful, but take the trade-off that it'll hopefully be faster and produce much more controllable software.

Concrete example: why do we let the LLM choose a database, schema, migration procedure, library, etc.? We could decide to only support one database, enforce schema design (such as every table containing access control), enforce a migration process, enforce a library, even do schema design in a fixed config file rather than arbitrary DDL. Same for auth, deployments, even UI.
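As a rough sketch of "schema design in a fixed config file" with enforced access control: the schema is plain data and a validator rejects anything off the rails. The `owner_id` column as the access-control marker and the allowed type list are assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical platform rules: every table must carry these columns,
# and only a small fixed set of column types is supported.
REQUIRED_COLUMNS = ("id", "owner_id")
ALLOWED_TYPES = {"int", "text", "bool", "timestamp"}


def validate_schema(schema: Dict[str, Dict[str, str]]) -> List[str]:
    """Return a list of rule violations; an empty list means the schema is accepted."""
    errors: List[str] = []
    for table, columns in sorted(schema.items()):
        for required in REQUIRED_COLUMNS:
            if required not in columns:
                errors.append(f"{table}: missing required column '{required}'")
        for column, col_type in sorted(columns.items()):
            if col_type not in ALLOWED_TYPES:
                errors.append(f"{table}.{column}: unsupported type '{col_type}'")
    return errors
```

The LLM would only ever edit the schema data; the migration process and the database choice stay outside its reach.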

by danpalmer

7/3/2026 at 7:59:01 AM

This sounds a bit like Ruby on Rails including Hotwire? Even has the “on-rails development” in the name, schema design in a config, migrations, etc.

Though some frontend decisions are a bit more open

by villaaston1

7/3/2026 at 8:38:48 AM

Rails takes this approach[1] to some extent, but it's still possible to plug in different things, it's still all flexible enough to build any type of web app pretty much, and it still requires a ton of developer (or LLM) involvement.

I'm thinking of going far further, to the point that perhaps we should use a new language designed only for web-app development. I'm thinking about removing almost all options so that the LLM only gets to write custom business logic and data modelling and doesn't need to do much else. Again this is all at the cost of being more generally applicable, but I see a lot of software that is fundamentally CRUDL and it's still hard to build well, and I also see a lot of LLMs reinventing the wheel but implementing too many sides on that wheel. They need guardrails.

[1]: https://dhh.dk/2012/rails-is-omakase.html

by danpalmer

7/3/2026 at 10:12:12 PM

Looks like my approach is as old school as it gets. For a complex piece of work, the only way to be in flow for me is when I'm driving the engagement. I start with a high level requirement and a high level design plan to achieve it. I also provide constraints that need to be satisfied (efficiency, performance, cost, scale etc) and write it all into a markdown document and ask LLMs to review it, find blindspots and refine it until I can get a detailed design for that phase.

Then I pass that to another LLM provider to review and check for any bloat that can be cut or blindspots that need to be addressed.

Finally I get a test plan to help me test different components directly and in debug mode. I then ask LLMs to implement in stages where I can test them in as small components as possible.

I think I end up spending more time (easily 2x) than hand rolling. But the upside is the design is more thought out compared to hand rolled code. It has fewer accidental complexities and I have a clear mental model of the entire design that can also be shared to others through the document

by wanderingmind

7/3/2026 at 8:22:02 PM

We're working on a browser-harness that makes forking, rpcs, and mapreduce first class tool calling primitives. Among other things, this makes it easier to manage your own context, because you can visualize your agents, subagents, and active work and resources as they interact with each other across local and remote environments. And it eliminates all the complexity of mcp and local sandboxing because that is literally the problem browsers were made to solve!

To be clear the browser IS the harness, it's not just a browser-based UI but also the sandbox and orchestration layer. By giving LLMs deep browser access (through CDP and some special hooks) they can verify their own UIs immediately after writing them, navigate the web natively, and run commands that directly manipulate the active DOM. This creates a very tight feedback loop for UI work, but also lets you create or run browser automations, or query a site by running a javascript query on its contents, or a web page without deploying or uploading it anywhere, which is pretty powerful. What I really like is that this makes it easy to dispatch cheap models to generate and verify tons of little visualizations using svg.

Locally it's just a browser, but to manage remote instances you can either access them as tabs on any local browser, or as inline collapsible iframes. I'm trying to be cautious with the security side of it so we're not marketing it as a product yet, but would love to work with anybody who is interested and does a lot of UI or cloud work!

I'm excited about this particular moment in tech because I think work is going to end up looking like playing Starcraft with data and AI, surrounded by rich custom media as you work, which feels really futuristic to me!

by weitendorf

7/3/2026 at 10:16:54 PM

I have gotten a lot of mileage out of giving LLMs narrow focus over the same plan or code change with different objectives. Write a plan, ask the agent to review the plan and consider where code can be consolidated. Consider downstream effects. Consider security risks, consider optimization, consider architectural concerns, etc. Then I generate the code and go through a similar loop. Then I read code and do more loops to clear out any problems or investigate things I'm unclear about.

In this way I spend most of my time building understanding of existing code and understanding the impact of my changes. My company is heavy into AI use and I find I am pushing out more code and much cleaner code than most. The gaps that appear during review are usually product understanding gaps and not code failures, and my LLM spend is somehow less than most.

I find this iterative process is much more in line with building flow than spending 3 hours writing a spec and waiting a half hour for it to build a monolith PR.

by lubujackson

7/3/2026 at 11:13:33 PM

2 to 5 tabs in warp. Kinda want to figure out how to properly use iterm+tmux to have the boris cherny experience. already used both but there were issues with tmux with either scroll back, or copy paste or other similar things that get broken, even after messing with settings.

i use git worktrees in different tabs as needed.

i have git push hooks where 2+ frontier models audit the code diffs for security issues and code quality, with a fail-closed condition where both have to give the OK.
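the fail-closed part can be sketched like this. the reviewers here are plain callables standing in for model calls (an assumption, the actual hook and model APIs are not shown), and `push_allowed` is a made-up name. anything other than an explicit OK from every reviewer blocks the push, including errors and an empty reviewer list.

```python
from typing import Callable, Sequence


def push_allowed(diff: str, reviewers: Sequence[Callable[[str], bool]]) -> bool:
    """Fail closed: allow the push only if every reviewer explicitly approves."""
    if not reviewers:
        return False  # no reviewers configured means nothing vouched for the diff
    for review in reviewers:
        try:
            verdict = review(diff)
        except Exception:
            return False  # a crashed or timed-out reviewer counts as a rejection
        if verdict is not True:
            return False  # anything but an explicit True is a rejection
    return True
```

a hook script would exit non-zero whenever this returns False, so an outage blocks pushes rather than letting them through unaudited.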

i have to do a pass and ask it to shorten the code, remove unnecessary comments and excessive exception gathering, etc. Generally cuts the code by half. The process is repeatable.

i just use claude code or codex with minimal plugins (HUD, frontend design). I would do even more if I had 10x or 100x the tokens and/or token/s available. I spend a lot of time waiting on 5.5 or Fable 5 to work, even when multi-tasking.

I spend the downtime writing detailed follow up or unrelated prompts.

by gozucito

7/3/2026 at 11:50:07 PM

Started using https://herdr.dev/ and it's fantastic as a replacement for `tmux` with _much_ better mouse support.

by Tadpole9181

7/3/2026 at 4:33:58 PM

Orchestration works very well for me, but not in the way most people seem to be pushing for, with middlemen scoring and routing every request. For coding, the routing is mostly solved at the config level. The harness lets you pin models per role, and that covers most of what a per request router promises.

On your actual question though, I think the loop you're describing does break the flow and gets very frustrating, but it's been a long time since I've experienced this.

Three things happened to me in the past few months: I've become cost conscious, I wanted to get more done faster, and I wanted to be able to do a lot more at the same time (in parallel). With that I developed my own workflow that works well for me. It's a config-led setup routed by tiers: cheap fast models for mechanical work (lookups, log reads), mid models for implementing against written specs, strong models for judgement and review. It's config on top of a standard harness, nothing exotic.

For me...my flow state has moved from tackling code line by line to traversing the layers of the entire system design in my mind, and being able to clearly articulate this to a strong model.

by jarodrh

7/3/2026 at 7:27:07 PM

Can you explain more how you 'pin models per role' 'at the config level'? And what a prompt looks like that uses that?

by OJFord

7/3/2026 at 7:40:56 PM

define roles for your agents, add roles to config that have specific models, use those models for those roles. Not the author but this is how I did mine.

by reactordev

7/3/2026 at 10:14:35 PM

I've built a bunch of projects with Claude Code, and the flow that works for me is planning as much as I can up front. Once I'm confident we've caught most of the requirements and the likely gotchas, I let CC run and just check back periodically to approve continuations until it's done and ready for me to test. One project at a time keeps me in a flow state.

A more exciting attempt the other day: I fed CC a PRD and ui-spec I'd drafted with Fable 5 (no reason, I just happened to be on my phone), told it to auto-approve commands, and let it fully build and test the project in Chrome overnight. I woke up to a mostly-working app ready for me to QA. Skipping the whole build cycle and going straight to watching it come together during testing was genuinely great.

The one thing I'm still figuring out is the trust side of fully-unattended runs like that. Curious how others are handling auto-approval when nobody's watching.

by chris_fullcycle

7/3/2026 at 9:58:32 PM

Yes. I built recently an agent that has very broad set of objectives and nothing in particular. I don't even know what it does most of the time but hopefully it will do something useful eventually.

You can track its progress here: https://github.com/relentlessworks

by _pdp_

7/3/2026 at 8:46:55 PM

I’m exploring agentic for creative coding / visual effects. Think touch designer but with a prompts graph. It compiles to native code (swift and metal). The first version is available here and I’ll be releasing a v2 soon that will be open source:

- https://sxp.studio/apps/subz

by tasoeur

7/3/2026 at 7:06:23 AM

I'm currently rolling out Matt Pocock's Sandcastle project so that I can have those brakes removed. What will be left is just the grilling(/wayfinding).

My current flow heavily relies on Matt Pocock's Skills and Sandcastle project. I find them highly valuable in practice: grilling(/wayfind) into a spec and extract issues. Those live in Linear projects. I'm pointing my Sandcastle set-up at such Linear projects (or loose issues), which results in an MR.

Currently at the point of self-improving the prompts and Sandcastle set-up with a retrospective pass of the logs.

by Bossie

7/4/2026 at 12:39:59 AM

I built an agent harness that doesn’t actually code. I’ve been wanting something that can teach me as I go. Enter codetutor.

https://github.com/jaketothepast/codetutor

It’s an Emacs package that starts from a spec and will help you iterate on it if you want, and works from a core set of docs about a project plus active specs.

It also will keep a treesitter based representation of your codebase to help you form the architecture. It has no write tools, it will read a diff of your code on save to help. It also can be prompted openly.

It’s a pair programmer, but the other way around versus traditional agent harnesses. You’re the coder, the AI watches.

by jwindle47

7/3/2026 at 8:32:28 PM

IME LLMs are kind of like a projection of your current expertise - your prompting and guidance etc. biases LLM plans kind of 'in the direction' of your thinking. I think this is one reason why it seems like senior engineers get more lift vs. juniors.

What I am exploring is another step to the classic 'research / plan / implement' pattern: 'research / plan / LEARN / implement' where LEARN involves the human doing AI tutoring sessions to ensure a deep understanding of the concepts etc. that the LLM is planning to implement so you can refine / iterate on plans and direct the LLM in ever more effective ways. My idea is that this then compounds your human capital and reduces the occurrence of the 'sounds smart, doesn't work' pattern.

by theodorewiles

7/4/2026 at 1:23:10 AM

Zed with multiple conversations open. Each conversation is a problem to solve. The agent is a way to read code and logs rapidly to find what's causing the problem. The feedback loop to the product is much faster.

I constantly check what the situation is versus what I want, then pull the exact location where the change needs to happen. I ask for some solutions, accept one or suggest a different one (most of the time), and repeat.

It's pretty fun, and you can make the product better by using it and understanding what you want—which is very difficult to know before the product exists.

by Frannky

7/3/2026 at 9:22:35 PM

I am using claude cli/tmux on a hetzner box and connecting to it via claude remote control. I have connected the box and my phone over telnet which allows me to view any UI work. Sometimes I do have to switch to my laptop for UI desktop layouts.

One gap which I kept running into on both mobile and desktop was refining the initial plan and then later refining the generated artifacts which involved lots of imprecise copy-paste. To scratch my own itch I built a review tool to improve the velocity of planning and refining generated artifacts. It has become my daily driver: https://github.com/livetemplate/prereview

by realrocker

7/3/2026 at 8:03:58 PM

I'm still in the prompt response loop of learning how to be more effective with it, but I've found that what works for me is to approach a project the same way I would if I was writing code by hand. I'll decompose the project into small discrete units of work and slowly build my way up; I find it makes fewer mistakes using that approach. I built a systems monitoring platform I had been wanting for a long time over the course of days instead of weeks or months, and I was really impressed with Claude's output.

Then I thought it would be fun to be able to monitor the status of all my workflows as buttons on my Stream Deck XL, and Claude was able to build the plugin with almost no issues at all. It's hilarious how much fun it is.

by felix-the-cat

7/3/2026 at 9:06:03 PM

I mostly use claude code on mobile and desktop with remote sessions and do most of my coding on the go now.

I have been tempted to distill something like GLM 5.2 into a smaller html + css only model for super fast interactive UI editing, because right now it's really annoying to do with large slow models. I'm sure it would be doable to do the same for individual language / frameworks, including potentially doing a final few steps on your own code base with some LoRAs that could be kept up to date to avoid having the model have to explore the code base each time.

Doing UI work with composer 2.5 and live reload is a way different experience than slogging through it with opus 4.8

by m_ke

7/3/2026 at 12:20:59 PM

My flow state with AI is having 5 different conversations at the same time making good progress on all of them by giving key insight and feedback at the right times.

You can actually go super fast with the right setup and focusing only on the important details like ensuring the shape of the APIs make sense and that test quality is good.

by eddd-ddde

7/3/2026 at 7:54:20 AM

I'm in the middle of it so don't have any conclusions for you, but I started mucking with building my own cli coding app and there are _tons_ of levers available that aren't apparent from claude code or codex.

Including altering the turn concept. I think it is still ultimately call and response but instead of everything is a quarter note you can get a little closer to a beat you like.

by pjbeam

7/3/2026 at 7:59:33 AM

I'd love to have an interface where I could have several conversations with an LLM, with common context, but each separate and anchored to a different place in the code.

by mbork_pl

7/3/2026 at 8:22:14 AM

Same here. Every time I come up with a complicated way to personalize my workflow I end up finding out there's already a better way to do it.

The only thing that I consistently do is create a simple html dashboard with a to-do list I can guide claude code with while rendering progress somewhat graphically. I love the levers but it's kinda the opposite of the flow in question.

by aitor1717

7/3/2026 at 7:22:22 AM

I am currently in the process of launching my AI teams platform that I've been working on since at least January. It's https://PersonaStack.ai. I'm doing it without VC money and all by myself. I've used over 110B tokens so far building it.

You get some amazing results with teams of AIs if you do it right. The key is to control behavior with what integrations and responsibilities each agent has. That way they naturally adapt, delegate, fact check each other, and generally act more autonomously.

This is already running the automated news site ainews.personastack.ai complete with social media posts 100% automated.

It also runs the issue triage, coding, reviews, and releases for the Kuberhealthy open source CNCF project, which is another thing of mine.

I don't think the next step is really smarter models. It's how we make the models more effective, and teams, when done right, net the best results I've seen.

Hoping to get noticed here soon, but it's extremely hard to do solo I'm finding.

by integrii

7/3/2026 at 7:28:54 AM

Very impressive, especially since you did it solo. The website looks great and explains everything in detail.

Can you elaborate more about its development? How much do 110B tokens equate to in $$$? What LLM did you prefer most during development? Any suggestions for other solo developers trying to launch their LLM-built product?

by fraXis

7/3/2026 at 7:25:23 AM

Doesn't this go directly against what the author is asking about? You're much less likely to enter flow state if you have a team of AI agents which are supposed to be autonomous.

by dnikolovv

7/3/2026 at 7:28:21 AM

Maybe project managers will finally get to experience flow states?

by sigmoid10

7/3/2026 at 8:04:25 AM

Just yesterday I tried to find an annoying and persistent bug in the communication between a Lyrion Media Server and my player. I used Opencode's native Big Pickle AI, and at first it was a pain in the back, because each time it gave me new code, I had to start the player and test the control in the server's web GUI, report the errors back, and so forth, and it tried a lot, but never found the real cause.

Then I got tired, and told it to use Playwright to control the browser and test by itself. After some hangs, that I had to stop manually, it did it all by itself, and finally fixed the bug. I had to increase the agents' steps setting in the config, but that was it. While it was fixing the bug, I surfed the web, and kept an eye on it, but it did everything on its own. Impressive.

by karlkloss

7/3/2026 at 8:47:30 AM

I have also noticed that, waiting for an LLM answer makes my mind wander to completely unrelated topics.

What I've found useful is to create a tasks.md file where each bullet point / task is one implementation. Bullet points that belong together and can be done in the same chat session are grouped together.

I easily enter a flow state during writing these detailed implementation plans. Then I can also start multiple chat sessions for parts that don't interfere with each other, while I'm waiting for an LLM answer for one part I can get started on the next or start reviewing one of the previous answers.
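One way a grouped tasks.md could be turned into per-session work lists, as a sketch. The layout (a heading per group, one bullet per task) and the `parse_tasks` helper are assumptions for illustration; the comment does not specify a file format.

```python
def parse_tasks(text: str) -> dict[str, list[str]]:
    """Group bullet-point tasks under the nearest preceding markdown heading.

    Assumed layout: '#'-style headings name a group (one chat session each),
    '- ' or '* ' bullets are individual tasks.
    """
    groups: dict[str, list[str]] = {}
    current = "ungrouped"
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            current = stripped.lstrip("#").strip()
            groups.setdefault(current, [])
        elif stripped.startswith(("- ", "* ")):
            groups.setdefault(current, []).append(stripped[2:].strip())
    return groups


EXAMPLE = """\
## auth
- add password reset endpoint
- rate-limit login attempts

## docs
- document the reset flow
"""
```

Each key of `parse_tasks(EXAMPLE)` can then be handed to its own chat session, so groups that don't interfere with each other run in parallel.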

I have also explored more complex setups, e.g. using a Kanban board for tasks, but I found great value in these simple yet effective setups.

by bryanhogan

7/4/2026 at 2:06:31 PM

The following paragraph is extremely emotional.

I think the whole industry is heading in the wrong direction. And it really frustrates me that most LLM coding agents are based on excessive delegation (or at least promoting it), which for me, is what causes the difficulty in entering the flow state. (I think the slow generation speed of SOTA frontier model labs also contributes to it, 30 seconds to generate something simple is a lot of time, but that's a different discussion)

We must kill excessive delegation. Our LLM coding tools should be built incapable of performing it.

Imagine if you have a coding agent where if you tell it "create a simple Todo app" it will fail telling you "simple Todo app is undefined", requiring you to provide a more comprehensive descriptive prompt of what you want. Then this longer prompt becomes where your flow state functions: you are describing your edits and code in a more fluid way but still in text, and your focus becomes this description/specification that you'll feed to the LLM.

by amdivia

7/3/2026 at 7:31:51 AM

I've been working on inverting the control theory for the agent loop. Instead of the user initiating everything, the agent runs automatically in the background and calls the user for feedback as part of tool use. The end game for me is to get rid of the chat interface altogether and move back toward async email and other messaging channels. The chatbot UI as a means of driving the business always felt like a temporary stepping stone / clever demo.
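A toy sketch of the inverted loop, assuming the agent reaches the human through a tool call and keeps working on everything else in the meantime. `Task`, `ask_human`, `tick`, and the list-based `outbox` (standing in for email or another messaging channel) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Task:
    name: str
    pending_question: Optional[str] = None
    answer: Optional[str] = None
    log: list = field(default_factory=list)


def ask_human(task: Task, question: str, outbox: list) -> None:
    """Tool the agent calls: record the question and send it out asynchronously."""
    task.pending_question = question
    outbox.append((task.name, question))  # stand-in for an email or chat message


def tick(tasks: list, outbox: list, needs_input: Callable[[Task], bool]) -> None:
    """One background iteration: advance every task not blocked on a human."""
    for task in tasks:
        if task.pending_question and task.answer is None:
            continue  # still waiting for a reply; other tasks keep moving
        if task.pending_question:
            # The reply arrived between iterations; consume it and unblock.
            task.log.append(f"got answer: {task.answer}")
            task.pending_question = None
            task.answer = None
        elif needs_input(task):
            ask_human(task, f"Need a decision on {task.name}", outbox)
        else:
            task.log.append("worked")
```

The loop never waits on a chat window: a human reply is just state that shows up before some later call to `tick`.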

I think there are 10-100x productivity gains lurking in here. It is very expensive for a human to reserialize their mental state into a prompt each time a task needs working on. An agent can do this ~instantly and with high frequency 24/7. The higher the rate of evaluation the less change has to be dealt with between any two iterations. So, the likelihood that a given iteration needs human help goes down as you increase the rate of evaluation per unit of wall clock time. Tighter and faster control loops tend to require less severe corrective measures than slow and sloppy ones.

This is the most plausible reason for so many tokens in the future. I can actually see a million tokens per second making sense. I have a pretty good idea how I'd approach this if I actually had access to this kind of infrastructure. 1Mtok/s is baby tier in terms of raw information theory. The politics of employing a system like this are far more terrifying to me than any technological aspects. Humans really like having control over things, even when that control is pure downside for the business.

by bob1029

7/4/2026 at 12:59:36 AM

I like my setup, and don't feel productive without it, but I find it's very complex for new users, and mostly only software engineers use it.

https://github.com/notque/vexjoy-agent

by AndyNemmity

7/3/2026 at 7:10:21 AM

I've built a couple of things in the past few months that have leaned heavily on LLMs as my programmer. Mainly Claude Code, but occasionally Codex also. It's a different way to produce. I spend more time doing something like plain text feature mapping: simple .md files, good flow and creativity. Then once I'm happy with it, I pass it off to the dev team (Claude) to code up and integrate. I feel like I'm flowing in the part of the process I always was. But the buzz of getting something working is gone. It's more like the slow satisfaction of getting something useful at the end.

by AJFlan

7/3/2026 at 10:23:47 PM

I have been using Claude code and cursor daily for the past 9 months. Here is what I learnt:

1. Well-articulated prompts are the most important part. You need to tell the model exactly what to do and how to do it to avoid hallucinations. Especially in system design, write what the end result should be and how to get there, let the model reason and look at the existing infra first, then plan the implementation. In my experience, there is little to no coding that needs to be done after the model is done implementing. Make sure to let it implement in phases, with extensive tests.

2. Model choice. It is obvious, but Claude models are the current SOTA. In my experience, Opus 4.7 extra high is the perfect balance of speed and cost-efficiency. In my experience, OpenAI models were worse in system design, but faster and better at understanding the end result. Mostly used them to verify the bigger picture. Also used composer in Cursor. Was surprised how easy it was to do web design with it.

3. Long horizon tasks. Make models build plans. Very thorough plans, for a feature or a product. The model stays much more aligned when it has a written plan.

There are more details, but this is what I noticed so far myself.

by Losenok

7/3/2026 at 7:53:22 PM

i would investigate how claude code and codex work and suggest to build your own. it is not as hard to do as it seems (its not easy still, the prompting specifically). it can show u how workflows, skills, memory, plans etc. work so you can experiment for yourself to implement the workflow that suits _you_.

its an interesting exercise, for me i started with a simple repl to call models through model adapters, then allow them to list directories and read files within a chroot, build up slowly to also write access to files, then look at whats out there and try to build stuff you like from it.
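A small sketch of the read-only first stage, assuming the tools are plain functions handed to the model. Note this is only a path-containment check, not a real chroot, and `resolve_inside`, `list_dir`, `read_file`, and `SandboxError` are hypothetical names.

```python
from pathlib import Path


class SandboxError(Exception):
    """Raised when a requested path would leave the sandbox root."""


def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve user_path relative to root and refuse anything outside it."""
    root = root.resolve()
    # resolve() collapses '..' and follows symlinks before the check;
    # an absolute user_path replaces root entirely and is caught below.
    target = (root / user_path).resolve()
    if target != root and root not in target.parents:
        raise SandboxError(f"{user_path!r} escapes the sandbox")
    return target


def list_dir(root: Path, rel: str = ".") -> list:
    """Tool: list entry names in a directory inside the sandbox."""
    return sorted(p.name for p in resolve_inside(root, rel).iterdir())


def read_file(root: Path, rel: str, max_chars: int = 100_000) -> str:
    """Tool: read a text file inside the sandbox, truncated to max_chars."""
    return resolve_inside(root, rel).read_text(errors="replace")[:max_chars]
```

Write access can later be added as a third tool that goes through the same `resolve_inside` check, which keeps the containment rule in one place.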

the prompts are hard and there are some weird issues u will hit that will also help u understand certain fundamental limits etc. - understanding those can help also understand why some things dont work as hoped just yet.

for example, i had a real headache trying to make interactive specialized identities within workflows, so each stage is handled by specialized identities which have specific tools and focused context etc. theres a lot of hallucination too so u gotta have a lot more model calls, maybe do consensus between models etc. adversarial identities to review outputs before applying etc. All the stuff you still end up doing yourself again despite having programmed / prompted it all in...

initially it was all one context and identities struggled to remember what part of the process they would do, what tools they had vs what tool outputs to expect from previous stages etc. (it was funny but a big mess)

i use codex now, its closest to what i want, i couldnt get it better myself. claude wants to do too much and 'complete' stuff too much for me..

there are people blogging about loop programming, i did not investigate it thoroughly yet but id expect for myself id have similar results as my previous endeavour.

edit: wanted to add, my motivation was that claude dumps a lot of text back, i was using it back then. i wanted to give my models part of the screen as 'surface' to pin images, charts, and text etc on there, this worked nicely but i could not get them to do it really organically (prompting issues).

i thought it would be cool if the model could be like hey human, this thing we keep on screen while we discuss / design, like an architecture diagram. went to vulkan / glfw3 and rendering a terminal in there to get good enough pixel accurate graphics for presentation, that worked well and claude built it really easily.

by saidnooneever

7/3/2026 at 9:32:05 PM

[dead]

by Saidsadik2003

7/4/2026 at 12:45:28 AM

This is not my case. Today Claude code coded three features in my client saas just perfectly. Medium complex ones but perfectly from the plan, tests, linter and pr. A good CLAUDE.md is enough. Skills for auxiliary tools like sentry, grafana…

by drchaim

7/3/2026 at 11:31:04 AM

Just use both. vscode + copilot for autocomplete and antigravity for prompting. My main editor is vscode and I do heavy manual edits, and I also heavily use antigravity. I feel like I've become very productive and not actually renouncing the ways of the programmer, but augmenting them, as long as I'm careful enough with the generated code by checking every file it touched.

by madprops

7/3/2026 at 7:18:45 AM

I don't think you would expect to get into a flow state if you were intermittently directing another (human) programmer to do work, and you shouldn't expect to with LLM-driven coding either. Perhaps you are best finding out ways to extend the length of time where the LLM can work without prompting, then use that downtime to focus on other tasks that will help you to guide it better the next time you need to prompt it.

by chilmers

7/3/2026 at 7:46:58 AM

I feel the opposite. Creating a DTO or wiring up a CQRS command takes me out of the flow. And while I enjoy a good refactoring, it would be nice if I could just have it refactor code in the background while I'm still working in the same file.

by sixothree

7/4/2026 at 3:54:03 PM

llama.cpp + pi.dev for me, using small Gemma and qwen llms

by sometimelurker

7/3/2026 at 7:58:57 AM

A) spec driven development

B) opinionated skills that use GitHub tickets, merge gates and execution of ticket graphs

by snissn

7/3/2026 at 7:45:41 AM

The fundamental problem I keep seeing across all harnesses is the use of the exact same UX afforded by a git-based backend. If we want to stay in flow, the LLM's edit backend would have to be based on something like CRDTs to handle simultaneous edits.

by neepoPhantom

7/3/2026 at 7:01:28 AM

I keep a TODO file where I just write my ideas in free text, and every once in a while I tell claude "I updated the TODO file".

This is basically like queueing up prompts.

I wish Claude Code had a thing like that builtin. Like a "user ideas scratchpad".

by hsn915

7/3/2026 at 7:58:37 AM

using tools like claude code and codex constantly boosts our dopamine, making hyperflow impossible. these days, most engineers work on multiple projects simultaneously to satisfy their dopamine receptors.

by doganarif

7/3/2026 at 10:41:30 PM

I'm in offensive security and use it to write exploit code for various projects I'm working on.

Too many people are using LLMs to shortcut knowledge completely. I have more work than ever fixing the security issues on vibe coded apps, and I don't think it will slow down any time soon.

by jazz3k

7/3/2026 at 8:01:26 AM

Computers are like a bicycle for the mind.

LLM AI is like Uber for the mind.

by dosisking

7/3/2026 at 8:10:17 PM

I recently started an internship in a field I am very interested in. I began using claude to write a lot of my code, but realized that:

a) It was way too easy to just auto-approve everything. Answering the 5-10 spec questions it asked me made me feel like I was an important part of the loop, but really it was just a way to make me feel important while spraying my slop cannon.

b) I wasn't actually learning anything, defeating the whole purpose of the internship I worked hard to get.

I am now using a workflow where the brainstorming process is the same, but I have claude write an instructional document for me to implement. It has instructions to ask me questions about what I know / want to know, to lay out the plan iteratively with lots of verification steps, and to heavily explain portions of the code that are unfamiliar to me. It's sorta like making my own custom tutorials specifically for the problem I am working on.

It's a little slower, but not too bad since it does still put whole codeblocks in the instructions. I have a much better understanding of what I am doing, I still get to enjoy learning and programming and improving, and I don't feel like a reverse centaur.

by jwardbond

7/3/2026 at 9:25:56 PM

As Boris said, you shouldn't be manually prompting anymore but asking the AI to prompt itself, in the form of workflows. I usually have 3 to 5 different sessions running at once all autonomously.

by satvikpendem

7/3/2026 at 10:50:27 PM

IMO asking the AI to prompt itself, remind it to clean up other agents, tell it to monitor something and just hang there for hours, etc gets old really fast.

If I had to pay the API rate to have one LLM rewrite what I just told it to another one, then have the main one get busy or start waiting for subagents rather than be something I actively steer, and come back to the subagent being either gone or left hanging for hours blocking another one from doing the thing I actually asked it to do, I would never do it through Claude Code. It costs me only a few seconds to ask it do something and I almost never hit my usage limits without them, so I basically only use them because they're free.

For my own bulk workloads I just put codex and my own harness in container and built an API dispatcher for the repeatable workloads I care about. You can just pull from a queue or click a button or run a script, or use LLMs to launch them or review them, but it doesn't make any sense to me to have them "monitor" or manage each other passively because you just end up doing it anyway without a real API to control it.

by weitendorf

7/3/2026 at 9:39:49 PM

Funny but how do you then maintain orientation context? Unless you’ve set strict parameters which then get fed into the loop and which it uses to make decisions?

by strapchay

7/3/2026 at 6:54:46 AM

Related question: are there any close-to-gpt5.5/opus-level good autocompletion models?

by egeozcan

7/3/2026 at 7:06:18 AM

when it comes to autocomplete, the harness matters more than the model

by byzantinegene

7/4/2026 at 4:01:18 AM

I do all my development as a longish running pipeline that mimics how I used to write software.

There is a high-level plan, it gets decomposed into steps that get performed sequentially. If during a step something occurs that challenges the original assumptions, the remaining steps can be reconfigured or the implementation stopped.
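The plan-then-steps loop described above can be sketched roughly like this. The three outcomes ("continue", "replan", "stop") and the names `StepResult` and `run_pipeline` are illustrative assumptions, not the commenter's actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    status: str                       # "continue", "replan", or "stop"
    new_steps: Optional[list] = None  # replacement for the remaining steps


def run_pipeline(steps: list, execute: Callable[[str], StepResult]) -> list:
    """Run steps in order; a step may rewrite the remaining plan or halt it.

    Returns the list of steps that completed.
    """
    remaining = list(steps)
    done = []
    while remaining:
        step = remaining.pop(0)
        result = execute(step)
        if result.status == "stop":
            break  # assumptions no longer hold: abandon the implementation
        done.append(step)
        if result.status == "replan" and result.new_steps is not None:
            remaining = list(result.new_steps)  # reconfigure what is left
    return done
```

In practice `execute` would wrap an agent call plus whatever quality checks run after each step; here it is just a callback so the control flow stays visible.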

Well, I guess it's more rigorous than me, because it does a lot of quality checking along the way and, of course, at the end.

This is the most boring approach to this work as you can imagine. But, after running this process hundreds of times and tuning it, today I can do about 10-20 PRs a day that are often quite good. I manually review and manually test and as models and my pipeline have improved the % of PRs that require zero human intervention after the initial planning is getting higher. Maybe 50%. For the other half, it's good, but there are often glaring bugs, especially for UI/UX stuff.

by vcryan

7/3/2026 at 10:13:18 PM

I spent 3 months on Pi and the Opencode Go plan for open model inference. I've never had so much fun on a computer. If you are looking for a place to start, that should be it. Or check out: https://github.com/huggingface/tau at https://twotimespi.dev

I now use GPT-5.5 as the primary. Code quality was just higher. I tinker and do R&D with open models then come back and refactor my slop into usable code I can save for future use.

by imagetic

7/3/2026 at 7:49:46 AM

My current approach which I've been testing on two MVPs with what I would call 'moderate success' (but hey, actual success!)

3 tier, philosophy-spec-design. Increasing detail. Design files include db model explanations and pseudocode/function headers - that level of detail.

For each thing I need to change, I have a prompt ready to go to ask the agent to follow about 5 steps and it outputs a 'reviewfile' with details of what it thinks about the thing I posited. I review its output. I have another prompt ready to then get an agent to generate a taskfile + update the design documentation. The taskfile explains in great detail what has changed and what needs to be implemented. I review the taskfile and git diffs of the design doc changes. Finally an agent implements the taskfile. I review all changed code and commit.

It gets there, but still definitely misses some stuff. It's very adequate for a MVP I'm finding.

Edit: this seems to only work with Opus. Sonnet can't do it (maybe I'm just lucky and Opus is seriously compensating for an awful approach?)

by Incipient

7/3/2026 at 8:46:03 PM

I saw another post today about a startup making an oven for baking bread. I feel that often the main issue lies with what you need 'the code' to accomplish.

My flow state is thinking about and understanding this: am I solving a problem that needs to be solved now, for the right person?

I created this to help me understand it (project foundations + create milestones) and then bring it to reality (ship milestones).

https://artrichards.github.io/agent-playbook-suite/blog/

by ArtRichards

7/3/2026 at 7:51:41 PM

I was thinking how it would be interesting to make an environment where instead of LLM just crapping out a bunch of code really fast, it works more like a pair programming exercise going at human speed.

The LLM would explain what it's doing, then write a bit of code, then you have time to look at it and understand it, and go to the next step. At any point you can interject and discuss or change it.

I find the biggest problem is that once an LLM generates a bunch of code, it's really hard for a human to build up the context for what the code is doing and why. When you're coding normally or pairing, then you're gradually absorbing the context and what the code is doing throughout the process.

The reality is that writing code fast was never the bottleneck. It's understanding the code and making sure it's actually doing what's needed that's hard.

by yogthos

7/4/2026 at 4:29:04 AM

I'm using LLMs as a search engine replacement. The coding is all me. Writing the code was never the bottleneck.

by classified

7/3/2026 at 7:25:28 AM

> but I haven't been able to enter flow state like I can when I hand write code.

Fixing that for you.

I haven't been able to enter flow state like I can when I write code.

by chrisjj

7/3/2026 at 8:09:02 AM

Prompts are a higher-level programming language.

by dude250711

7/3/2026 at 3:39:28 PM

Writing prompts is not writing code.

by chrisjj

7/3/2026 at 10:41:19 AM

Thanks to being unemployed, the last few months I've been experimenting a lot with coding agents, harnesses and most importantly, the workflows around them.

Currently I'm refining what I think works best for me, which I'd call something like "issues/PR based LLM workflow", powered mainly by this action I'm building on top of the Pi coding agent SDK: https://github.com/shaftoe/pi-coding-agent-action

Essentially I issue prompts swapping between the terminal and the git forge web app (GitHub and my own Forgejo instance) and it currently looks something like this:

- create an issue with the detail/quality of spec based on how important the task or the project is

- trigger a Pi session prepending a comment in the forge with "/pi " to work on it, either to produce a report or to e.g. implement the change in a new PR

- trigger more sessions in the same thread, be it an issue or a PR, to steer or to add more requests like fork out a new PR or similar. This works also for reviews so I just add comments and then submit a review with "/pi follow the comments instructions" or similar

- if I want more fine-grained control and I am at the workstation I use the bridging Pi extension to pick up the work locally: https://github.com/shaftoe/pi-coding-agent-action/tree/devel...

- rinse and repeat until I'm either happy with the change or the PR is so bloated that I get rid of it and start anew

I know it's probably something Claude / Codex / Cursor offer with their web app but I want the freedom and the flexibility to use the LLM provider/model I want, and Pi as a harness does that plus all the rest very well. Another advantage is that I can fit the LLM action in any pipeline I want and take care of chores like automated changelog generation and whatnot.

As I said it's still mostly work in progress but in general I think there's lot of potential with this kind of workflow, it forces me to keep the scope of the changes small (I still want to review the PR content after all) and gives me a memory for free just leveraging the ticketing system. I also like the fact the harness is running most of the time in the ci/cd sandbox which, in the case of Forgejo, I control fully.

PS I try to keep my work with/on AI tools on my website at https://a.l3x.in/ai

by alexfortin

7/3/2026 at 10:19:20 PM

I have built a radically different system in general called Abject (https://abject.world). I have been thinking about how agents are the wrong abstraction and what an operating system can look like now that we have LLMs. It's not designed to help you with your website or app, but it does code and make different kinds of apps within its system. In principle it should also be able to code apps outside its ecosystem but I haven't tried that in any serious way.

by mempko

7/3/2026 at 10:16:50 PM

Stop searching for “different ways of coding”. This is it.

LLMs have barely been around for a few years. People are addicted, seriously addicted, to the next shiny workflow. It’s like JavaScript frameworks all over again.

The way to get over this addiction is to just stop talking about it. Stop reading another BS article about how someone used agents to do some 10x improvement already. Unsubscribe from company channels where people endlessly bikeshed how to use some new LLM or agent harness or whatever.

By now, everything you need to know to maximize productivity has already been discovered. There are no new tricks, and even if there were, you’re really not missing out, the old tricks still work just fine. Just get out there and work.

by deadbabe

7/3/2026 at 9:39:41 PM

I assume you're not talking about solo work. For me, the quickest way to "flow state" has always started with human discussion. Only then does writing the code become a trivial task I can do alone. LLMs will never provide the flow because they do not set the goals. The tail does not wag the dog. They can only suggest implementation details.

Unless I'm working on totally unfamiliar problems, I don't want that advice for the majority of the code. Contrary to popular belief, there exist so many situations where there's exactly one right answer and countless wrong ones. There are less important miscellaneous parts I might have it fill in.

The only reason I cannot completely delegate to AI is because it cannot read my mind. Even then, it would probably still suggest crap since it is the averaging of those countless wrong answers. And still, even if it could overcome all that, I'm only saving a few hours at the end of several days of meetings.

I'm just not getting where the value is for anyone beyond entry level. I'm being totally honest when I say that I even stopped needing most search and documentation (for mature tools) over a decade ago. Back then, Stack Overflow was at its peak and I had the same questions about it. Offline coding is not only possible, but increasingly easier.

What am I doing differently here?

by sublinear

7/3/2026 at 8:53:53 PM

Interesting

I have noticed this too, but have a different problem

The "flow state" has never been where I want to be, it is where I make my worst mistakes and where the details swamp the bigger picture. A "cannot see the wood for the trees" problem

I am developing a practice for agentic coding, involving plans, reviews and checkpoints. But "twiddling my thumbs" while waiting for the agent to do its thing is a related problem for me

by worik

7/3/2026 at 11:34:44 AM

1. Find a problem that LLMs suck at and you're good at. Then you'll have no choice but to enter the flow state.

There's lots of those still. Portable shell programming is my favorite. Even the most capable models limp at it, but I thrive on my own, so it becomes an interaction where I really feel I need to think.

2. Work on dense programs, and use LLM for debugging only. LLMs suck at writing dense code. They thrive at redundancy and verbosity, so it will make you avoid it and use it for adjacent work, not the main thing.

3. Multitask. Ride several bikes at once, but not for the sake of doing more (for that you could automate), do it for the multitasking. Parallelize, split projects into multiple work fronts, work on reducing the time to mental switch between contexts. It's not coding per se, but a great skill, AI involved or not.

by gaigalas

7/3/2026 at 8:50:51 AM

I built a terminal that is also an agent comms system. Way back when Claude Code first came out I hacked together something to get two of them talking to each other and inserting text and reading from each other's respective TTYs, and it was horribly hacky, so I set about actually understanding how CLIs, TUIs, and terminals work (I had written a simplified terminal based on jquery-terminal a long while ago during Covid that hacked tool runs using GPT-2, so this was overdue). I've been writing code for decades, so have all of it handy in case I need to point the LLMs to a particular way of doing things. Refactoring constantly is key.

There's cmux in this space, but I had already used Hyper for years, so I decided it was time to fork something and build on it. Cmux does tabs in panes AND panes in tabs. Hyperia does addressable panes in tabs and windows. I've tried to keep it minimalistic, which helps with flowing back and forth between different projects (I typically work on 3-4 at a time). I added a Rust sidecar, making all objects addressable over MCP, so Claude Code, Codex, or a small local model on Ollama can split panes, run commands, and read screens, with one hard rule enforced in the harness rather than the prompt: an agent can never move my focus, other than asking for permissions to access a new object. ACLs too. Hyperia also carries an agent loop that wires into its own MCP server, so a local model in a JavaScript "shell" can control resizing the terminal (handy for videos), or opening a project and setting up the agent panes.
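
To illustrate what "enforced in the harness rather than the prompt" can look like, here is a rough Python sketch. It is not Hyperia's code (the sidecar is Rust), and the action names, the address strings and the `Harness` class are all made up for the example:

```python
from dataclasses import dataclass, field

# Hypothetical action names; any request that would move the user's focus.
FOCUS_ACTIONS = {"focus_pane", "focus_tab", "focus_window"}


@dataclass
class Harness:
    # agent name -> set of object addresses it has been granted (the ACL)
    acl: dict = field(default_factory=dict)

    def authorize(self, agent: str, action: str, target: str) -> str:
        """Decide a tool call before it runs: 'deny', 'ask' or 'allow'."""
        if action in FOCUS_ACTIONS:
            return "deny"  # hard rule: no prompt wording can override this
        if target not in self.acl.get(agent, set()):
            return "ask"   # new object: the user is prompted for permission
        return "allow"

    def grant(self, agent: str, target: str) -> None:
        """Record that the user approved access to one object."""
        self.acl.setdefault(agent, set()).add(target)
```

The point of the design is that the check runs on every call outside the model, so a confused or prompt-injected agent gets the same answer as a well-behaved one.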

I stay typing in a pane while the agents work in theirs, in my peripheral vision, and web panes sit right next to terminals so docs, webapps/sites and the agent chat live in the same window. Reviewing becomes glancing instead of context switching, which is the closest thing to ideal flow with prompts I've gotten out of this auto-AI stuff. Tab and pane clicks copy the address into the buffer, then I paste and issue commands referencing what I want dealt with. I have an SDR radio on my box that allows me to talk to a given pane (WIP not in the build yet). Working on getting the local agent stuff done and wired to the radio.

The upshot of this approach is enabling agents running in one tab, all mounting the same directory, with one in charge of the others. Claude Code is great at this, and it saves on the tokens it would normally use for doing it itself. I talk to Claude, or whatever I pick, and it talks to the rest of the agents and coordinates the work. I like Antigravity a lot because it moves crazy fast for coding. For example, I run Claude in control with GLM-5.2 doing auditing and explaining to me how development is going. No unseeable agent army here. No need for it, actually.

About the only thing that trips me up at the moment is having to work on Hyperia itself, which I don't do inside of it because of restarts. When I work on Hyperia, I start an agent in Windows Terminal and wire it into the MCP for testing. I build installers constantly as well, and then run through the QA process by using it to work on other projects I'm doing.

I use Zed for code editing and viewing, but rarely. I also just open things in special sticky notes (or have the agent do it) so I understand how we're doing things. GLM-5.2 took to the planning stickies like a fish to water.

https://github.com/deepbluedynamics/hyperia

https://github.com/deepbluedynamics/nemesis8 (n8)

Both are open source, obviously. It's worth mentioning they will remain that way and will never require a service plan or any other cost. I built them because I needed it for another project I will be selling, not aimed at developers at all.

n8 implements the agent runs in containers. This is a separation of concerns: it runs in any terminal and controls the session starts and search for previous sessions (as well as monitoring the usage of tokens, CPU, network and file access). Working on the dashboard for that now, so I can easily see which files are changing, how much they changed, and what changed in them. I co-founded Loggly, so that crap is in my wheelhouse.

This isn't the tab completion model. It works great for the way my brain works, but I also think having an agentic terminal is a good move for anyone writing code and we'd all be better off if we ran agents in containers over our bare metal. It makes it way easier to see what the agent is doing (and to resume later), and allows it to do most of its work in the container, as opposed to running loose on my box.

by kordlessagain

7/3/2026 at 8:03:36 AM

I build a lot of my own tools, to suit exactly how I want to work. Obviously, having a little thinky guy in the computer to do most of the busy work of making new tools accelerates that, but tools that make the LLMs suit me also accelerate my general work.

Some stuff I've built:

https://github.com/swelljoe/tandem - Tandem is a sysadmin buddy that travels with you over ssh. Just a wrapper over tmux and claude code (or whatever agent you like), it opens two panes in tmux, one with an ssh session to one of the hundreds of devices I maintain, and one with a local Claude Code configured to use a local work space and instructed via CLAUDE.md/AGENTS.md to use tmux to interact with the remote machine. I built it because a lot of my coworkers were installing Claude Code on our robots and authenticating there to get help with robot troubles, and that felt bad. This allows them to keep all sensitive stuff locally and still get help troubleshooting directly on the device. I happen to find it useful, sometimes, too.

https://github.com/swelljoe/nelson - Nelson is a fancy Ralph loop for security bug hunting that I built to help audit my own software. It's also grown to include a benchmark suite I'm using to figure out which models are worth using for security work. I've published some of those benchmark results, and have a few hundred hours/dollars worth of new ones to publish this weekend. Turns out the benchmarking is more interesting, so that's gotten more attention than the bug-hunting side, but the benchmarks inform how the bug-hunting side works, and I added multi-model/multi-pass scans and de-dupe features recently because I found that letting models have a couple bites at the apple increases discovery, and there are bugs that only some models catch, and it's not always the top model that finds them. There's some overlap, but also some divergence. This research has also led me to start working on a harness for security auditing tasks; giving the agent tools and project structure data to lift detection and reduce false positives.
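
A small Python sketch of the multi-model de-dupe idea, not Nelson's actual code: it assumes each finding can be reduced to a file, line and bug class, and the `Finding` fields and model names are invented for the example:

```python
from collections import defaultdict
from typing import NamedTuple


class Finding(NamedTuple):
    model: str  # which model (or pass) reported it
    path: str   # file the finding points at
    line: int
    kind: str   # bug class, e.g. "sql-injection"


def merge_findings(findings):
    """Group findings by (path, line, kind) and record which models saw each."""
    merged = defaultdict(set)
    for f in findings:
        # Normalise the bug class so "SQL-Injection" and "sql-injection" collapse.
        merged[(f.path, f.line, f.kind.lower())].add(f.model)
    return dict(merged)


def unique_to_one_model(merged):
    """Findings only a single model caught: the 'divergence' between models."""
    return {key: next(iter(models))
            for key, models in merged.items() if len(models) == 1}
```

Real reports rarely agree on exact line numbers, so a practical key would probably need fuzzier matching; exact matching is used here only to keep the sketch short.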

https://github.com/swelljoe/flar - FLAR is the Fast Light Agent Restrictor. It bubblewraps an agent so it is quite safe to use agents on your local machine, even with `--dangerously-skip-permissions` (which makes agents more fun to use). The sandbox feature found in most agents is porous and can be expanded by the agent harness itself. Similarly, if the agent introduces a supply chain attack into your code and runs it before you get a chance to audit/review it in a PR or run it through an SBOM dependency checker, the blast radius is exactly the project directory and the credentials/history of the one agent. (Whereas, without flar, the blast radius is your whole .ssh, github creds, all agent creds, your keyring, whatever secrets are in your home, etc.) This one is new. Just made it because I was talking about how I always put agents in VMs because I don't trust them. Someone suggested `srt` (https://github.com/anthropic-experimental/sandbox-runtime) and I like the idea but I don't like how complicated and huge and JavaScript it is. You can read and understand the entirety of `flar` in one sitting. Anyway, to break out of "prompt/response", you have to skip permissions, or call it via `claude -p` or API with tasks to perform. Nelson does the latter and `flar` does the former.
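
For a sense of how the blast radius gets limited to the project directory plus one agent's state, here is a rough Python sketch that only builds a bubblewrap command line. It is not how `flar` is implemented; the `bwrap_argv` helper, the `agent_state_dir` parameter and the merged-`/usr` layout are assumptions, and a usable sandbox would need more care:

```python
from pathlib import Path


def bwrap_argv(project_dir, agent_state_dir, agent_cmd):
    """Build (but do not run) a bwrap command confining an agent."""
    project = str(Path(project_dir).resolve())
    state = str(Path(agent_state_dir).resolve())
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",   # system files, read-only
        "--ro-bind", "/etc", "/etc",
        "--symlink", "usr/bin", "/bin",  # assumes a merged-/usr distro
        "--symlink", "usr/lib", "/lib",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--tmpfs", "/home",            # hides ~/.ssh, keyrings, other creds
        "--bind", project, project,    # the only writable source tree
        "--bind", state, state,        # this one agent's creds/history
        "--chdir", project,
        "--unshare-all",               # fresh namespaces for everything...
        "--share-net",                 # ...except network, which the agent needs
        "--die-with-parent",
        *agent_cmd,
    ]
```

Passing the result to `subprocess.run(...)` would start the agent inside the sandbox; everything under the real home directory other than the two bind mounts is simply not visible to it.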

That's not to mention all the side projects and other stuff I've been able to make a lot of progress on.

The biggest one is finishing https://venturous.app/ (or, at least I made it do what I most wanted it to do, which is provide map overlays of US public lands and mobile data provider coverage so I can find cool places to camp free while staying connected). This is a re-implementation of an old defunct app called FreeRoam that I absolutely loved when I traveled full-time. I built half of it over several months by hand, and then Claude helped finish it in a few weekends and holidays. I'll get Claude to help build the mobile apps someday.

by SwellJoe

7/3/2026 at 6:46:09 AM

I’m writing a JSX templating language — to manage context, branching, etc automatically. You hand it a spec/existing work and it automatically applies a recipe.

So far that’s been much nicer for anything large or complex, because I was spending all my time on context piping.

by zmgsabst

7/3/2026 at 7:27:23 AM

You are the bottleneck.

Why should AI be limited to human time? Is a mountain? A galaxy?

by brador

7/3/2026 at 8:11:32 AM

I think the fundamental aspect of flow is that it requires a high amount of cognitive engagement. Most of the time you're just not getting that from interacting with an LLM because the process is relatively passive. There are also forced breaks while it does its internal CoT, which break flow.

I think a lot of people get a sort of novelty effect when first interacting with an LLM which can feel superficially like flow, but it's different in that it eventually wanes and what really happens in practice is you're encouraged to disengage and this makes it almost impossible to get into a true flow state.

The risk here I think is that if you get humans disengaging from the task at hand, there's a higher chance of bugs being introduced. You might move slightly faster in the short term but be forced to hit the brakes in the medium/long term.

by captainbland