Zerostack – A Unix-inspired coding agent written in pure Rust

5/17/2026 at 1:56:00 AM

I (somewhat jokingly) wrote one recently too... https://github.com/pnegahdar/nano in under 200 lines. Repl, sessions, non-interactive, approvals, etc

The smarter the models get the less the harnesses matter (outside of devx).

Maybe one day I'll run it through SWE-bench.

by parhamn

5/17/2026 at 2:15:57 AM

So freaking cool..in just 200 (190 actually) lines.

I also wrote one by myself last week (just for fun and learning). It works, including integration with configured mcpServers (like you do in most coding agents). Wrote about the whole step-by-step process and what is needed at what step and why: https://nb1t.sh/building-a-real-agent-step-by-step/

by freakynit

5/17/2026 at 3:05:48 PM

Ok, I know it's a joke. And also, are you daily-driving it?

by tasuki

5/17/2026 at 7:40:20 PM

Not daily driver, but have used it as a utility a few times.

For my daily work I like letting different harnesses compete and look over each other's work (while subsidized with the subscriptions) so I use OpenADE.

by parhamn

5/17/2026 at 2:10:02 AM

I like it

by mgfist

5/17/2026 at 9:46:24 AM

I understand the need for memory footprint in some situations, but what's the point of seeking performance for a software that mostly calls LLMs and waits?

by rullopat

5/17/2026 at 10:43:41 AM

Before I tried coding agents my guess would have been: none.

But seeing how slow claude code and copilot cli are and how much ram they use I'm flabbergasted. If you have long running sessions they can both take tens of gigabytes of ram and feel quite sluggish.

by tjoff

5/17/2026 at 11:12:00 AM

huh. my evidence with codex hasn’t been so bad. and tbh why would i discourage anyone from coding. hack away mr hacker. your solution will either sink or swim

by i_am_a_peasant

5/17/2026 at 11:57:20 AM

codex is in rust and not in power and memory hungry js/ts.

by krzyk

5/17/2026 at 1:15:09 PM

oh sweet I had no idea. funny that i mostly use it to write rust

by i_am_a_peasant

5/17/2026 at 6:43:11 PM

It was previously JS/TS, but they rewrote it in Rust, sometime in the past 12 months.

by dorian-graph

5/17/2026 at 6:23:23 PM

Check out its app-server, IMO it’s a decent foundation to the codex clients.

by manmal

5/17/2026 at 12:49:20 PM

I've been playing with running Claude Code inside a Vagrant VM. I can't be certain it was getting OOM killed when I allowed the VM 4GB of RAM, but when I went to 16 it did seem to be more stable...

by crabmusket

5/17/2026 at 1:50:46 PM

> I can't be certain it was getting OOM killed when I allowed the VM 4GB of RAM

If it's actually getting OOMed (and not backing off by itself), I'm pretty sure that's logged in dmesg. Or earlyoom or systemd-oomd if userspace is in play and getting there first.

by yjftsjthsd-h

5/17/2026 at 9:48:08 PM

Thanks for the tip, I will probably try shrinking it back to 4 to see, as that seems like it should be enough RAM for anybody (:

by crabmusket

5/17/2026 at 12:28:34 PM

Yes...exactly. It's frustrating and inefficient.

by Mjarvis

5/17/2026 at 5:29:18 PM

The appetite for Rust is the appetite for higher guardrails. Automatic memory management in safe Rust makes it less likely your app bloats even as its source balloons.

The people "writing" agents are not themselves experts in how to write performant code. Claude Code is so massive and ugly it can only be realistically maintained by continuing to throw LLMs at it. But that's not a replacement for good software design.

by mpalmer

5/17/2026 at 12:17:15 PM

[dead]

by adabsurdo

5/17/2026 at 10:35:18 AM

I see spreading Rust as an overall good thing, because it changes the benchmark for how software should feel in terms of performance, stability, and memory footprint.

So even if it doesn't create a tangible advantage in a particular use case - it's still good for the whole industry.

by mapcars

5/17/2026 at 11:19:53 AM

I haven't used Rust extensively but my feeling is, if you change the design (which inevitably happens in many early stage projects), the refactoring takes more time due to borrow-checker semantics. Although I am far from a representative sample and could well have been using it wrong

by GodelNumbering

5/17/2026 at 12:03:41 PM

When you write Rust long enough you settle on certain architectures (message passing, event loops) that go well with the borrow checker, and don't end up thinking about it too much. Plus you can always throw an agent at the first set of errors from the refactor and let the compiler guide the annoying parts.
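A minimal sketch of the message-passing style described above, using only `std::sync::mpsc`. The `Event` type and `run_event_loop` function are hypothetical names chosen for illustration. One thread owns the state outright and everything else sends it messages, so there is no shared mutable borrow for the compiler to object to:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical events, for illustration only.
enum Event {
    Add(i64),
    Reset,
    Quit,
}

// The loop owns `total`; other threads only send messages,
// so the borrow checker never sees shared mutable state.
fn run_event_loop(rx: mpsc::Receiver<Event>) -> i64 {
    let mut total = 0;
    for event in rx {
        match event {
            Event::Add(n) => total += n,
            Event::Reset => total = 0,
            Event::Quit => break,
        }
    }
    total
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || run_event_loop(rx));

    for event in [Event::Add(2), Event::Reset, Event::Add(40), Event::Quit] {
        tx.send(event).expect("event loop hung up early");
    }

    let total = worker.join().expect("event loop panicked");
    println!("final total: {}", total);
}
```

Refactors in this style tend to change the `Event` enum first, and the compiler then points at every `match` that needs updating.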

by ijustlovemath

5/17/2026 at 1:07:27 PM

> When you write Rust long enough you settle on certain architectures (message passing, event loops) that go well with the borrow checker

So basically Go?

by bheadmaster

5/17/2026 at 1:21:26 PM

Go only provides one concurrency paradigm. Rust supports many (if not all).

The type system of Go is very weak. I'd say that'd be my main reason to pass on Go, even when the concurrency paradigm fits the project perfectly.

by flossly

5/17/2026 at 4:44:20 PM

The biggest reason to pass on Go right now (if your software can tolerate a runtime) is the lack of algebraic data types when doing interesting domain modeling. It makes such a huge difference it’s worth tolerating the pain points of Rust (or Swift, or F#) just to have them.

by jen20

5/17/2026 at 4:09:06 PM

Traits, Enums, and Typestate allow much richer paradigms at much lower cost
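A small sketch of what enums-with-data and typestate can look like together. All names here (`ToolCall`, `Draft`, `Approved`, `Outcome`) are hypothetical, and `run` is a stub that executes nothing:

```rust
use std::marker::PhantomData;

// Marker types for the typestate pattern (hypothetical names).
struct Draft;
struct Approved;

// A tool call whose approval status is tracked in the type, not at runtime.
struct ToolCall<State> {
    command: String,
    _state: PhantomData<State>,
}

impl ToolCall<Draft> {
    fn new(command: &str) -> Self {
        ToolCall { command: command.to_string(), _state: PhantomData }
    }

    // Consumes the draft, so it cannot be reused after approval.
    fn approve(self) -> ToolCall<Approved> {
        ToolCall { command: self.command, _state: PhantomData }
    }
}

// An algebraic data type: each variant carries exactly the data it needs.
#[derive(Debug, PartialEq)]
enum Outcome {
    Success { stdout: String },
    Failed { exit_code: i32 },
}

impl ToolCall<Approved> {
    // Only approved calls have `run`; calling it on a draft is a compile error.
    // This stub runs nothing and only fabricates an outcome for the demo.
    fn run(&self) -> Outcome {
        if self.command.is_empty() {
            Outcome::Failed { exit_code: 1 }
        } else {
            Outcome::Success { stdout: format!("ran: {}", self.command) }
        }
    }
}

fn main() {
    let call = ToolCall::new("cargo check").approve();
    match call.run() {
        Outcome::Success { stdout } => println!("{}", stdout),
        Outcome::Failed { exit_code } => println!("failed with {}", exit_code),
    }
}
```

Writing `ToolCall::new("cargo check").run()` without the `approve()` step does not compile, which is the "lower cost" part: the check happens at build time rather than in a runtime branch.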

by ijustlovemath

5/17/2026 at 11:20:45 AM

It's just not a thing to consider and doesn't happen often.

by eldenring

5/17/2026 at 11:19:43 AM

No because it means people will use Rust for the wrong reasons.

Systems programming is only a tiny fraction of code out there.

Approaching every problem as a systems programming problem is a massive waste of resources and intellect.

by amelius

5/17/2026 at 11:29:10 AM

For small to medium projects, an LLM can write functional (if not well crafted) Rust.

Considering how easy this is now, why choose a heavier, slower and less typesafe language?

by angusturner

5/17/2026 at 2:09:55 PM

Ok, so write your app in the garbage collected language, and then tell the LLM to translate it to Rust :)

by amelius

5/17/2026 at 3:30:52 PM

I find it kind of shocking that Anthropic doesn't see it this way.

by Wowfunhappy

5/17/2026 at 7:30:23 PM

Claude Code has a whole game engine built into it. God knows why.

by pojzon

5/18/2026 at 1:39:47 AM

Tell us more.

by attentive

5/17/2026 at 12:00:49 PM

Could choose a similar weight, similar speed, equal or more typesafe language though :)

by singpolyma3

5/17/2026 at 2:47:45 PM

Ada? Other than c and c++ everything else benchmarks 2-4 times slower than rust for compute bound tasks, even after jit warmup. I'm up for ada though, especially with an llm where I don't have to type all that verbose syntax.

by galangalalgol

5/17/2026 at 3:01:48 PM

OCaml? Haskell? Idris?

Lots of options with no jit or warmup

by singpolyma3

5/17/2026 at 3:44:37 PM

I'm not against jit or warmup, just saying it doesn't actually catch up for compute bound tasks in my experience. Haskell and ocaml would definitely be next on my list, but they do take a very good hit in performance over ada or rust. I wouldn't say they were similar in performance, certainly. There is a pretty big cliff between the systems languages and everything else performance-wise. For a lot of things it doesn't matter I know, but none of those things are domains I've ever worked in. I've never had a project in my professional career where we didn't descope requirements to fit the available compute.

by galangalalgol

5/17/2026 at 11:33:20 AM

it saves a lot of resources - for instance my devices would probably use less than half of the memory it uses now and I wouldn't hear the fan.

by tcfhgj

5/17/2026 at 1:51:24 PM

You won't hear the fan because you're still building it.

The resources I was talking about are developers × time.

by amelius

5/17/2026 at 2:02:05 PM

I am talking about using software - if software is used by many people, that's the more relevant resource usage.

by tcfhgj

5/17/2026 at 8:58:55 PM

It is a common trend for companies to optimize for visible CapEx at the cost of increased but invisible OpEx for consumers.

by lobocinza

5/17/2026 at 11:25:38 AM

How is it any faster than something written in say, Java?

by gf000

5/17/2026 at 11:32:12 AM

latency and throughput (when with Java the system is crying for more memory while it's chilling in the Rust case)

by tcfhgj

5/17/2026 at 12:07:50 PM

What's the latency difference between a long running process issuing a network call in Java vs rust? This is such a short time that it is completely overshadowed by noise (OS doing something else, what other software is running etc)

As for throughput: you have 1-2 requests going at a time, the next one waiting for the reply. What throughput are we talking about?

That's like speeding to the post office and expecting your letter to get to the recipient faster.

by gf000

5/17/2026 at 12:20:16 PM

you seem to specifically aim at the current example, but mine wasn't

Anyways, consider how higher memory usage can affect the system's performance dramatically once the system needs to start swapping memory to disk significantly

by tcfhgj

5/17/2026 at 12:56:44 PM

If you cannot write a simple Java agent without consuming so much RAM that your system is swapping then that really says more about the developer than anything.

Java is used in plenty of embedded systems and other memory constrained environments. Yes, it’s not going to perform well compared with Rust, but that doesn’t mean it’s an Electron-equivalent bloated clusterfuck of an ecosystem that’s going to eat all your system resources.

by hnlmorg

5/17/2026 at 1:59:54 PM

> so much

1) the agent is probably not the only thing running on the system, so more is just worse generally

2) I am fine if a developer needs Rust or similar to write a resource efficient app. I wonder what the developer could achieve when he put the optimization effort into the Rust app instead.

by tcfhgj

5/17/2026 at 3:23:30 PM

My point is that Java isn’t going to be the application that sends your machine into swap hell.

People are so narrow minded about programming on this forum. They talk as if only Rust fills the void between unsafe C and node.js behemoths. But the reality is there are a plethora of other good languages out there too.

by hnlmorg

5/17/2026 at 3:29:30 PM

Of course, what would be the point of talking about an overly specific statement that has no relevance here?

by gf000

5/17/2026 at 7:15:53 PM

> That's like speeding to the post office and expecting your letter to get to the recipient faster.

I mean, the post office is not a magic box. Actual people will take your letter somewhere, sometimes batching sends. So running to the post office might actually get your letter in an earlier batch, same as ordering on amazon or your online supermarket in the morning or in the evening might change the delivery time.

Pedantic, I know, but interesting example.

by mejutoco

5/17/2026 at 12:15:17 PM

You can tune the Java runtime in many ways, achieving impressive throughput/latency for your type of workload.

Next to none of them will get you nearly as good cold start times as a native app, if using free Java.

There was GraalVM and its ecosystem, which included Java Native Image - the first thing I’d evaluate if I were thinking about a non-server-side, performant Java application.

But it all had been sadly swept away by Oracle from free tier.

by ink-splatters

5/17/2026 at 1:27:27 PM

I use GraalVM and Native Image now and while the project --a small CLI tool-- is tiny (2kLOC with mainly AWS-SDK deps) the compile times are huge (~3 minutes), the OS-dependencies many (so much I use a build container to ease the burden of installing all) and the resulting binary is huge (~60MB).

But then it distributes as one binary and starts in milliseconds.

Rust would have been a better fit (cargo-and-done, smaller binary, quicker to compile); but I wanted to use Kotlin as we use in all other projects.

by flossly

5/17/2026 at 3:30:39 PM

It hasn't been swept away by Oracle, far from it. Its development is just no longer coupled to the OpenJDK release cycle, which benefits both projects.

by gf000

5/17/2026 at 10:09:49 AM

Simplest explanation I could come up with: Just for hype and fun.

Rewriting things in rust is "cool". Bun did it, other projects did it. Therefore, writing a coding agent in one should be cool too.

And apparently enough HN crowd agrees with it to take the #1 spot on the board.

by tornikeo

5/17/2026 at 11:12:42 AM

For the most part, doing things right in the given language matters more than change of language. A lot of refactors in Rust (in the coding agent space) I see jump straight to Rust without considering what inefficiencies can be addressed before changing the language.

Having said that, I considered a Go/Rust rewrite of Dirac (https://github.com/dirac-run/dirac) for some modules to support cases when someone wants to run like 30 agents, but it quickly became obvious that, a) while the node event loop is a bottleneck, it is not the sole bottleneck and b) if you have a VSCode extension, you can't totally get rid of TypeScript, so it just becomes the case of bi-lingual project and the maintenance burden that comes with it

by GodelNumbering

5/17/2026 at 1:32:46 PM

Rust is just another language. Sure it's cooler than some langs, to some ppl. Sure.

The author made the choice. Open sourced it (thanks!). So now we all enjoy more options. Saying the author did so because "cool" does not sit well with me. It feels like getting a no-strings-attached gift of significant value and then saying the giver gave it to be seen as cool.

by flossly

5/17/2026 at 10:37:01 AM

Opencode can be surprisingly hard on the CPU (could be an issue when coding on battery or a weak remote VM), and uses a lot of RAM. A little competition is always welcome.

by joelthelion

5/17/2026 at 10:19:34 AM

Even a simple coding agent TUI should work instantaneously, which I sadly cannot say is true about typescript-based applications like Claude Code or Gemini.

After switching away from GNOME Terminal + Zsh to Ghostty + Nushell, I started to appreciate how instant everything feels. Why not make everything just as fast?

by wint3rmute

5/17/2026 at 11:29:01 AM

I have to say this is one of my favorite things about local Qwen and Qwen code, it seems a heck of a lot faster than Claude and feels better to work with.

Problem is it is nowhere near as smart, so what speed I get in conversation gets killed by iteration.

by itsdavesanders

5/17/2026 at 10:48:52 AM

I didn't see anyone mention this, but I think having a single binary is much nicer than having a JS (or Python) program sprawled all over your system.

by jwxz

5/17/2026 at 12:21:09 PM

Having single binary output is a completely different problem and is solved for both Python and typescript (bun supports the latter).

by ink-splatters

5/17/2026 at 12:53:10 PM

Node and Deno can also bundle apps into a single executable.

by crabmusket

5/17/2026 at 1:15:02 PM

Over time software grows. Once big, rewriting it in another language is hard and gets harder as the project grows in size.

Starting with a resource-saving attitude may be a very good long term strategy.

Also: with Rust there are many features of high-level, modern, type-safe, FP-inspired languages that you do not have to miss.

by flossly

5/17/2026 at 4:39:46 PM

Most FP languages cannot work without GC unless you're willing to give up idiomatic FP programming. There is a reason Haskell has a garbage collector.

by amelius

5/17/2026 at 10:36:32 PM

Hence I used FP-inspired (to point at languages like Rust, Kotlin, Ruby, Swift)

by flossly

5/17/2026 at 4:51:01 PM

That's exactly the tradeoff I made with Barnum (https://barnum-circus.github.io/). It's just not important to optimize the performance of the rust side for the reason you stated. So instead, all focus goes into making it easy for an LLM to build a reliable pipeline (from which LLMs are invoked).

by rbalicki

5/17/2026 at 10:36:32 AM

While we are not there yet, people are looking into running agents on the ESP32 and the like.

See projects such as picoclaw, nullclaw and more.

https://github.com/sipeed/picoclaw

https://github.com/nullclaw/nullclaw

by throwa356262

5/17/2026 at 11:56:41 AM

e.g. opencode right now uses ~80% of my CPU.

At first I also thought that it would be just call and wait, but a lot of work is done locally (any tool calls).

by krzyk

5/17/2026 at 3:28:47 PM

It's also dealing with memory issues (see: Memory Megathread https://github.com/anomalyco/opencode/issues/20695).

And in my experience it is not that much faster to start than more complex software like Visual Studio Code.

by tacone

5/17/2026 at 2:01:49 PM

If you write in Go, you get faster compile times, and your code is more likely to still compile fine after a long time.

by faangguyindia

5/17/2026 at 10:53:56 AM

- Reduce the footprint on the planet

- prolonged life of hardware

- less electricity

- less expensive hardware

by tcfhgj

5/17/2026 at 11:02:19 AM

Compared to what LLMs actually consume, your agent makes zero difference

by sdevonoes

5/17/2026 at 11:58:46 AM

Why would anyone compare a cloud LLM's power usage when one doesn't pay for it? Local power consumption is important for those.

by krzyk

5/17/2026 at 12:30:32 PM

OP specifically cited “reduce the footprint on the planet”

by afavour

5/17/2026 at 11:38:32 AM

very wrong - especially on the local machine, see https://news.ycombinator.com/item?id=48164613

by tcfhgj

5/17/2026 at 11:26:32 AM

Running many of those at scale.

by iddan

5/17/2026 at 11:06:06 AM

I recall back in the mid 2000s when I saw many "rewrite in rails" apps. It's just hype, and it will die out in a few years when something new comes out.

by phplovesong

5/17/2026 at 11:18:01 AM

[dead]

by cpa

5/17/2026 at 12:03:15 AM

Thanks, I've been tooling away in my spare time on my own version of this -- both to get a deeper understanding of agents (everyone suggests writing your own) and to help learn Rust. I'd like to retain `pi`'s configurability though, the ability to self-mutate and generate new tools is incredibly useful, particularly because I don't think any of these things should have access to arbitrary code execution through `bash` (of course, if they have access to, say, `edit` and `cargo run` they still have arbitrary code exec, but...) (so I tend to generate tools on the fly when I encounter something the no-bash agent needs to do).

by frio

5/17/2026 at 12:10:02 AM

I actually thought about this issue, but while Pi can have this script-like environment thanks to the fact that it's based on an interpreted language (TypeScript), Rust has its own limitation as a compiled language.

I decided to allow for customization in a different way:

1. The prompt library (~/.config/hypernova/prompts/) acts as a simpler alternative to Skills, with the built-in prompts that should replace superpowers + Claude's frontend-design

2. Compile-time features; things that might make the agent more bloated can be disabled when you decide to compile zerostack

3. Clean code; code that's short and easy to read, you can just throw zerostack on its own source code in order to build a custom fork if your necessity can't be satisfied. Good features could also be adopted by the main version.

4. Permission mode; as you can see in the README, there was lots of concern around the permission model, and I landed on a 4-mode system that goes from "Restrictive" (no commands) to "YOLO" (whatever the agent wants to do) + custom regex patterns for allow/ask/deny permission on 'bash' calls. In your case, you just need to run `zerostack -R` to force all tools to ask for permission.
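
A rough sketch of how a mode plus allow/ask/deny rules could combine into one decision. This is not zerostack's actual code. Loud assumptions: only "Restrictive" and "YOLO" are named above, so `Mode::Patterns` is a hypothetical stand-in for the intermediate modes; "Restrictive" is read as "always ask", following the `-R` description; plain prefix matching replaces regexes to stay within the standard library; and the deny-then-ask-then-allow precedence is a guess.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Decision {
    Allow,
    Ask,
    Deny,
}

enum Mode {
    Restrictive, // assumed here to mean: every call asks for permission
    Patterns,    // hypothetical stand-in for the unnamed intermediate modes
    Yolo,        // whatever the agent wants to do
}

// Prefix lists standing in for the regex patterns mentioned above.
struct BashRules {
    deny: Vec<&'static str>,
    ask: Vec<&'static str>,
    allow: Vec<&'static str>,
}

fn matches_any(prefixes: &[&str], command: &str) -> bool {
    prefixes.iter().any(|p| command.starts_with(*p))
}

fn decide(mode: &Mode, rules: &BashRules, command: &str) -> Decision {
    match mode {
        Mode::Restrictive => Decision::Ask,
        Mode::Yolo => Decision::Allow,
        Mode::Patterns => {
            // Assumed precedence: deny beats ask, ask beats allow.
            if matches_any(&rules.deny, command) {
                Decision::Deny
            } else if matches_any(&rules.ask, command) {
                Decision::Ask
            } else if matches_any(&rules.allow, command) {
                Decision::Allow
            } else {
                Decision::Ask // unknown commands fall back to asking
            }
        }
    }
}

fn main() {
    let rules = BashRules {
        deny: vec!["rm "],
        ask: vec!["git push"],
        allow: vec!["cargo "],
    };
    for mode in [Mode::Restrictive, Mode::Patterns, Mode::Yolo] {
        println!("{:?}", decide(&mode, &rules, "cargo test"));
    }
}
```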

(Also, there is a work-in-progress feature for programmable agents, but that's yet to be announced)

by gidellav

5/17/2026 at 6:15:42 AM

Ok, what about having tools be discoverable from the environment, similar to how $PATH works in POSIX?

There could be an env var $AGENT_TOOLS, a string of paths delimited by `:` and tools would be discovered as some specific format of file. Maybe a JSON that contains tool name, list of parameters and the command to run it.

This is essentially decoupling tools from the agent, allowing more customization and per-project environments. It does require shipping and installing more binaries, one for each tool probably.
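
A minimal sketch of the discovery half of this idea, using only the standard library. `$AGENT_TOOLS` comes from the proposal above; the function names and the "any `*.json` file is a manifest" convention are assumptions for illustration. Parsing the manifests would need a JSON crate and is left out.

```rust
use std::env;
use std::fs;
use std::path::PathBuf;

// Split a `:`-delimited list into directories, skipping empty entries.
fn tool_dirs(agent_tools: &str) -> Vec<PathBuf> {
    agent_tools
        .split(':')
        .filter(|part| !part.is_empty())
        .map(PathBuf::from)
        .collect()
}

// Collect every `*.json` file in the given directories, in search order.
// Missing or unreadable directories are skipped silently.
fn discover_manifests(dirs: &[PathBuf]) -> Vec<PathBuf> {
    let mut found = Vec::new();
    for dir in dirs {
        let entries = match fs::read_dir(dir) {
            Ok(entries) => entries,
            Err(_) => continue,
        };
        let mut in_dir: Vec<PathBuf> = entries
            .filter_map(|entry| entry.ok())
            .map(|entry| entry.path())
            .filter(|path| path.extension().map_or(false, |ext| ext == "json"))
            .collect();
        in_dir.sort(); // read_dir order is unspecified, so sort for stable output
        found.extend(in_dir);
    }
    found
}

fn main() {
    let raw = env::var("AGENT_TOOLS").unwrap_or_default();
    for manifest in discover_manifests(&tool_dirs(&raw)) {
        println!("{}", manifest.display());
    }
}
```

Keeping earlier directories first mirrors how `$PATH` lets a per-project entry shadow a system-wide one, though what to do with duplicate tool names is left open here.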

by aerzen

5/17/2026 at 4:11:07 PM

The Hermes agent (Python) follows something similar; it defines a HOME dir and enumerates plugins and memory extensions present there.

https://github.com/nousresearch/hermes-agent

Functionally, it fits more in the openclaw space than pi-agent.

by threecheese

5/17/2026 at 12:08:22 PM

This is one of the approaches I'm considering for my own, Roder.

The approach mostly being communicating over json rpc which has become the standard for MCP so it makes it more approachable to agent developers.

Obviously it's very much NOT MCP; it's a low-level, events-based RPC system for registering capabilities and extending low-level primitives of the agent itself, not the model.

by zrg

5/17/2026 at 6:32:00 AM

I understand the concept, but I don't get what's the advantage over adding in the prompt instructions to use a specific bash command for a specific task, acting as a "custom tool".

by gidellav

5/17/2026 at 9:48:33 PM

The harness clamps what the agent can do. `bash` allows full code execution; a dedicated `mvn` tool might only allow `mvn compile` but not `mvn spring-boot:run`. You could probably implement this with an `allow` list attached to your `bash` tool, but by doing it this way, you can enhance the outputs or perform mandatory checks too.

For instance, Claude likes to run little Python scripts; reviewing them is tedious. Removing `bash` and adding a `python` tool would allow the harness to pre-review and grep for common harmful patterns, or run the `python` script in a `krunvm` or `muvm` to isolate it, etc. This review/isolation would be handled programmatically as it's part of the harness; leaving the agent to choose what to do as a skill means the agent can conveniently forget to enforce its own checks.
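
A sketch of both ideas, under stated assumptions: `ClampedTool`, `Verdict`, and `flag_script` are hypothetical names, the pattern list is illustrative rather than a real security filter, and nothing is actually executed. The point is only that the check lives in the harness, where the agent cannot skip it.

```rust
// A dedicated tool that exposes a fixed set of arguments,
// instead of handing the agent a general-purpose shell.
struct ClampedTool {
    program: &'static str,
    allowed_args: &'static [&'static str],
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Run(Vec<String>), // full argv, ready to hand to a process spawner
    Reject(String),   // reason reported back to the agent
}

impl ClampedTool {
    fn check(&self, args: &[&str]) -> Verdict {
        if args.is_empty() {
            return Verdict::Reject(format!("{}: no arguments given", self.program));
        }
        for arg in args {
            if !self.allowed_args.iter().any(|allowed| allowed == arg) {
                return Verdict::Reject(format!("{}: `{}` is not allowed", self.program, arg));
            }
        }
        let mut argv = vec![self.program.to_string()];
        argv.extend(args.iter().map(|a| a.to_string()));
        Verdict::Run(argv)
    }
}

// Substrings a harness might flag before running a script
// (illustrative only; a substring scan is easy to evade).
const SUSPICIOUS: &[&str] = &["os.system", "subprocess", "shutil.rmtree"];

fn flag_script(source: &str) -> Vec<&'static str> {
    SUSPICIOUS
        .iter()
        .copied()
        .filter(|pattern| source.contains(*pattern))
        .collect()
}

fn main() {
    let mvn = ClampedTool { program: "mvn", allowed_args: &["compile"] };
    println!("{:?}", mvn.check(&["compile"]));
    println!("{:?}", mvn.check(&["spring-boot:run"]));
    println!("{:?}", flag_script("import subprocess\nsubprocess.run(['ls'])"));
}
```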

by frio

5/17/2026 at 11:37:36 AM

Good point. There might be a small advantage if one does not want to give bash access. But the general answer to "how do I add custom tools like we can in pi" is "you don't". Keep it simple.

by aerzen

5/17/2026 at 12:13:54 AM

I've been trying to use `Deno` underneath `Rust` so that the tools can still be written in Typescript and thus self-mutated without the compilation step (but I can still try to do clever things with V8 Isolates or similar). It's been an ugly experiment so far; I'm vaguely thinking a simpler model would be to just define a binary "API" and run tools by exec-ing binaries.

by frio

5/17/2026 at 12:18:33 AM

I have to be honest and tell you that trying to load such a heavy runtime as a scripting layer is not a great idea; at the same time I can tell you that I am working on another Rust project where I also needed scripting, and after three attempts I landed on rhai (https://rhai.rs/) (https://rhai.rs/book).

You might find it nice for pretty much all use cases except for high-performance scripting (so, if you are not trying to build the entire logic entirely in rhai, you are going to be fine).

by gidellav

5/17/2026 at 12:21:30 AM

Yeah, it's been a bit of a dead end. I didn't want the heavy runtime but felt it was worth disproving after experimenting rather than ruling out off the bat. Even before getting it running, the dependency list alone was pretty discouraging, especially given the storm of supply chain attacks these days.

Rhai looks nice, I'll take a look, thanks! And good luck with Zerostack.

by frio

5/17/2026 at 1:38:50 AM

[dead]

by aschar

5/17/2026 at 2:10:23 AM

I was just going to suggest rhai. It's simple enough LLMs can easily write it with a little context, and you control the entire API so you can sandbox effectively without needing to resort to hacks with a JS interpreter etc.

by slopinthebag

5/17/2026 at 6:15:32 AM

I agree V8 and Deno seem very heavy-handed and complex to integrate for scripting capabilities.

Have you considered Lua? It is tailor made for use cases like this. Creating an embedded host in Rust is trivial, the work lies in creating built-in functions for the script runtime so that the user scripts can do useful things to the environment.

by slowhorse

5/17/2026 at 2:18:56 AM

Have you thought about Zig? If you limit it to CompTime, isn't that just a scripting language that happens to be compiled to binary?

by BillStrong

5/17/2026 at 11:14:05 AM

That’s not how it works. Comptime Zig is Zig, not an embedded scripting language. You can’t run comptime code separately, it only runs as part of compiling a Zig program. Think of it like Rust macros.

by brabel

5/17/2026 at 5:27:02 AM

Possibly, I'm not really interested in learning Zig though (or learning to embed it in Rust). I'm sure that'd be a cool project for someone else to try :).

by frio

5/17/2026 at 12:51:02 AM

Why not WASM?

by jswny

5/17/2026 at 2:06:35 AM

Unfamiliarity and I believe it requires a compile step. I’m at least familiar with Typescript and Deno so being able to embed them was an appealing idea :)

by frio

5/17/2026 at 5:43:21 AM

> simpler alternative to Skills

this concerns me. Skills are already just about the simplest possible thing; they're just prompts, in a directory!

by kristjansson

5/17/2026 at 6:21:49 AM

Skills are notably more complex than that. They require metadata (which the model is given and uses to determine whether or not to load the main file), are intended to be loaded via a tool call, contain extra resources (also loaded by tool calls), etc. In contrast, with this system the harness doesn't need a tool to load the stored prompts, the prompts don't need to include metadata to allow for runtime discovery, etc.

by lunar_mycroft

5/17/2026 at 12:22:48 PM

Runtime discovery is the entire point of skills. Without it, this is just a templating prompt system that the user has to remember to use… except because this one changes your system prompt, it also busts your cache and costs you extra money when you use a prompt.

Skills are already dead-simple and this prompt system doesn’t at all tackle the same problem.

by cobolcomesback

5/17/2026 at 1:51:50 PM

"{Feature} is the whole point of {more complex technology}" is an objection that can very often be raised. That doesn't mean that giving up features in exchange for simplicity is always the wrong call. And there's also advantages to having the user drive what instructions go into the prompt instead of the harness/model.

by lunar_mycroft

5/17/2026 at 2:13:11 PM

This is tangential to the point. It’s often great to have a simpler version of a solution, even if it eschews some features. But this isn’t that. OP claims that the prompt system is an “alternative” to skills, but it isn’t. It isn’t solving the same problem that skills solve at all. It’s like saying that a bicycle is a simpler alternative to a lawnmower because they both have wheels.

Prompts are a feature that are simpler than skills, sure, but they’re a completely different feature entirely.

by cobolcomesback

5/17/2026 at 2:30:21 PM

It's an alternative in the same way e.g. plain markdown is an alternative to HTML, even though plain markdown lacks some of the features of HTML. "X is an alternative to Y" in this sense doesn't mean "X has all the same features as Y", it means "you might reasonably choose to use X instead of Y, depending on your exact usecase"

by lunar_mycroft

5/17/2026 at 6:33:59 AM

Exactly, this was my thought process when deciding if we should have Skills or not.

In the end, I think that this prompt-only design, with the integrated tools that come with zerostack, is more than enough.

by gidellav

5/17/2026 at 5:52:46 AM

So are these lol

by backscratches

5/17/2026 at 3:07:11 AM

I’ve been doing the same thing in zig haha.

by praveer13

5/16/2026 at 11:13:48 PM

"RAM footprint: ~8MB on an empty session, ~12MB when working"

I like this, Claude Code is using multiple gigabytes, which is really annoying on lowend laptops

by throwa356262

5/17/2026 at 1:34:58 AM

I'm building an agent framework in golang and it is extremely light weight. Startup time is under 1/2 second, and RAM usage is really low. I have a 12 year old laptop and it happily runs without slowing down.

There's no reason what is essentially a string concat engine should be slow on any hardware, including old hardware.

by all2

5/17/2026 at 6:36:45 AM

Isn't 2 second startup time a lot? With zerostack, I managed to get it down to ~90ms

by gidellav

5/17/2026 at 7:00:47 AM

They said 1/2 as in 0.5 seconds as in 500 ms.

by NewJazz

5/17/2026 at 9:56:16 AM

Sounds interesting, would you like to share any more information about your project?

by throwa356262

5/17/2026 at 8:32:22 PM

Link is here [0]. The idea is to model cognitive states (how to think), and workflows (what to think about) as statecharts. The charts will be defined in YAML (version-able, hot-reloading). Context payloads are defined in an agent YAML file. Think of it as a map, like a drive map for a computer's HDD/SSD. You spec the order of context chunks, what goes into them, and then when the inference payload is built, it uses the context map definition (comprised of the chunks you defined), the agent definition (including model params like context length, temp, etc), cognitive state, and workflow state to build out the inference payload.

Agent cognitive states may add chunks to the system prompt. Workflows may add chunks to the system prompt. Tool access may vary by agent/workflow state (policy is last-defined-wins overlays to keep it simple to reason about).
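
One way to read "last-defined-wins overlays", sketched with the standard library only. The `Access` enum, `ToolPolicy` alias, and layer names are hypothetical, and the project linked below may model this differently:

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Access {
    Allowed,
    Denied,
}

type ToolPolicy = BTreeMap<String, Access>;

// Apply overlays in definition order; a later layer overrides an earlier one
// for any tool both mention, and leaves every other tool untouched.
fn resolve(layers: &[ToolPolicy]) -> ToolPolicy {
    let mut effective = ToolPolicy::new();
    for layer in layers {
        for (tool, access) in layer {
            effective.insert(tool.clone(), *access);
        }
    }
    effective
}

// Small helper to build a policy layer from literal pairs.
fn policy(entries: &[(&str, Access)]) -> ToolPolicy {
    entries
        .iter()
        .map(|(tool, access)| (tool.to_string(), *access))
        .collect()
}

fn main() {
    let agent = policy(&[("read", Access::Allowed), ("bash", Access::Allowed)]);
    let workflow_state = policy(&[("bash", Access::Denied)]);
    let effective = resolve(&[agent, workflow_state]);
    for (tool, access) in &effective {
        println!("{}: {:?}", tool, access);
    }
}
```

The appeal of this rule is that the effective policy for any tool depends only on the last layer that mentions it, so there are no precedence tables to reason about.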

Agents may run by themselves or be 'bound' to a workflow. Agents can detach from a workflow before it is finished, and either re-bind, or another agent may bind to the workflow (one implements, another reviews, for example).

Conceptually, this is all very simple, which is why I'm hand rolling it.

The goal is a minimal runtime that can support long-running agents in a 'zero human company' setting.

On top of the runtime will be a minimal change control workflow (if you've spent time in hardware engineering, these are standard processes governed by a company's quality system).

I've yet to wire in the economic pieces (token spend, power consumption, rollups that show performance of various agents based on inputs and outputs).

It is a bit far fetched, but I'd like to get this thing ISO9001 certified, and maybe AS9100 certified.

This is all to scratch my own itch, tbh. Most agentic systems are hard to reason about, bloated, lack visibility in the appropriate places, lack economic data of sufficient granularity, and so on. So I'm building this.

[0] https://github.com/zerohumancompany2/maelstrom-code

by all2

5/17/2026 at 2:52:54 AM

I've been trying to migrate over to Zed and think their Agent Client Protocol[1] is pretty neat. I wonder how much memory pressure Claude Code exerts if it is going through that mechanism instead

1: https://zed.dev/acp

by rel

5/17/2026 at 4:16:12 PM

Not answering your question, but I just realized the new Anthropic billing changes are affecting ACP clients like Zed :(

https://zed.dev/blog/anthropic-subscription-changes

by threecheese

5/17/2026 at 12:05:43 AM

The memory footprint is great, it allows finally running these coding agents in extra small instances -- say x1 on shellbox.dev

by messh

5/17/2026 at 3:10:50 AM

Hmm, if they're this small something like smolmachines (like shellbox, but free and local) might be a great fit.

by chrisweekly

5/16/2026 at 11:19:22 PM

Yes. Just this fact is going to make a lot of people try it out.

by tecoholic

5/17/2026 at 10:04:50 AM

I have 29 Claude Codes open, using 6.3 GiB RSS total

by rane

5/17/2026 at 3:30:23 AM

Are you sure you don't have an LSP plugin or something running?

by esperent

5/16/2026 at 11:22:18 PM

Isn't that because of the context window size?

by marknutter

5/16/2026 at 11:36:22 PM

Hi, I'm the developer of zerostack! No, the memory footprint is not because of the context window size: in my benchmarks, loading a 128k context made it jump from 8MB (without any chat/context loaded) to 11MB.

The reasons for zerostack's small memory footprint are:

- Rust, and not JS/Python, so no interpreters/VMs on top

- Load-as-needed, so we only allocate things like LLM connectors when needed

- `smallvec` used for most of the array usage of the tool (up to N items are stored on the stack)

- `compactstring` used for most of the string usage of the tool (up to N chars are stored on the stack)

- `opt-level=z` to force LLVM to optimize for binary size and not for performance (even though we still beat opencode in both TTFT and tool-use time)

- heavy usage of [LTO](https://en.wikipedia.org/wiki/Interprocedural_optimization#W...)
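
The "up to N chars are stored on the stack" idea can be sketched with the standard library alone. This is not how the crates named above are implemented; `SmallStr` and `INLINE_CAP` are hypothetical names, and the cap of 22 bytes is an arbitrary choice for the example.

```rust
/// Maximum number of bytes kept inline; an arbitrary value for this sketch.
const INLINE_CAP: usize = 22;

/// A string that avoids a heap allocation when it is short enough.
pub enum SmallStr {
    /// Short strings live directly inside the value.
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    /// Longer strings fall back to an ordinary heap-allocated `String`.
    Heap(String),
}

impl SmallStr {
    pub fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            // The whole string fits, so no UTF-8 sequence is ever split.
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallStr::Inline { len: s.len() as u8, buf }
        } else {
            SmallStr::Heap(s.to_owned())
        }
    }

    pub fn as_str(&self) -> &str {
        match self {
            SmallStr::Inline { len, buf } => {
                // The bytes were copied from a valid &str in `new`.
                std::str::from_utf8(&buf[..*len as usize])
                    .expect("inline bytes are valid UTF-8")
            }
            SmallStr::Heap(s) => s.as_str(),
        }
    }

    pub fn is_inline(&self) -> bool {
        matches!(self, SmallStr::Inline { .. })
    }
}
```

Short identifiers such as tool names stay inline, while long strings behave like a normal `String`. The same trade-off applies to the small-vector case: a fixed inline buffer with a heap fallback.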

by gidellav

5/16/2026 at 11:28:05 PM

The context window has nothing to do with RAM usage and even if it did, a million tokens of context is maybe 5mb.

by SatvikBeri

5/17/2026 at 3:37:55 AM

'A million tokens of context' is literally terabytes of KV cache VRAM on very expensive Nvidia silicon - on the model.

On the Agent, yes, the context window does relate to RAM, because the 'entire conversational history' is generally kept in memory. So ballpark 1M 'words' across a bunch of strings. It's not that-that much.

Claude Code is not inefficient because 'it's not Rust' - it's just probably not very efficiently designed.

Rust does not bestow magical properties that make memory more efficient really.

A bit more, but it's not going to change this situation.

'Doing it in Rust' might yield amazing returns just because the very nature of the activity is 'optimization'.

by bluegatty

5/17/2026 at 4:41:10 AM

Rust "denialism" is as annoying as rust evangelism.

Of course any seemingly idiomatic rust is going to run circles around TS transpiled into JIT-compiled JS.

by rixed

5/17/2026 at 7:58:03 AM

Lamenting any 'not even criticism' of Rust as 'denialism' is just evidence of the insane cult that is Rust.

Rebuilding Claude Code in Rust will make almost no difference in terms of real world performance. V8 is 'relatively fast', and there wouldn't be any noticeable improvements there, and probably not memory footprint either.

The source for Claude Code was leaked and it's a vibe-coded mess; there's not much thought given to clean architecture. It's unlikely they've even cleaned up a bit or given thought to memory consumption etc.; if they did, they'd get by far most of the way there and likely abnegate any real want to 'do it in Rust', unless there are other architectural considerations.

by bluegatty

5/17/2026 at 6:34:34 PM

You're the delusional one for bringing up the memory usage of the inference server that clearly isn't running inside the coding agent.

The problem with your comments is that you're showing off a fundamental lack of understanding between managed languages and unmanaged languages.

The vast majority of GCs are optimized for throughput and allocate big chunks of memory. They also tend to never release it if there was a temporary memory spike. The most advanced GCs also tend to have either read or write barriers, which slow down basic object accesses.

Just in time compilation and managed languages in general need to retain a runtime representation of the source code to perform JIT compilation and then they have to store the compiled code in memory as well.

JavaScript uses references against dynamic objects, which means you have to pay the indirection cost of a pointer but you also need to store type information as well to monomorphize the object literals and classes at runtime and fall back to a regular hashmap when fields are added dynamically.

All of these things will add up and increase the amount of memory the application uses and how slow it runs.

Sure Claude Code has severe architectural issues causing it to leak hundreds of gigabytes of RAM, but if those were not there you could easily build a C++ based alternative that runs circles around a hypothetical JavaScript based Claude Code that got its act together.

by imtringued

5/18/2026 at 3:23:34 AM

1) I'm not 'delusional' for bringing up 'What Memory is Used Where' - I'm clarifying for the people who seem a bit confused (see above) as to 'where the context lives' - and trying to provide a simple mental model for that.

That's the opposite of delusional.

It's just information.

Attacking people for anything 'Rust related' however - is the quintessential reason why everyone hates the Rust community.

2) 'The problem with your comment' is that it's presumptive and arrogant - as if I 'don't know the difference between GC and managed languages'.

I've been writing software since 1990.

Embedded (on custom Silicon), UI, SaaS, backend, some embedded work I've done is still in production today from almost 30 years ago.

I've written a scripting language (for production), and a cyclic ref-count GC (didn't make it to production).

Your comments about GC etc. are fine - but they don't really offer any insight into the actual problem.

There's one critical detail aka 'memory not released after spikes', yes, this is observed behaviour, but it's usually accommodated with a little bit of decent Engineering.

If you're going to make the comparative basis an 'Idiomatic Rust' solution (aka good patterns), then we should make the assumption of an 'Idiomatic Node' solution for Claude Code.

3) 'The other problem with your comment' is that your conclusion is wrong - by your own hand.

Right here: "Claude Code has severe architectural issues causing it to leak hundreds of gigabytes of RAM," - the implication being that Claude Code does not inherently have to 'leak all that RAM' - and would run just as fine with some basic work.

An 'Idiomatic Node' implementation of Claude Code wouldn't exhibit those problems, and would perform pragmatically just as well as an Idiomatic Rust implementation.

From a memory management standpoint, Rust might use significantly less memory, but a 150MB footprint vs a 350MB footprint for an average session is 'pragmatically immaterial'.

The difference in 'perceived performance' would be negligible - if any.

The 'cost' of writing the 'kind of program that Claude Code is' in a systems-level language would be quite a lot, for not really much benefit.

The 'Rust or C++' solution would not 'run circles' around the 'node' implementation in anything but some 'performative', inward looking benchmarks, aka 'the worst kind of Engineering'.

Consider pondering why almost nobody writes such applications in Rust or C++.

by bluegatty

5/17/2026 at 11:48:07 AM

You have a point but it's definitely not TBs for 1M. Should be more like 100G.

by regexorcist

5/17/2026 at 3:02:41 AM

It has nothing to do with local RAM usage. But a million tokens of LLM context is decidedly not 5mb.

The rough estimate is 2 * L * H_kv * D * bytes per element

Where:

* L = number of layers
* H_kv = # of KV heads
* D = head dimension
* factor of 2 = keys + values

The dominant factor here is typically 2 * H_kv * D since it’s usually at least 2048 bytes. Per token.

For Llama3 7B you're looking at 128 GiB if your context is really 1M (not that that particular model supports a context so big). DeepSeek4 uses something called sparse attention so the above calculus is improved - 1M of context would use 5-10 GiB.

But regardless of the details, you’re off by several orders of magnitude.
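
The arithmetic can be checked with a short sketch. The shape used in the usage note below (32 layers, 8 KV heads, head dimension 128, 2-byte elements) is an assumption that I believe matches the published Llama 3 8B configuration, and "1M" is taken as 2^20 tokens. The 4 bytes of plain text per token on the agent side is a rough rule of thumb and not a measured value. All function names are hypothetical.

```rust
/// KV-cache bytes per token on the inference server:
/// 2 (keys + values) * layers * KV heads * head dimension * bytes per element.
pub fn kv_bytes_per_token(layers: u64, kv_heads: u64, head_dim: u64, bytes_per_elem: u64) -> u64 {
    2 * layers * kv_heads * head_dim * bytes_per_elem
}

/// Total KV-cache bytes for a context of `tokens` tokens.
pub fn kv_bytes_total(tokens: u64, per_token: u64) -> u64 {
    tokens * per_token
}

/// Plain-text bytes the agent keeps locally for the same conversation,
/// assuming a fixed average number of bytes per token.
pub fn history_text_bytes(tokens: u64, avg_bytes_per_token: u64) -> u64 {
    tokens * avg_bytes_per_token
}

/// Convert a byte count to GiB for display.
pub fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}
```

With the assumed shape, `kv_bytes_per_token(32, 8, 128, 2)` is 131072 bytes (128 KiB) per token, and 2^20 tokens come to 128 GiB, which matches the figure above. `history_text_bytes(1 << 20, 4)` is 4 MiB, which is the "maybe 5mb" reading of the same million tokens. The two numbers describe different things: server-side tensors versus client-side text.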

by vlovich123

5/17/2026 at 3:19:09 AM

Pretty sure we're talking about the output text, not the tensors.

by tujux

5/17/2026 at 7:38:11 AM

These LLM replies are really getting annoying.

by m00x

5/17/2026 at 2:48:17 PM

Mine? I literally wrote what I wrote because “context window” as a term of art refers to the LLM’s context window.

I guess get better at detecting LLMs instead of accusing everything of being an LLM reply?

by vlovich123

5/16/2026 at 11:40:39 PM

The context window is not on your system. It's on the server with the model. There may be some local prompt caching, of some sort, but you're not locally hosting the context unless you're also locally hosting the model.

by SwellJoe

5/17/2026 at 3:39:01 AM

Chat history is kept locally, generally you have to send the 'whole history' to the model 'each turn'.

by bluegatty

5/17/2026 at 6:17:50 AM

That's just the plain text (or whatever files), that's not the context the model is directly working with on the server, which is tokenized, embedded, vectorized and has attention run against those vectors. The local history is generally quite small, the context generally quite a bit larger. A text conversation of a few hundred kilobytes in plain text will be gigabytes in context.

by SwellJoe

5/17/2026 at 7:52:29 AM

KV for a SOTA model is into terabytes

by bluegatty

5/17/2026 at 4:46:11 AM

Only "generally"? I'm curious what API has moved away from this protocol, which seems more adapted to conversations with humans than agentic loops.

by rixed

5/17/2026 at 8:25:18 AM

To me it would certainly make sense if the protocol just said "append this text to context window id/sha256", in particular as the data is cached at the tensor level on the provider side, so they need to first do that lookup anyway. So I would be surprised if they don't have that.

In addition, this protocol could make it more transparent to say "oh we cannot proceed as we dropped this cache, are you sure you want to proceed and consume a whole lot of expensive uncached tokens?". Oh, maybe that's a reason not to do it..
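
A minimal sketch of what such an "append to context id" exchange could look like. Everything here is hypothetical: no provider API is being described, `ContextServer` stands in for the provider, and the standard library's `DefaultHasher` is used as a stand-in for SHA-256 to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

pub type ContextId = u64;

/// Content-derived id of a full context (stand-in for a SHA-256 digest).
fn context_id(text: &str) -> ContextId {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

#[derive(Debug, PartialEq, Eq)]
pub enum AppendError {
    /// The provider no longer holds the parent context; the client has to
    /// decide whether to resend everything as uncached tokens.
    CacheDropped(ContextId),
}

/// Hypothetical provider side: maps context ids to cached contexts.
#[derive(Default)]
pub struct ContextServer {
    cache: HashMap<ContextId, String>,
}

impl ContextServer {
    /// Append `text` to the context named by `parent` (or start a new one)
    /// and return the id of the resulting context.
    pub fn append(&mut self, parent: Option<ContextId>, text: &str) -> Result<ContextId, AppendError> {
        let mut full = match parent {
            None => String::new(),
            Some(id) => self
                .cache
                .get(&id)
                .cloned()
                .ok_or(AppendError::CacheDropped(id))?,
        };
        full.push_str(text);
        let id = context_id(&full);
        self.cache.insert(id, full);
        Ok(id)
    }

    /// Simulate the provider evicting a cached context.
    pub fn evict(&mut self, id: ContextId) {
        self.cache.remove(&id);
    }

    pub fn get(&self, id: ContextId) -> Option<&str> {
        self.cache.get(&id).map(String::as_str)
    }
}
```

The client only sends the new text plus the id of the context it extends. An evicted parent surfaces as an explicit `CacheDropped` error instead of a silent, expensive full re-read.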

by _flux

5/17/2026 at 5:03:25 AM

With the standard API you pass it all along, but I think there are some odd OpenAI APIs that are different.

by bluegatty

5/17/2026 at 6:08:40 AM

I had Claude Code build me one of these as well, though I added Dirac's line hashing for edits etc. Also used Rust, and I had this idea that I should use plugins so it can self-edit by implementing in hooks but in the end, I just have it create exhaust information about improvements into a separate file and just update the source code and recompile. The source code is in a fixed place so it can just rewrite and build the agent itself. I use it with DeepSeek 4 Flash running on 2x RTX 6000 Pros which I get some 138 tok/s on.

To be honest, I just plagiarized Pi, Dirac, OpenCode. Any new tricks in this one that I can steal?

by arjie

5/17/2026 at 8:28:26 AM

Take a look at OpenAI blogs about codex: https://openai.com/index/unrolling-the-codex-agent-loop/ https://openai.com/index/harness-engineering/ https://openai.com/index/unlocking-the-codex-harness/

by joshka

5/17/2026 at 11:05:16 AM

Creator of Dirac here. Glad to see it mentioned and even more glad that you found it useful.

I am currently in deep refactor mode to introduce modular tooling to Dirac since the concept of 'fixed' set of tools is starting to feel antiquated, adding tools on demand would be super convenient and a likely replacement for MCP (I understand not all use-cases of it)

by GodelNumbering

5/17/2026 at 11:50:09 AM

Curious how you’re handling prompt caching, as I understand it most LLM providers essentially inject tool definitions in the system prompt, so changing tools dynamically breaks the cache. This has been a big annoyance for me in a separate project; I currently just implemented my own tool-ish system that defines schemas in user messages and instructs the LLM to return matching JSON, but it’s less reliable than using the native tool calling + structured outputs available in the API.

by karagenit

5/17/2026 at 10:19:44 PM

Native tool calling indeed. By modular, I meant the tool defs are loaded dynamically per task and stay the same during the task

by GodelNumbering

5/17/2026 at 6:29:57 AM

Some interesting features I add on top of being lightweight are the prompts library, Git worktrees integration and Ralph Wiggum loops integrations.

by gidellav

5/17/2026 at 7:39:28 AM

Very cool. Thank you! I will look.

by arjie

5/17/2026 at 6:51:05 AM

Is it public on github?

by teo-mateo

5/17/2026 at 5:13:03 PM

Mine? No. It’s super idiosyncratic and I haven’t validated that it has not leaked secrets into the codebase.

by arjie

5/17/2026 at 7:02:59 AM

Yes.

by normie3000

5/17/2026 at 4:38:06 AM

This is nice! I tried it for a bit and it was indeed quite fast. Are you looking for contributors, or are you building this as a personal tool? I ran into some issues when attempting to use different models, though: gpt-5.5 on Azure doesn't work, even with the OpenAI compatible endpoint, because "max_tokens" has been replaced with "max_completion_tokens". And it doesn't appear possible to pass through custom headers, so I wasn't able to specify reasoning_effort for deepseek models.

by wkcheng

5/17/2026 at 6:37:47 AM

Yes, I am open for PRs.

What you showed is a clear bug in my codebase, if you can, open a Github issue with each of your bugs.

Thanks!

by gidellav

5/17/2026 at 7:39:20 AM

We don’t trust LLM execution, so we add user approvals. But task decomposition calls for co-recursion between code and prompts. This means that the approvals should be invocable at any depth. I think we need some kind of protocol for that (à la the Cubes OS protocols for cut and paste between VMs).

Maybe a workaround could be to bubblewrap the scripts that recursively call the LLM (and run the agent in yolo mode inside the wrap).

by zbyforgotp

5/17/2026 at 7:44:06 AM

Well, or not spawn any external commands, and actually have tools made of code written by someone who thought about what the agents at each level should be limited to doing.

by frabcus

5/17/2026 at 7:47:07 AM

In the limit we want the llm to write the code (like in RLMs).

by zbyforgotp

5/17/2026 at 7:58:48 AM

Or just run agents in a container…

by alfiedotwtf

5/17/2026 at 8:15:42 AM

Currently, having an LLM feed on its own output repeatedly is the fastest way to get it to hallucinate.

by hashmal

5/17/2026 at 2:21:51 PM

Too late for fixing it - but of course I meant https://www.qubes-os.org/

by zbyforgotp

5/17/2026 at 8:57:05 AM

Transactional recursive agents ?

Nothing is committed until the final top-level transaction is accepted.
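
One way to picture this, as a sketch only: each agent invocation stages its writes in a transaction, nested invocations get a child transaction, and nothing touches the real store until the top level commits. `Txn` and its methods are hypothetical names, and the "store" is just an in-memory map standing in for the file system.

```rust
use std::collections::BTreeMap;

/// Staged writes (path -> new contents) for one agent invocation.
#[derive(Debug, Default)]
pub struct Txn {
    staged: BTreeMap<String, String>,
}

impl Txn {
    /// Stage a write; nothing outside the transaction changes.
    pub fn write(&mut self, path: &str, contents: &str) {
        self.staged.insert(path.to_owned(), contents.to_owned());
    }

    /// Run a sub-agent in a child transaction. Its writes are merged into
    /// this transaction only if `accept` approves them; otherwise they are
    /// discarded. Returns whether the child was accepted.
    pub fn nested<F, A>(&mut self, run: F, accept: A) -> bool
    where
        F: FnOnce(&mut Txn),
        A: FnOnce(&Txn) -> bool,
    {
        let mut child = Txn::default();
        run(&mut child);
        if accept(&child) {
            self.staged.extend(child.staged);
            true
        } else {
            false
        }
    }

    /// Apply all staged writes to the real store. Intended to be called
    /// once, on the top-level transaction, after final approval.
    pub fn commit(self, store: &mut BTreeMap<String, String>) {
        store.extend(self.staged);
    }

    pub fn staged_paths(&self) -> Vec<&str> {
        self.staged.keys().map(String::as_str).collect()
    }
}
```

The `accept` callback is where an approval at any depth would plug in: a human prompt, a policy check, or the parent agent's own review.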

by agumonkey

5/17/2026 at 12:35:25 PM

zerostack has a --sandbox flag that forces bwrap usage on all shell tool calls

by gidellav

5/16/2026 at 11:45:02 PM

Funny this comes out today. I was just about to start to write one in rust. It's amazing having opencode slowly leak memory and end up becoming 6gbs on a large project and then get slower and slower.

Will check this out! Seems cool!

by 360MustangScope

5/17/2026 at 12:11:28 AM

Yes! This project derived from an OOM killer activation that happened on my old laptop because I had more than 2 opencode instances open together with Firefox...

by gidellav

5/16/2026 at 11:31:25 PM

The codebase was small enough that I handed it over to DeepSeek v4 Flash in Pi to skim through for any risky business, and I didn't find anything concerning. Nice work.

by hiAndrewQuinn

5/17/2026 at 12:15:44 AM

Since the OP stated they used DeepSeek V4 Flash for generating a lot of the code, I decided to check whether there were any outdated dependencies. In my experience, with Rust projects, if you do not instruct models (even Claude 4.7 Opus) to use `cargo add` instead of manually editing the Cargo.toml, you will almost certainly get out-of-date dependencies added to your project.

Manually checking the dependencies used by this project, I was pleased to see they are all the latest version. That doesn't mean there are no issues lurking in transitive dependencies, of course.

As for getting an LLM to review the code, I think we can get all opinionated very fast. For instance, when I was eyeballing the code, some of the enum methods converting to/from strings made me think "this could've been a single #[derive] with strum." That would make the code in provider.rs a lot more concise, at the cost of importing one crate (with no dependencies!)
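
For reference, this is roughly the kind of hand-written boilerplate being described, using only the standard library. `ProviderKind` and its variants are hypothetical; the real enum in provider.rs may look different. As far as I know, strum's `Display` and `EnumString` derives generate the equivalent of the two trait impls below.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical provider list, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderKind {
    OpenAi,
    Anthropic,
    DeepSeek,
}

impl ProviderKind {
    /// Canonical lowercase name used in config files and CLI flags.
    pub const fn as_str(self) -> &'static str {
        match self {
            ProviderKind::OpenAi => "openai",
            ProviderKind::Anthropic => "anthropic",
            ProviderKind::DeepSeek => "deepseek",
        }
    }
}

impl fmt::Display for ProviderKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for ProviderKind {
    type Err = String;

    /// Parse the canonical name back into the enum.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "openai" => Ok(ProviderKind::OpenAi),
            "anthropic" => Ok(ProviderKind::Anthropic),
            "deepseek" => Ok(ProviderKind::DeepSeek),
            other => Err(format!("unknown provider: {other}")),
        }
    }
}
```

Every new variant has to be added in three places here, which is the maintenance cost a derive macro removes.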

Lastly, for fun, I decided to get DeepSeek V4 Pro (with Max thinking) to "audit" the codebase. The output mentioned no obvious signs of hidden telemetry, but it did note that the project sets the panic handler to "abort", which I have strong opinions on... Presumably the OP wanted to avoid linking against libunwind to save a few kilobytes of binary size, but now you have a binary that immediately aborts and doesn't give the user a stacktrace of what just crashed. I would rather have a ~50 KiB larger binary if it means getting useful debug info during a panic. Additionally, if there are async tasks that panic, they can't be recovered to display a generic error message; instead the whole process just aborts.
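
A small sketch of the recovery that unwinding makes possible, using a plain thread instead of an async task to stay within the standard library. `run_isolated` is a hypothetical helper. It relies on the default `panic = "unwind"` behaviour; with `panic = "abort"` the process terminates at the panic and the `Err` branch is never reached.

```rust
use std::thread;

/// Run `task` on a worker thread and turn a panic into an error message
/// instead of letting it take the whole process down.
pub fn run_isolated<F>(task: F) -> Result<i32, String>
where
    F: FnOnce() -> i32 + Send + 'static,
{
    match thread::spawn(task).join() {
        Ok(value) => Ok(value),
        Err(payload) => {
            // Panic payloads are usually a &'static str or a String.
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic".to_string());
            Err(format!("task failed: {msg}"))
        }
    }
}
```

The caller can show a generic error and keep the session alive, at the cost of linking the unwinding machinery into the binary.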

by koito17

5/17/2026 at 12:41:26 AM

Hi, nice comment!

1. I had experience not only with wrong versions selected by the agents, but also with weird crates (e.g. choosing a crate with 10 GitHub stars when a more complete and better supported one was available), which is why I now always choose the dependencies myself and then let the agent work.

2. Yes, some of the provider code could be made using macros, I am just lazy... But thanks for the tip! I will save it for later.

3. No telemetry, and it can be checked thanks to the fact that there are no HTTP calls outside of the MCP implementation (via rmcp) and LLM connectors (via rig)

4. Yes, I set the panic handler to 'abort', thinking that I would get a nice size decrease. I have yet to experience a panic in this project, but I will revert to the default behavior if the binary size saving is really so small

5. While it is async, the entire project runs on one thread (as expressed in main.rs with `#[tokio::main(flavor = "current_thread")]`), as it allows for a nice ~8MB memory saving (so, 50% off) and no real performance loss, being such a simple tool.

---

P.S. Just switched back to default settings for panic handler

by gidellav

5/17/2026 at 12:40:39 AM

Hidden telemetry was my big concern, yes; the abort thing wasn't caught as a security thing by DeepSeek V4 Flash but it was mentioned by Claude 4.7 Opus (I wanted to compare and contrast here), and Flash brought it up later when I asked it about performance tuning.

`cargo add` tip is very helpful, I had a hunch this happened in my own Rust project and I think you just filled in the missing piece for me there.

by hiAndrewQuinn

5/17/2026 at 3:05:39 AM

To me panic=abort is much safer security-wise, as it means you’re unlikely to enter weird states due to incorrectly handled unwinding. The only attack vector is a DoS attack, which is a short-term thing that’s easily rectified.

by vlovich123

5/16/2026 at 11:39:31 PM

Thanks! Funnily enough, a good chunk of the coding was done by DeepSeek v4 Flash, while I hand-wrote some of the TUI logic, as DeepSeek kept failing on certain cursor-moving logic, and I fully managed the memory optimization process (as you can read in another comment I left, it was both a set of compiler optimizations and the use of certain Rust crates to leverage more efficient data structures).

by gidellav

5/16/2026 at 11:48:51 PM

Taking notes and comparing this against my own (non coding agent) Rust TUI project, thank you! I'm new to Rust so this is a helpful baseline.

by hiAndrewQuinn

5/17/2026 at 12:00:57 AM

No problem, happy to help!

by gidellav

5/16/2026 at 11:44:17 PM

> I handed it over to DeepSeek v4 Flash in Pi to skim through for any risky business

Doesn't prompt injection make that a rather flimsy investigation?

by kadoban

5/17/2026 at 6:48:45 PM

The way I see this going is there will be 10s of thousands of model harness projects out there, because the tools make it so easy to make a harness that suits your workflows exactly the way you like (as someone who made their own harness)

I also used bwrap for sandboxing. I'm looking at layering slirp4netns, because I found out that models will happily break out of the sandbox via the host network interface.

by wolttam

5/16/2026 at 11:55:44 PM

i built something with a similar philosophy here: https://github.com/khimaros/airun -- it is intended to be piped and redirected. it discovers skills, AGENTS and prompt templates from Claude Code, Pi.dev, OpenCode and others. no TUI, but does have a basic tool calling loop

$ airun -q -p 'output a shell command for linux to display the current time. output only the command with no other code fencing or prose' | airun -q -s 'review the provided shell command, determine if it is safe, run it only if it is safe, and then summarize the output from the command' --permissions-allow='bash:date *'

by khimaros

5/17/2026 at 12:03:17 AM

While I think that the core philosophy is the same, I'd like to ask: why add features like Skills and prompt templates?

I personally decided not to implement Skills and instead use a prompt library approach, where certain .md files are used to fully replace the system prompt, in order to allow for an approach similar to Skills with ~100 LoC dedicated to this system.

by gidellav

5/17/2026 at 3:48:21 AM

Isn't the key thing with skills that the description is used to match them from a prompt that doesn't mention them?

Would a prompt library do that too?

by afzalive

5/17/2026 at 4:02:50 PM

i wanted airun to be drop-in useful in existing Claude/OpenCode/etc projects and skills are common.

by khimaros

5/17/2026 at 2:38:12 AM

Aren't skills fairly easy to share, and can contain more than one file?

by c-hendricks

5/17/2026 at 3:51:05 AM

Prompts as well... he might be on to something here, can't say as I didn't try it yet

Skills are just prompts

by desireco42

5/17/2026 at 1:45:41 PM

Skills are _like_ prompts, yes, they're extra info added to the context. A prompt is just a prompt though, an agent like Claude could use multiple skills in one go, which seems impossible to do with Zerostack.

by c-hendricks

5/17/2026 at 3:56:47 AM

Most of mine have code in them. That's most of the value.

by hedgehog

5/17/2026 at 12:20:16 PM

Skills are not just prompts.. the entire problem that skills solve is runtime discoverability via a skill description. Agents can self-recognize that a skill would be useful in a situation, and then load+use.

Prompts are just text templates entered by the user, and the user must specifically know when to and remember to invoke them. If you’re just using skills as if they are the same as prompts, you’re totally missing out on the entire benefit that skills provide!

by cobolcomesback

5/17/2026 at 11:37:55 AM

It says inspired by Pi, but I don't see any extension/plugin possibilities. The best feature of Pi is that an extension can hook anywhere and completely change the behavior. It also allows two extensions to stack on the same hook where there are no conflicts.

I believe Pi extensibility is the most important feature, exactly as how it was important for WordPress. WordPress won because anyone could install it and add the plugins they needed. WordPress also has the same hook system where multiple plugins can build on the same hook.

Companies will want to completely customize their agent harness so it optimally works for their situation.

by whazor

5/17/2026 at 11:51:51 AM

I'm actually very close to being ready to release exactly that, also in Rust. I completely agree with your statement, extensibility is the most important feature.

https://x.com/PandelisZ/status/2055633346831548902

The two things I want to get right before actually releasing it are to properly eval it against other harnesses and make sure it's better.

And the licence. I don't think a GPL licence will yield adoption so I would like to MIT Roder or figure out the right licence

by zrg

5/17/2026 at 12:32:59 PM

Check https://news.ycombinator.com/item?id=48164948

by gidellav

5/17/2026 at 11:59:35 AM

The most important feature of Pi is that it is small, and has a small system prompt, making it great for local LLMs.

by krzyk

5/17/2026 at 5:55:36 PM

Yo that's really similar to my very own https://github.com/tontinton/maki only I'm MIT and you're GPL, cool

by tontinton

5/17/2026 at 6:32:17 AM

Really neat, I’ll have to try it when I’m at home. Lean, fast tools really make a difference in the coding experience.

I’m curious how the prompts idea performs in practice compared to typical skills and subagents. I frequently combine the two to get otherwise tricky workflows done. Say I have a failing build. I invoke my /fix-ci skill (sometimes in the same context I made the code change in), it launches a subagent to extract an error message / stack traces / relevant logs, and works through the problem. Say an integration test ran into a db query issue. Sometimes the agent itself, sometimes with a slight nudge from me, will load the readonly db access skill and start investigating. If I expect long, deep shenanigans, I’ll often say something like „use a sonnet subagent and instruct it to use the db query skill to debug the behavior we’re seeing”. And it can keep going like that: skills give extra capabilities on the fly, subagents isolate context to prevent bloat. Intuitively, it seems that by the agent running itself via bash with different prompts _might_ come close but a bit less streamlined? I’d have to check and see.

by goyozi

5/17/2026 at 6:41:44 AM

Well... for the most part, you use it like skills, but instead of "commands" you can think of "environments": so '/prompt debug', which is one of the integrated prompts, allows for a debug-focused agent, you can then talk to it as a normal agent, and then '/prompt code' to go back to the standard coding agent.

About subagents: as of right now, the entire agent runs on one context buffer, so it doesn't support subagents in order to keep it lean; but there is a great chance that subagents will be added, as explore-heavy tasks often bloat the context window

by gidellav

5/17/2026 at 6:46:41 AM

It sounds like you're saying that /prompt changes the system message part of the session. Doesn't that cause a cache break and result in higher usage/cost?

by post_below

5/17/2026 at 9:07:51 AM

I took a quick look at the source code and it looks like, yes, using /prompt during a session will rebuild the session with a new preamble/system prompt, causing a full cache miss on the next turn.

So in that way it's not like skills at all, neither of those result in paying full read price on the entire session, just the skill prompt itself.
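
Why swapping the system prompt costs a full miss can be shown with a toy model of a prefix cache. This is a simplification: it assumes the provider can reuse only the exact leading part of the previous request, and it compares whole messages where real providers work on tokens. `shared_prefix_len` is a hypothetical helper.

```rust
/// Number of leading messages two requests have in common. Under the
/// prefix-cache assumption, only this many messages can be served from cache.
pub fn shared_prefix_len(previous: &[&str], next: &[&str]) -> usize {
    previous
        .iter()
        .zip(next.iter())
        .take_while(|(a, b)| a == b)
        .count()
}
```

Appending a turn to the previous request keeps the whole earlier conversation as a shared prefix. Changing the first message (the system prompt) drops the shared prefix to zero, so every later message is read at full price even though its text did not change. A skill loaded mid-session is appended text, so only the skill itself is new.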

Something else I noticed... In the Anthropic implementation it doesn't seem to be using 'cache_control' in the body. Assuming my understanding is current, without that the Anthropic API won't do any caching at all (unlike most other APIs that do some level of automatic caching without it being requested). So that would result in paying full read price on every turn.

Of course I could be missing something, it was a quick look. Can you clarify?

by post_below

5/17/2026 at 6:48:54 PM

https://forgecode.dev/ https://github.com/tailcallhq/forgecode is written in Rust too and seems surprisingly capable. How does Zerostack compare to forgecode?

by halcyonblue

5/17/2026 at 11:35:43 AM

I tried to list the competing open-source AI coding agents to compare their popularity over time — opencode wins for now.

https://www.star-history.com/?repos=anthropics%2Fclaude-code...

by GTonehour

5/17/2026 at 12:28:11 PM

> Bash execution ... optional sandboxing for isolation

Sandboxing should be the default. Rather than routinely allowing unsandboxed access, one should be able to configure the sandbox to allow exactly what is needed

That's hard. For example, I've been unable to give wayland access to agents inside the sandbox (there's a special flag in bubblewrap to mount /dev/dri in a way you can make use of it, but you also must give access to the wayland socket, and maybe other things). So I think that maybe harnesses should invest in more sandboxing resources

by nextaccountic

5/17/2026 at 12:31:25 PM

This is actually a topic of current interest, and I think that I will switch to a sandbox-by-default once the bwrap implementation inside of zerostack is well tested and highly configurable.

by gidellav

5/17/2026 at 8:00:35 AM

Love it! I think the minimal approach you took is the right path forward. As others mentioned, small harnesses make it possible to run many agents in parallel and in small cloud instances. working on a minimal agent in Go myself for this use case.

by sinansaka

5/17/2026 at 9:49:50 AM

I wonder how this compares to tau https://tau-agent.dev/ ?

Both are in Rust and both mention Unix in their descriptions.

by martingxx

5/17/2026 at 9:50:17 AM

[dead]

by coalstartprob

5/17/2026 at 4:30:25 AM

This is much needed!

Compared to Codex CLI, Claude Code is insanely slow.

    $  time claude --version
    2.1.143 (Claude Code)

    ________________________________________________________

    Executed in    4.39 secs      fish           external
    usr time   29.68 millis    0.26 millis   29.41 millis
    sys time   71.30 millis    1.30 millis   70.00 millis

5 seconds to show me the version number!

I'm guessing Claude Code also needs a rewrite in Rust. But from what I saw in the leaked TypeScript code, a line-to-line port will be pretty bad. It requires a new architecture that matches Rust idioms

by mohsen1

5/17/2026 at 4:52:31 AM

Note that includes network requests to check latest version.

I suspect we'll soon see someone make a persistent Claude shell mode, with the reverse of a !, where you work in shell and send a message to Claude, and Claude sees all the context.

by nomel

5/17/2026 at 6:49:05 AM

What version of time is giving you that kind of output?

by marcosscriven

5/17/2026 at 10:52:26 AM

Looks like that time command was invoked from "fish" shell: https://fishshell.com/docs/current/cmds/time.html

by pramodbiligiri

5/17/2026 at 1:15:47 PM

I tried to install opencode on my x200 laptop, it would segfault as Bun wants some specific intel processor extensions (SIMD).

Now I tried to install zerostack, but the compilation freezes at a certain package.

Is there a static binary available for linux?

by zoobab

5/17/2026 at 6:44:46 PM

I finally managed to compile it, quite happy with the usage.

Will try to rebuild it with static flag.

by zoobab

5/17/2026 at 5:57:50 PM

Don’t get me wrong, but 7K LoCs means it is still an early attempt to make a coding agent. It starts easy “ah it can edit and read files!”, but it requires a lot of extra effort to make properly for many edge cases, especially caching, price optimizations, etc.

I’ve been implementing a custom coding agent in https://playcode.io for 3 years already. It is far beyond 7K LoC.

So when you compare to “shitty slow” Claude code - I don’t agree.

by ianberdin

5/17/2026 at 6:04:06 PM

Check what tools we already implemented, check your "slow" accusation, check the prompt system, check the provider integration (via Rig, so caching is already enabled), check the MCP support and other integrations that you don't even find on some major agents (git worktrees + loops).

For 3 years, your Lovable clone is something that Claude Code could make in a couple of days, but good luck shitting on other projects I guess.

by gidellav

5/17/2026 at 9:08:29 AM

I’m also playing around with Rust for building agents—my setup ends up looking a lot like ZeroStack’s approach. If anyone’s curious, my project is here: https://github.com/7df-lab/devo

by tsiao1999

5/17/2026 at 10:58:35 AM

The screenshots in your readme all 404

by Fuzzwah

5/17/2026 at 2:15:20 PM

How would one create custom tools for it? opencode offers a TS SDK for it, but with Rust it will be something more heavyweight like a gRPC bridge (similar to how Terraform providers work).

by nopurpose

5/17/2026 at 5:48:13 AM

Looks interesting, how would you use skills with that? Would I need to migrate them into prompts? Which I think is not the same.

E.g. how to use official, vendor provided skills with zerostack? https://github.com/elestio/elestio-skill

by Phlogi

5/17/2026 at 6:02:40 AM

Technically, a skill is equivalent to adding

'"The skill description": if this applies, read /path/to/skill/definition.md'

To your agents.md

At least currently skills don't let you set the model (to my knowledge), so that's not a distinction either here (it would be with agent definitions)
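A minimal sketch of that equivalence, assuming a skill boils down to a description plus a path to its definition file. The `Skill` struct and `agents_md_line` function are hypothetical names for illustration, not part of zerostack or any other harness:

```rust
/// Hypothetical description of a skill: a one-line summary plus the
/// path to its definition file. Not taken from any real harness.
struct Skill {
    description: String,
    definition_path: String,
}

/// Render the skill as the single agents.md line described above:
/// the description, then an instruction to read the definition file.
fn agents_md_line(skill: &Skill) -> String {
    format!(
        "\"{}\": if this applies, read {}",
        skill.description, skill.definition_path
    )
}
```

Appending one such line per skill to agents.md gives the model the same "read this file when relevant" hint that a skill registry would.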

by ffsm8

5/17/2026 at 12:30:27 AM

> Integrated Ralph Wiggum loops: looping capabilities for long-horizon tasks

Imo, this shouldn't be embedded in the executor layer. Orchestration should handle this.

by inciampati

5/17/2026 at 12:33:41 AM

I get you, but when I decided to follow a no-skills approach (as in, no agent's Skills used), I had to decide what:

1. Couldn't be built only using prompts

2. Couldn't be built only using MCP servers

3. Would have improved my UX (and, I hope, your UX).

From those three conditions, I chose integrated git worktrees and loops
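For readers unfamiliar with the loop idea: as commonly described, it means re-running the agent with the same prompt until the task is reported done or a cap is hit. The sketch below shows only that control flow. `Outcome`, `ralph_loop`, and the `run_agent` callback are hypothetical names, and zerostack's actual implementation may differ:

```rust
/// Outcome of one agent run; a hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
enum Outcome {
    Done,
    NotYet,
}

/// Re-run `run_agent` with the same prompt until it reports `Done`
/// or `max_iterations` is reached. Returns the number of runs used,
/// or `None` if the task never finished within the cap.
fn ralph_loop<F>(prompt: &str, max_iterations: usize, mut run_agent: F) -> Option<usize>
where
    F: FnMut(&str) -> Outcome,
{
    for iteration in 1..=max_iterations {
        if run_agent(prompt) == Outcome::Done {
            return Some(iteration);
        }
    }
    None
}
```

The iteration cap is a design choice of this sketch: it keeps a long-horizon task from looping (and spending tokens) forever.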

by gidellav

5/17/2026 at 4:07:36 AM

Is AI the new Waterfall/Agile methodology, with all the lingo/terminology/names that make no damn sense?

Appears so, because I am so turned off by it...

by qsera

5/17/2026 at 3:04:18 AM

Are agent harnesses the new web framework?

Everyone wants to write one, building a new one is easy to start with, but tough to get to “prod ready” and the landscape is littered with failed attempts?

Certainly feels like it.

This is really good though; works well and at least has a clearly articulated raison d'être.

by noodletheworld

5/17/2026 at 4:58:05 AM

The key thing with pi is that it can extend itself. How does that work when it’s written in rust?

by spectaclepiece

5/17/2026 at 12:31:18 PM

The usual way to make a Rust program extensible is to embed a wasm interpreter. Then the agent can extend it by writing an extension in Rust or any other language that compiles to wasm. Zed does it for example

by nextaccountic

5/17/2026 at 6:35:03 AM

That's a bit like saying "the key thing with Lisp is that it can extend itself." Yes, that is a core feature and a lot of people use it for that reason. But not everyone. Others use pi just because it is a small agent harness, but don't need (or don't want) the self-extensibility.

by adastra22

5/17/2026 at 5:00:42 PM

Are there any pre-built Linux binaries for this? I tried to install it with cargo, but got "feature `edition2024` is required" (this is with the newest cargo available from my current Ubuntu distro).

Also, can I configure zerostack to always require a sandbox? I don't want to accidentally forget to call it with --sandbox.

by perlgeek

5/17/2026 at 7:29:04 AM

New to this, but what's the benefit over models like Claude Code?

by tedshark

5/17/2026 at 7:40:27 AM

Make harness independent of model, so when pricing or quality changes you can switch.

Avoid lock in to stack from one provider (things like a harness that only works with models from one provider and so on).

Use local models (a couple of them do work a bit now, if you have 20Gb video RAM), which saves money and is more private, and works offline.

Can improve the harness, fix bugs in it, make it compatible with different systems and techniques.

This game happens every time in new cycles of developer technology. The good bet historically has always been to use open source - there's a reason most developer tooling just pre-AI revolution was open source (even things like Java and .NET which used to be proprietary).

by frabcus

5/17/2026 at 10:54:02 AM

>Make harness independent of model

You can use Claude Code with almost any model.

>Use local models (a couple of them do work a bit now, if you have 20Gb video RAM), which saves money and is more private, and works offline.

You can do that with Claude Code.

by DeathArrow

5/17/2026 at 7:40:50 AM

Different harness (pi), but this blog post may partially answer your question: https://mariozechner.at/posts/2025-11-30-pi-coding-agent/

by timwis

5/16/2026 at 11:26:19 PM

Given agent harnesses affect so much of the performance of models, it would be great to see some kind of benchmark on how this tool performs compared to claude/codex/opencode/pi etc.

by sergiotapia

5/16/2026 at 11:46:46 PM

Hi! While I didn't try any agent benchmark, I already thought of this possible issue, and I tried to approach it on two different levels:

1. The tools that are given to the agent are almost the same as the ones defined in Opencode, except for Skills and Subagents (both features not implemented in zerostack)

2. Zerostack is prompt-based, so it ships with a set of .md files, stored in ~/.config/zerostack/prompt, that can be selected from the TUI in order to activate different 'agents': as you can see from the README, it is designed to contain the most important features of superpower + Claude's front-end design + git worktree support and Ralph Wiggum loops (both as integrated features)
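To make the prompt-directory idea concrete, here is a rough sketch of how a harness could discover selectable prompts by scanning a directory for `.md` files. This is an assumption about the general approach, not zerostack's actual code; `list_prompts` is a hypothetical helper and the directory is passed in by the caller:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Return the names (without the `.md` extension) of all Markdown files
/// directly inside `dir`, sorted alphabetically. Other files are ignored.
fn list_prompts(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let is_markdown = path.extension().and_then(|e| e.to_str()) == Some("md");
        if path.is_file() && is_markdown {
            if let Some(stem) = path.file_stem().and_then(|s| s.to_str()) {
                names.push(stem.to_string());
            }
        }
    }
    names.sort();
    Ok(names)
}
```

A TUI could then present the returned names as the list of 'agents' and load the chosen file as the system prompt.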

by gidellav

5/17/2026 at 12:34:00 AM

It's been said before, but it is important to prospective users, so it bears repeating: screenshots and benchmarks, please; it helps users decide whether to invest time in it. The ability to transfer settings from other agents would be great too.

by esafak

5/17/2026 at 12:48:11 AM

1. I will add some screenshots tomorrow

2. As said before, there are no benchmarks right now, but it is good enough for me, so I hope it's good enough for y'all :)

3. Transferring settings from other agents is out-of-scope for a minimalistic coding agent, but the idea is that, apart from MCP servers, the rest might just force you to learn how zerostack works, because of design choices such as not having Skills or having certain specialized tools integrated (worktrees and loops).

by gidellav

5/17/2026 at 2:16:12 AM

I absolutely like this. Pi becomes sluggish after installing a couple of extensions. I myself was trying to port Pi to Rust but it was consuming too many tokens.

Is there any API like Pi's so that I can create extensions?

by theusus

5/17/2026 at 3:54:43 AM

It absolutely doesn't. It must be the extensions you're using.

I've found that nearly every extension on the official pi.dev/packages is vibe coded trash, like for example the most popular subagents extension.

Instead of just giving you a basic subagent, it's a whole kitchen sink of recursion, teams, chains, confusingly named agents like "oracle" etc. Basically feels like someone kept prompting "what else could we add here?".

They're all like that. It's no wonder these slow down pi.

What I've done is just have the agent write my own.

Get a local copy of e.g. that kitchen sink subagents extension. Have the agent list all the features, then I give back a much smaller list of the features I want and say "write me a new extension with just these new features" and every time it one shots it (using GPT 5.3 usually), then 20-30 minutes later I have a working, lightweight extension tuned to my exact workflow.

I've done this for I guess about 8 extensions now (subagents, a lightweight typescript LSP, web search, background processes, Claude style hooks, plan mode are the main ones) and it's very fast and snappy.

by esperent

5/17/2026 at 4:40:29 AM

Still they are maintained by those developers. I cannot spend my time developing extensions. I'd rather do that in Rust.

by theusus

5/17/2026 at 5:27:39 AM

Then pi is probably not for you, as doing this is pretty much the whole selling point. You could try oh-my-pi or OpenCode instead.

by esperent

5/17/2026 at 8:19:35 AM

These simple harnesses perform the best in my day to day experience but I still can't figure out why that's the case.

by 0xAstro

5/17/2026 at 8:26:57 AM

Because they don’t have an incentive to maximize your usage, but rather focus on solving probabilistically solvable problems for you.

Bigger harnesses need to balance upping your token usage and being helpful.

by jwpapi

5/17/2026 at 1:08:08 PM

How is it any faster than something written in any other programming language?

by eddy-sekorti

5/17/2026 at 6:19:45 AM

Hmm, Claude Code and Opencode work fine for me.

It's a bit amusing that coding agents rely on drawing 1000W+ and using 2TB+ of memory in a datacenter to run, yet people really focus on the last few watts and few hundred megabytes of memory on their laptop (which get dwarfed by the energy cost of compiling their code anyways). But I suppose making them a bit faster and lighter wouldn't hurt.

by 2001zhaozhao

5/17/2026 at 6:41:17 AM

The data centre runs on a dedicated power line. My laptop runs on battery. Using coding agents currently drains battery quite fast, which is surprising, given that the vast majority of the work does not take place on my laptop.

Making the client side coding agent more efficient isn't about saving the climate. It is about extending the workday (which might actually make the climate worse)

by kvdveer

5/17/2026 at 6:42:41 AM

I think this is overly reductive. For sure the models are behemoths and consume a lot of resources, but the harness can have a big impact on how much the model is used. For example, having a strong set of tools available in the harness means the model can work much more efficiently.

by remus

5/17/2026 at 6:58:06 AM

It is also just an indicator of the planning and polish that a particular harness may have.

by NewJazz

5/17/2026 at 6:35:16 AM

[dead]

by huflungdung

5/17/2026 at 9:23:01 AM

Could we finally put the whole "written in pure Rust" thing as if it is a certificate of quality to rest? You can write crap in Rust, you can write excellent software in Rust, and both goes for all other languages too. I don't care what language you used for a project from the quality POV. Slop is slop, no matter Rust or JS or C.

by teiferer

5/17/2026 at 6:16:14 AM

Sorry, it looks like we were not able to load the page. Please make sure your network connection works and you are using an up-to-date browser. If the issue persists, please visit our issue tracker to report the problem

Got this on iPhone firefox

by born-jre

5/17/2026 at 6:38:11 AM

Retry from Safari, sometimes it works better

by gidellav

5/17/2026 at 2:08:41 AM

I love these. Coding agents aren't very difficult to build, it's a TUI + tools + getting a nice agent loop working. The hardest part seems to be supporting all of the different providers and model quirks. What is interesting is seeing the experimentation: some provide tons of tools, others provide a single python interpreter and have the agent use tools via sandboxed python scripts, others use minimal tools and lean on bash. Personally I want a harness that gives a ton of control to the user to let them steer the LLM, less agent and more augmentation. Maybe I'll have to build it myself. If anyone has ideas, let me know.
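A bare-bones sketch of the "agent loop" part, with the model and the tools stubbed out as closures so that only the control flow remains. Everything here (`ModelReply`, `agent_loop`, the transcript format) is hypothetical and not taken from any of the harnesses mentioned in this thread:

```rust
/// What the model asks for on each turn; hypothetical types.
enum ModelReply {
    CallTool { name: String, input: String },
    Final(String),
}

/// Ask the model what to do, run the requested tool, append the result
/// to the transcript, and repeat until the model gives a final answer
/// or `max_turns` is exhausted (in which case `None` is returned).
fn agent_loop<M, T>(task: &str, max_turns: usize, mut model: M, mut run_tool: T) -> Option<String>
where
    M: FnMut(&[String]) -> ModelReply,
    T: FnMut(&str, &str) -> String,
{
    let mut transcript = vec![format!("user: {task}")];
    for _ in 0..max_turns {
        match model(&transcript) {
            ModelReply::Final(answer) => return Some(answer),
            ModelReply::CallTool { name, input } => {
                let output = run_tool(&name, &input);
                transcript.push(format!("tool {name}({input}) -> {output}"));
            }
        }
    }
    None
}
```

In a real harness `model` would be an HTTP call to a provider and `run_tool` would dispatch to file edits, shell commands, and so on; the provider quirks mentioned above live inside those two callbacks.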

by slopinthebag

5/17/2026 at 5:49:16 AM

I'm working on one right now where nearly everything can be expressed as a combination of workflows. There will be some built-in agent types out of the box but all the Lego pieces are there if you want to put together something different.

by inhumantsar

5/17/2026 at 6:35:25 AM

What language are you building this in? I’m interested but trying to stay away from js world for security reasons.

by michalsustr

5/17/2026 at 3:57:36 PM

The system and plugins are Rust. Workflows can be defined in a plugin with Rust or externally with YAML.

Might add support for custom WASM plugins down the road, but everything shipped with the system will be Rust.

by inhumantsar

5/17/2026 at 3:51:26 AM

Pi.dev is pretty good in giving tons of control to the user and has extensions that you can easily build.

Although people are complaining about its RAM usage in this thread, I haven't bothered to check how much RAM it uses.

by afzalive

5/18/2026 at 12:25:12 AM

I refuse to run npm slop on my hardware

by slopinthebag

5/17/2026 at 1:44:16 AM

Now make it into an IntelliJ plugin which has proper access to the search index. I’ll pay for it. For Christ's sake, it’s insane JetBrains hasn’t figured this out yet

by usernametaken29

5/17/2026 at 6:39:12 AM

I am currently deciding on adding ACP support or not (and ACP support should allow connections to JetBrains's IDEs)

by gidellav

5/17/2026 at 9:52:44 AM

Yes please.

TUIs are cool but sometimes people prefer staying in the IDE

by upcoming-sesame

5/17/2026 at 2:24:32 AM

I think this is such an opportunity for JetBrains. I talked to them about this at AWS Re-Invent, strangely, they could really see how strong of a position they are in if only they paid attention to the right thing!

by nullorempty

5/17/2026 at 3:52:11 AM

They even have this already, Junie, but of course the plugin version cannot use BYOK….

by usernametaken29

5/17/2026 at 4:02:43 AM

Jetbrains does not have their own IDE-integrated coding agent?

What do Jetbrains users use then? Amp?

by kirtivr

5/17/2026 at 12:53:15 PM

What is the use case for integrating a coding agent in the IDE?

I run agents outside of my IDE; while they work I can look at the code they created, or I can use the IDE to do different work.

by krzyk

5/17/2026 at 4:38:35 AM

https://www.jetbrains.com/junie/

by sgarman

5/17/2026 at 5:10:19 AM

Junie does not support BYOK inside the IDE

by usernametaken29

5/17/2026 at 7:39:31 AM

Has this position recently changed? It states this on the marketing page?

> Use a JetBrains AI subscription or connect your preferred provider with Bring Your Own Key (BYOK).

by leonsmith

5/17/2026 at 10:16:06 AM

It seems confusing. My understanding is the AI assistant part (i.e. chat) is configurable. But Junie IDE is only via credits through Jetbrains.

https://youtrack.jetbrains.com/articles/SUPPORT-A-1833/What-...

(To make it more confusing, Junie CLI seems to say it will work with any provider)

by Ardren

5/17/2026 at 11:01:09 AM

[flagged]

by PythonLuvr

5/17/2026 at 7:38:44 AM

What does the k stand for? Key?

You can add any OpenAI API endpoint you want, no?

by Mashimo

5/17/2026 at 11:26:12 AM

No, you have to buy their subscription within the IDE

by usernametaken29

5/17/2026 at 1:45:57 PM

The JetBrains AI Assistant plugin says:

> Choose how AI runs by selecting built-in AI models from top-tier providers, bringing your own API keys or connecting local models.

And the AI Assistant in turn can use Junie.

At least that is what the plugin overview says, I have not tested it.

by Mashimo

5/17/2026 at 5:21:04 AM

Does the IntelliJ mcp server do that? It has find tools

by dtauzell

5/17/2026 at 9:16:12 AM

What does "unix-inspired" mean here?

by rw_panic0_0

5/17/2026 at 3:55:23 AM

Looks promising, is OpenAI subscription support planned?

by deagle50

5/16/2026 at 11:18:38 PM

this is what I've been waiting for

a low level language. please no more scripting language TUIs!

by hparadiz

5/16/2026 at 11:52:16 PM

Rust, a language with affine types, generics, lifetimes, deep static analysis, hygienic macros, etc is not low-level. It's nearly as high-level as Haskell (without HKTs though).

It just does not rely on GC and allows managing resources efficiently. This efficiency is partly due to its being so high-level.

by nine_k

5/16/2026 at 11:57:52 PM

While I agree on the fact that it allows managing resources efficiently, I don't agree on the fact the efficiency derives from it being high-level; from a purely technical standpoint, I could skim off 2-3MB from the memory footprint by writing the code in pure C, as there are some unused parts of Rust's std that cannot be removed without recompiling std.

This is obv only a technical talk, as writing an AI TUI in pure C would be rather... ehhh

by gidellav

5/17/2026 at 1:02:42 AM

That's why I said "part of its efficiency". Rust can do RAII, can optimize things more aggressively because of no aliasing ever in safe code, and because of known lifetimes, it can offer fearless concurrency™. Rust can also support highly optimized data representations (see how Option works, or other ADTs, etc) which languages like Haskell, to say nothing of Python, cannot offer because of GC and boxing.
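The data-representation point can be checked directly. For types with a spare bit pattern (a "niche"), such as `Box`, references, and the `NonZero` integers, Rust stores `None` in that unused pattern, so wrapping them in `Option` costs no extra memory. `option_is_free` below is a hypothetical helper written for this illustration:

```rust
use std::mem::size_of;

/// True when wrapping `T` in `Option` adds no bytes, i.e. the compiler
/// found an unused bit pattern in `T` to represent `None`.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}
```

`option_is_free::<Box<u64>>()`, `option_is_free::<&u8>()` and `option_is_free::<std::num::NonZeroU32>()` all hold, while `option_is_free::<u32>()` does not, because every `u32` bit pattern is already a valid value and a separate tag is needed.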

Lower-level languages like Zig or even Go, to say nothing of C, lack many of the high-level language features that power this efficiency.

by nine_k

5/17/2026 at 3:05:25 AM

Agreed, Rust is way more expressive than people give it credit for.

by onlyrealcuzzo

5/16/2026 at 11:32:11 PM

There has been no reason to wait... Codex is written in rust.

-- So is deepseek-tui.

by schaefer

5/16/2026 at 11:35:35 PM

Forgot to add an open source qualifier. I use codex lol

by hparadiz

5/16/2026 at 11:37:59 PM

Codex is also opensource.

by andxor

5/16/2026 at 11:53:57 PM

I don't really want something owned by a company for my local stuff. I'd prefer it be small and minimalistic. Maybe in the future I'll change my mind and it will be more like a browser but for now I wanna keep it small and local.

by hparadiz

5/17/2026 at 12:13:58 AM

Thanks! I don't think that the only advantages are being open and lightweight, but you can actually find some more interesting features such as Ollama support, integrated Prompts (in order to compete with superpowers), git worktrees integration, and so on

by gidellav

5/16/2026 at 11:28:24 PM

Isn’t codex in rust?

by iknowstuff

5/17/2026 at 4:04:01 AM

yes.

by rvz

5/17/2026 at 7:56:27 AM

How come the official codex install instructions say use npm install?

(I just rebuilt my sandbox vm a few days ago….)

Or are there two separate codex clients?

https://developers.openai.com/codex/cli

by cyberpunk

5/17/2026 at 5:12:51 PM

The one from npm is signed by OpenAI, which means computer use from the CLI. The brew distribution requires using the Codex app for computer use.

Thanks Apple.

by nicoritschel

5/17/2026 at 12:51:47 PM

Because people are crazy, usage of npm for installing binaries is quite common unfortunately.

by krzyk

5/17/2026 at 1:55:43 PM

So …. do I understand it right? openai, one of the hottest companies on the planet right now, with very deep pockets, distribute their official rust cli via the … public npm repo?

by cyberpunk

5/17/2026 at 2:20:25 PM

Yes.

There is also homebrew install.

by krzyk

5/17/2026 at 11:56:45 AM

omfg stop

nobody actually cares about rust, let alone likes it

by icase

5/17/2026 at 2:56:58 AM

dude, im actually in disbelief how long we put up with the pile of shit that is claude code.

by choopachups

5/17/2026 at 9:43:28 PM

No extensions? I think you've missed the point

by NamlchakKhandro

5/17/2026 at 4:33:54 AM

This may be the most HN post I have ever seen.

by tencentshill

5/17/2026 at 6:41:32 AM

IMO, the problem with Claude Code, OpenCode, Pi is the harness quality and convincing the agents to do the exact things you need, to define workflows and make the agents stick to it. I didn't experience performance issues.

For example I have an agent in Claude Code that has strict rules to do something before implementing every phase in the plan. Sometimes it decides not to do it. "But, wait the feature is simple enough so I can proceed straight to implementation..."

Just because this is written in Rust won't solve the biggest issues most users have with coding agents.

by DeathArrow

5/17/2026 at 8:45:00 AM

But that‘s not an issue with the coding agent. It’s the model that doesn’t follow the instructions.

Given how an LLM works, you can never be sure it will always work. LLMs are not deterministic.

by bhaak

5/17/2026 at 10:55:19 AM

Isn't a harness supposed to guide and steer the coding agent?

by DeathArrow

5/17/2026 at 12:27:46 PM

While the harness can block certain actions (e.g., tool usage), it can’t enforce perfect adherence to instructions because the model itself is probabilistic. The harness can reduce deviations, but it can’t eliminate the fundamental unpredictability of LLMs.

The rules that are fed into the AI are not unbreakable laws to the AI. We should always remember that.
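A small sketch of the part a harness can enforce deterministically: checking each requested tool call against an allowlist before running it. The `ToolCall` and `Policy` types are hypothetical and not taken from Claude Code or any other agent:

```rust
use std::collections::HashSet;

/// A tool call requested by the model; a hypothetical shape for this sketch.
struct ToolCall {
    tool: String,
}

/// Harness-side policy: only tools on the allowlist may run.
struct Policy {
    allowed_tools: HashSet<String>,
}

impl Policy {
    fn new(tools: &[&str]) -> Self {
        Policy {
            allowed_tools: tools.iter().map(|t| t.to_string()).collect(),
        }
    }

    /// Deterministic check applied before execution. It can stop a
    /// disallowed action, but it cannot make the model follow a plan.
    fn check(&self, call: &ToolCall) -> Result<(), String> {
        if self.allowed_tools.contains(&call.tool) {
            Ok(())
        } else {
            Err(format!("tool '{}' is blocked by policy", call.tool))
        }
    }
}
```

The gate only says no to specific actions. Whether the model performs a required step before implementing a phase is still up to the model, which is the distinction drawn above.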

by bhaak

5/17/2026 at 6:33:42 AM

How does this do in SWE-Bench Pro and Terminal Bench?

by DeathArrow

5/17/2026 at 4:47:59 AM

Does anyone use Claude with custom agents? IIRC they banned that use, and only allow Claude's own agent.

by phplovesong

5/17/2026 at 4:55:32 AM

You can use Claude with other harnesses at API costs, but you cannot use it with your Claude Code sub. That's changing next month though, I guess https://support.claude.com/en/articles/15036540-use-the-clau...

by shepherdjerred

5/17/2026 at 6:46:01 AM

I use Claude Code with GLM 5.1, MiniMax M2.7, Kimi K2.6 and Xiaomi MiMo V2.5 Pro.

by DeathArrow

5/17/2026 at 4:23:22 AM

As you can see, writing a coding agent in a compiled language makes a ton of sense and gives the benefits of running multiple agents efficiently instead of running into leaks and tools consuming gigabytes of RAM.

by rvz

5/17/2026 at 10:35:56 AM

That makes no sense; coding harnesses are just subprocess wrappers + HTTP calls. What is the benefit if at the end of the day it will spawn make, cmake, python, node.js, or whatever the developer is working on? It also comes with the enormous downside of losing native/easy extensibility. JavaScript Object Notation (JSON) is derived from JavaScript; it seamlessly parses and dumps.

by _user_account

5/17/2026 at 2:11:33 PM

hmm

by anuis258

5/17/2026 at 4:19:14 AM

the war of the coding agents has begun.

by joeyguerra

5/17/2026 at 10:03:10 AM

woo hoo, more ai slop...

by kapija

5/17/2026 at 5:49:39 AM

Worth noting the "Unix-inspired" framing is the HN title, not the README — the project itself pitches "minimalistic" and "optimized for memory footprint." Curious what the author means by Unix-inspired specifically, since a single-binary TUI running a multi-tool agent loop doesn't immediately read as do-one-thing-well-and-compose.

by obaid

5/17/2026 at 9:18:53 AM

[flagged]

by Sim-In-Silico

5/17/2026 at 4:25:08 AM

[flagged]

by sarim

5/17/2026 at 3:50:11 PM

[dead]

by LuminaNAO

5/17/2026 at 1:19:36 PM

[flagged]

by shrmarahul

5/17/2026 at 8:05:28 AM

[dead]

by kuanghs

5/17/2026 at 1:59:27 AM

[flagged]

by edgardurand

5/17/2026 at 2:03:24 AM

[flagged]

by phoebe_builds

5/17/2026 at 10:12:06 AM

[flagged]

by amys94fr

5/17/2026 at 12:26:29 AM

[flagged]

by artem_am

5/17/2026 at 9:17:57 PM

Another one. Cool, cool.

/s

by IndianAISupport

5/17/2026 at 12:22:21 AM

[dead]

by nimchimpsky

5/17/2026 at 1:35:10 AM

[flagged]

by andrew_kwak

5/17/2026 at 1:22:47 AM

!RemindMe 6 months

by brcmthrowaway

5/17/2026 at 10:08:24 AM

This is awesome! Can't wait to see where it goes as it continues development

Always funny how Hacker News works with traction, posted about a rust based TUI agent I'm working on a couple days ago too :P

https://github.com/Kuberwastaken/claurst

by kuberwastaken

5/17/2026 at 9:13:37 AM

There is also https://github.com/Dicklesworthstone/pi_agent_rust

I vibed a comparison/review of these two systems using my llm wiki: https://zby.github.io/commonplace/work/pi-agent-zerostack-co...

(the prompt is in https://zby.github.io/commonplace/work/pi-agent-zerostack-co...)

by zby

5/17/2026 at 9:19:11 AM

Your bot seems to think that `pi_agent_rust` is the same as upstream Pi.

by cassianoleal

5/17/2026 at 9:27:54 AM

I think I fixed this in a later revision. Does that persist?

by zby