6/18/2026 at 6:17:52 PM
I am glad articles like this are finally starting to get some momentum around what I call the LLM magic box industry. From caveman mode to RTK to semantic search and everything in between. Developers have become magicians that cast spells instead of engineers. It sucks at work especially with everyone so sure that their magic spell is the one for ultimate token savings.
My criteria are: if it’s not in a harness it’s probably not that good (the best ideas float up to Codex/Claude imo) and any GitHub advertising some percent of token savings is not to be trusted.
It’s hard to avoid the snake oil and I hope people start thinking critically on this stuff.
by cityofdelusion
6/18/2026 at 9:01:41 PM
Totally wrong, you underestimate the frontier's incompetence in anything other than building LLM models (ehm ehm flickering TUI for a year "written like a game engine").
I ran a bunch of benchmarks and there are proven ways to reduce tokens while achieving the same results (finding the same CVEs / finding the same bugs in CRs, etc...).
See https://maki.sh, it's my own little proof.
by tontinton
6/19/2026 at 7:20:55 AM
Downloaded it and giving it a try. I love the aesthetics and how colorful it is! It is also pleasant to use and fast, although my personal preference is TUIs that don't break scrollback (claude code actually does a good job at this).
by ChadNauseam
6/19/2026 at 6:54:30 AM
Tried it just now. The onboarding process could be better, for example by guiding the user to pick from the available models if a provider is set up but it's not Anthropic. I wasted a bit of time figuring out that the provider was detected, just the model was wrong.
But I like it, code processing is freaking fast.
by aaulia
6/18/2026 at 11:38:32 PM
an agent harness built in rust with ratatui - checks out. i've built one myself. i don't maintain it, and continue to use opencode, but it was worth it to learn how agent harnesses work.
anyway, what's the real pitch on why i should move on from opencode to maki?
by dfee
6/19/2026 at 7:27:32 AM
> what’s the real pitch on why
I’m not OP, but parent comment and linked site https://maki.sh talk about token reduction.
by no-name-here
6/19/2026 at 4:43:30 AM
I just tried it. It is awesome!
Can you add an indicator to show whether the tool is currently running or not running (due to no prompt, an API error, waiting for permission, etc.)?
by saaspirant
6/18/2026 at 9:40:39 PM
What is your approach for reducing token usage and is it different than rtk?
by lackoftactics
6/18/2026 at 9:55:59 PM
The biggest ones are: using tree-sitter to index code files as a tool, code_execution tool running a workflow of tools inside a python interpreter (monty), and not being a harness developed by the company profiting from selling you the shovels (and introducing "dynamic workflows" aka spawning 50 agents).
by tontinton
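Editor's sketch of the first item above, indexing a code file so the model can ask for an outline instead of reading the whole file. This is not maki's implementation: it swaps tree-sitter for Python's standard-library `ast` module, so it only handles Python source, and `outline` and `SAMPLE` are hypothetical names chosen for illustration.

```python
import ast


def outline(source: str) -> list[str]:
    """Return a compact outline: top-level functions, classes, and their methods."""
    entries = []
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entries.append(f"def {node.name} (line {node.lineno})")
        elif isinstance(node, ast.ClassDef):
            entries.append(f"class {node.name} (line {node.lineno})")
            # Only direct methods are listed; nested classes are ignored here.
            for child in node.body:
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    entries.append(f"  def {child.name} (line {child.lineno})")
    return entries


# Hypothetical input file, inlined so the sketch is self-contained.
SAMPLE = '''
import os

class Cache:
    def get(self, key):
        return None

    def put(self, key, value):
        pass

def main():
    print(Cache().get("x"))
'''

if __name__ == "__main__":
    print("\n".join(outline(SAMPLE)))
```

The outline is a handful of short lines regardless of how long the function bodies are, which is where any token saving would come from. Whether that helps or hurts result quality is exactly what the thread argues needs measuring.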
6/18/2026 at 10:46:03 PM
Tbh I could buy into what you are proposing more than rtk. It feels sane in comparison.
by lackoftactics
6/19/2026 at 1:54:08 AM
Looks very cool. I would like to try it, but don't want to use API billing. OpenAI I think would allow it to use account login. Would you support that?
by raylad
6/19/2026 at 4:44:27 AM
Not OP, but it is already supported via Codex auth. Please run `maki auth login openai`
by saaspirant
6/18/2026 at 9:24:31 PM
Maki is awesome. Thanks! I'm using it on my X220 and it flies in comparison to OpenCode et al.
by rescbr
6/18/2026 at 9:53:29 PM
Enjoy, I can't go back to other agents now, too spoiled by the speed.
by tontinton
6/18/2026 at 10:56:22 PM
I also have become a maki convert and I really like it. I ran into an issue with the dynamic model provider that I should probably make a patch for; list_models doesn’t use the `<provider> models` output at all but instead tries to look up `<provider> resolve`’s base URL + /v1/models, which breaks on a provider like Z.ai which doesn’t have /v1/ anywhere in the path…
by 0xc133
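Editor's sketch of the failure mode described above, using plain URL joining. Everything here is an assumption for illustration: `naive_models_url` mimics the reported behaviour, `models_url` is one possible workaround (not the fix the commenter proposes, which is to use the `<provider> models` output), and the base URLs are made-up placeholders rather than real provider endpoints or maki code.

```python
from urllib.parse import urlsplit, urlunsplit


def naive_models_url(base_url: str) -> str:
    """Mimic the reported behaviour: always append /v1/models to the base URL."""
    return base_url.rstrip("/") + "/v1/models"


def models_url(base_url: str, models_path: str = "models") -> str:
    """Treat the base URL as already versioned and append only the resource name."""
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/") + "/" + models_path.lstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Hypothetical provider whose API path has no /v1/ segment.
    base = "https://llm.example.net/api/v4"
    print(naive_models_url(base))  # path gains a /v1/ segment the provider never had
    print(models_url(base))        # stays under the provider's own path
```

The second function only works if the resolved base URL already carries the version prefix, so it shifts the assumption rather than removing it.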
6/19/2026 at 7:16:13 AM
Create an issue! :)
by tontinton
6/18/2026 at 10:16:48 PM
Wait what? I thought Maki is a Pi/OpenCode replacement i.e. just the TUI for whatever you plug in it i.e. API-based Codex / Claude?
In another comment you said "I can't get back to other agents". What gives? Feels like I completely misunderstood what Maki is.
by pdimitar
6/18/2026 at 10:23:12 PM
It's a TUI, you're right, but it's also a harness.
As much as I hate to admit it, the tools you provide, the descriptions, and the prompts all amount to pretty big changes in experience, even using the same models.
by tontinton
6/18/2026 at 10:27:29 PM
That didn't help very much. What did you mean by "agents" earlier on? The tool/harness or the LLM itself?
Also -- can you make Maki force the underlying LLM to use stuff like fd/rg and not always default to find/grep, for example? And stop trying to do bash-isms on a zsh system?
by pdimitar
6/19/2026 at 7:21:49 AM
I meant harnesses / TUIs, sorry for the confusion.
Using fd/rg sounds interesting. Honestly it would only require small tweaks to the bash tool Lua plugin: either add a note to the description to prefer these binaries, or something like that.
In general though I much prefer "advising" and encouraging the LLM to use the native tools like grep/glob; they are implemented to be super fast, and you will get better parser output.
by tontinton
6/18/2026 at 7:58:56 PM
This is why I Blind A/B test everything.
I burn a ton of tokens, but things actually have to prove their value. And the vast majority of things do not come close to doing so.
I have my own AI agent full of stuff. I blind A/B test everything, but I also don't think the results are all that useful as a signal to others.
Just because I Blind A/B tested it 4 months ago doesn't mean the result is meaningful today.
Maybe the word choices I use dramatically impact things.
I do it, because I can prove the value, and see it with my own eyes. I don't even bother publishing the specific Blind A/B tests.
Also, I've seen other people try to Blind A/B test and get it very wrong. If your measurements aren't good, the test is meaningless.
I don't know. We're all working on these problems together. There's a lot of black magic (which is why I rely on hooks a lot). I'm sure I have tons of black magic, I have a large little AI Agent.
But what I know for certain, is it works for me. All it takes is for me to not use it, and I honestly don't know how everyone currently works with AI.
I will link it, but it is not an endorsement for what you do. Mostly only other software engineers use it. And it's so very specific to the things I have to do.
At best, maybe it sparks an idea for you to implement on your own.
by AndyNemmity
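Editor's sketch of what a blind A/B comparison with a basic significance check could look like. The commenter does not publish their method, so this is a generic illustration rather than their setup: `blind_pairs`, `unblind`, and `sign_test_p` are hypothetical helper names, and the exact sign test is one common choice for paired preferences.

```python
import random
from math import comb


def blind_pairs(outputs_a, outputs_b, seed=0):
    """Randomise left/right order per pair so the rater cannot tell A from B.

    Returns (presented, key); key[i] is True when the left item came from A.
    """
    rng = random.Random(seed)
    presented, key = [], []
    for a, b in zip(outputs_a, outputs_b):
        a_left = rng.random() < 0.5
        presented.append((a, b) if a_left else (b, a))
        key.append(a_left)
    return presented, key


def unblind(choices, key):
    """Map 'left'/'right' choices back to variants; returns (wins_a, wins_b)."""
    wins_a = sum((choice == "left") == a_left for choice, a_left in zip(choices, key))
    return wins_a, len(choices) - wins_a


def sign_test_p(wins_a, wins_b):
    """Two-sided exact sign test p-value under the null of no preference."""
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Placeholder outputs; in practice these would be real agent transcripts or diffs.
    pairs, key = blind_pairs(["a1", "a2", "a3"], ["b1", "b2", "b3"], seed=1)
    choices = ["left", "right", "left"]  # what a rater picked without seeing the key
    print(unblind(choices, key), sign_test_p(9, 1))
```

The key stays hidden until every pair has been rated. With only a handful of pairs the p-value will rarely be small, which fits the comment's warning that a test with poor measurements is meaningless.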
6/18/2026 at 9:41:01 PM
>the best ideas float up to Codex/Claude imo
They only float up if people create things like RTK and other people try them though.
It's fair to sit this one out and let others figure out if it's worth it or not but tools like RTK, Headroom, caveman mode and others do reduce input and output tokens that need to be processed, and for local LLMs that can have measurable speedups. Whether or not that ultimately hurts the resulting output I don't have enough data to say, but I am happy to play with them to find out.
by evilduck
6/18/2026 at 9:52:38 PM
Also the incentives aren’t exactly aligned. Yes, Anthropic et al. want you to have efficient token usage (because you’ll use it more, and because of some competitive pressure). But it’s not their first priority, especially when they make more money with more tokens.
If a tool like rtk improves token efficiency, but has some negative impact on quality, should Anthropic integrate it immediately? Where is the line? This kind of decision is arguably better left to the user.
What they should maybe do, is have a parameter similar to effort level, that allows the user to opt into native features for token minimizing. Make the tools available but leave the choice of the fidelity/savings tradeoff up to the user.
by chatmasta
6/18/2026 at 6:23:41 PM
The idea itself is sound: If you can improve the signal-to-noise ratio in the context window, then that's a good thing.
Whether or not RTK actually does this has not been established. I would be glad to see some proper benchmarks done on the actual difference this tool makes (not some meaningless "up to 90%" type of language).
by arcanemachiner
6/19/2026 at 5:24:31 AM
I found this, which has some: https://arxiv.org/pdf/2605.28876
TLDR: RTK does not look good according to the author's benchmark.
by celrod
6/18/2026 at 6:26:00 PM
I was wondering if that impacts the accuracy. Obviously the rtk output wasn't in the training dataset, but maybe it doesn't matter in the end.
by lackoftactics
6/18/2026 at 7:58:46 PM
I'll go further and note that some of the optimizations I've seen in rtk for things like `git status` have actually bubbled up into the model layer -- Codex is regularly making tool calls like `git status --short` instead of `git status`.
by philipbjorge
6/18/2026 at 6:20:29 PM
There is a conflict of interest, though.
by blubber
6/19/2026 at 4:53:05 AM
Only in inference, but if you consider that they’re reinvesting inference performance in training I think the conflict argument is overblown.
by baq
6/18/2026 at 7:33:47 PM
I have to say I made a similar mistake in trusting that semantic search was the next big thing. My opinion has shifted, but it made sense to me for too long.
by lackoftactics
6/18/2026 at 9:44:42 PM
But Claude especially copies open-source ideas after they have been widely used for months.
by mingqiz
6/18/2026 at 9:24:24 PM
Oh, this gold rush has breathed new life into the old school Semantic guys.
Lord knows the DITA priesthood has been running low on rubes, so this new era is a godsend.
Re-coding all of your org's content into a verbose granular schema, that's what will fix these AI things. It's going to give your LLM superpowers! Semantic superpowers!
While everyone completely ignores the utter lack of coupling between the actual language and whatever nonsense is in the element / structure naming. Or the fact that every single thing has to go through some horrible 1990s era parser, which breaks constantly, and now everyone's shovelling the full markup into the very tiny confused mouth of the AI. Or that now everyone needs specialized software to display anything. Or the everything.
My dudes, the thing you're trying to do with this stuff is already done in the vectorizations. You can use math for a lot of it now, instead of someone hand coding "poplar" as "tree" in a totally flat tree structure.
by lopsotronic
6/19/2026 at 7:45:53 AM
My criterion is "do they measure performance, or at least even try to?". Caveman [1], RTK [2] and more recently ponytail [3] either don't, or use only a few trivial tests. Those projects don't measure performance on widely used benchmarks (like SWE Pro and stuff), which have their issues but would at least give some indication. They also don't measure "big model + caveman vs smaller model".
I've had a few times where removing all custom instructions that I started using with model N-2 made model N perform way better, so I'm very suspicious of everything that changes how the model works. It's easy to get degraded performance silently, and suddenly you're paying latest Opus costs for 6-month-old Sonnet performance.
[1]: https://github.com/JuliusBrussee/caveman
by Zababa
6/18/2026 at 7:07:49 PM
I mean it kind of already is in harnesses. Codex and Claude Code both have subagent tools. You could probably get a similar token output cut just by asking Claude Code to run all commands with Haiku as a summarizing subagent.
by striking