2/25/2026 at 10:35:53 AM
> Then a brick hits you in the face when it dawns on you that all of our tools are dumping crazy amounts of non-relevant context into stdout thereby polluting your context windows.

I've found that letting the agent write its own optimized script for dealing with some things can really help with this. Claude is now forbidden from using `gradlew` directly, and can only use a helper script we made. It clears, recompiles, publishes locally, tests, ... all with a few extra flags. And when a test fails, the stack trace is printed.
Before this, Claude had to do A TON of different calls, all messing up the context. And when tests failed, it started to read gradle's generated HTML/XML files, which damaged the context immensely, since they contain a bunch of inline javascript.
And I've also been implementing this "LLM=true"-like behaviour in most of my applications. When an LLM is using it, logging is less verbose and deduplicated, so it doesn't show the same line a hundred times, ...
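A minimal sketch of that deduplication idea, assuming a shell pipeline (the `llm_filter` name and the `LLM` variable convention are mine, used for illustration):

```shell
#!/bin/bash
# Hypothetical LLM-aware log filter: when LLM=true, drop exact duplicate
# lines so the agent never sees the same message a hundred times.
llm_filter() {
  if [ "${LLM:-false}" = "true" ]; then
    # awk prints each distinct line only the first time it appears.
    awk '!seen[$0]++'
  else
    cat  # humans get the full, verbose stream
  fi
}

# Hypothetical usage:  my-app 2>&1 | llm_filter
```

The same switch could also gate log levels, so the agent sees warnings and errors while humans keep the debug stream.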
> He sees something goes wrong, but now he cut off the stacktraces by using tail, so he tries again using a bigger tail. Not satisfied with what he sees HE TRIES AGAIN with a bigger tail, and … you see the problem. It’s like a dog chasing its own tail.
I've had the same issue. Claude was running the 5+ minute test suite MULTIPLE TIMES in succession, just with a different `| grep something` tacked on at the end. Now, the scripts I made always log the entire (simplified) output and just print the path to the temporary file. This works so much better.
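A minimal sketch of this kind of wrapper, assuming bash (the `quiet_build` name and the failure patterns are illustrative, not the actual script):

```shell
#!/bin/bash
# Hypothetical low-noise build wrapper: run the full build command, keep
# the complete output in a temp file, and print only the failure lines
# plus the log path back to the agent.
quiet_build() {
  local log
  log="$(mktemp)"
  if "$@" >"$log" 2>&1; then
    echo "BUILD OK (full log: $log)"
  else
    echo "BUILD FAILED (full log: $log)"
    # Surface only stack traces / failure lines, not the whole stream.
    grep -E 'FAILED|Exception|error' "$log" | head -50
    return 1
  fi
}

# Real use might look like:
#   quiet_build ./gradlew --quiet --console=plain clean build test
```

The agent can then grep the logged file on disk instead of re-running the whole suite with a different filter each time.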
by skerit
2/25/2026 at 12:53:33 PM
> Claude is now forbidden from using `gradlew` directly, and can only use a helper script we made. It clears, recompiles, publishes locally, tests, ... all with a few extra flags. And when a test fails, the stack trace is printed.

I think my question at this point is what about this is specific to LLMs. Humans should not be forced to wade through reams of garbage output either.
by majewsky
2/25/2026 at 1:18:04 PM
Humans have the ability to ignore and generally not remember things after a short scan, to prioritize what's actually important, etc. But to an LLM, a token is a token.

There are attempts at effectively doing something similar with analysis passes over the context - that's roughly what things like auto-compaction do - but I'm sure anyone who has used the current generation of those tools will tell you they're very much imperfect.
by kimixa
2/25/2026 at 2:53:33 PM
The “a token is a token” effect makes LLMs really bad at some things humans are great at, and really good at some things humans are terrible at.

For example, I quickly get bored looking through long logfiles for anomalies, but an LLM can highlight those super quickly.
by pennomi
2/25/2026 at 4:32:06 PM
Isn’t the purpose of self attention exactly to recognize the relevance of some tokens over others?
by dcrazy
2/25/2026 at 6:05:56 PM
That may help with tokens being "ignored" while still being in the context window, but not with context window size costs and limitations in the first place.
by kimixa
2/25/2026 at 7:28:37 PM
In my experience, it's the old time-invested vs. time-saved trade-off. If you're not looking at these reams of output often enough, the incentive to figure out all the flags and configs for verbosity and to write these scripts is lower: https://xkcd.com/1205/

And because these issues are often sporadic, doing all this would be an unwanted side quest, so humans grit their teeth and wade through the garbage manually each time.
With LLMs, the cost is effectively 0 compared to a human, so it doesn't matter. Have them write the script. In fact, because it benefits the LLM by reducing context pollution, which increases their accuracy, such measures should be actively identified and put in place.
by keeda
2/25/2026 at 1:09:21 PM
Lots of tools have a --quiet or --output json type option, which is usually helpful.
by adammarples
2/25/2026 at 10:59:06 AM
The way I've solved this issue with a long-running build script is to have a logging script which redirects all output into a file, and can be included with

```shell
# Redirect all output to a log file (re-execs script with redirection)
source "$(dirname "$0")/common/logging.sh"
```

at the start of a script.

Then when the script runs, the output is put into a file, and the LLM can search that. Works like a charm.
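A sketch of what such a `logging.sh` might contain, assuming bash (this is my reconstruction of the described re-exec trick, not the commenter's actual file):

```shell
#!/bin/bash
# Hypothetical common/logging.sh: when sourced at the top of a build
# script, it re-execs that script with all output redirected to a log
# file, printing only the log path to the caller.
if [ -z "${LOG_REDIRECTED:-}" ]; then
  LOG_FILE="$(mktemp)"
  echo "Output redirected to: $LOG_FILE"
  # Re-run the calling script ($0) once, with a guard variable set so
  # we never redirect recursively; stdout and stderr both go to the log.
  LOG_REDIRECTED=1 exec bash "$0" "$@" >"$LOG_FILE" 2>&1
fi
```

The guard variable is what makes the `source` idiom safe: on the second pass through the script the redirection block is a no-op.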
by ViktorEE
2/25/2026 at 10:40:44 AM
This has been my exact experience with agents using gradle and it’s beyond frustrating to watch. I’ve been meaning to set up my own low-noise wrapper script.

This post just inspired me to tackle this once and for all today.
by quintu5
2/25/2026 at 11:06:45 AM
Wow, I'd love to do this. Any tips on how to build this (or how to help an LLM build this), specifically for ./gradlew?
by petedoyle
2/25/2026 at 3:23:12 PM
How is it forbidden? I tell agents to use my wrappers in AGENTS but they ignore it half the time and use the naked tool.
by esafak
2/25/2026 at 4:26:09 PM
If you get desperate, I've given my agent a custom $PATH that replaces the forbidden tools with shims that either call the correct tool, or at least tell it what to do differently.

~/agent-shims/mvn:

```shell
#!/bin/bash
echo "Usage of 'mvn' is forbidden. Use build.sh or run-tests.sh"
```
That way it is prevented from using the wrong tools, and can self-correct when it tries.
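Setting up a directory of such shims could look like this sketch (the tool list and messages are illustrative, not the commenter's actual setup):

```shell
#!/bin/bash
# Hypothetical shim setup: each forbidden tool is replaced by a stub
# that fails loudly and tells the agent what to use instead.
SHIM_DIR="$HOME/agent-shims"
mkdir -p "$SHIM_DIR"

for tool in mvn gradle gradlew; do
  # The unquoted EOF lets $tool expand when the shim is written.
  cat > "$SHIM_DIR/$tool" <<EOF
#!/bin/bash
echo "Usage of '$tool' is forbidden. Use build.sh or run-tests.sh" >&2
exit 1
EOF
  chmod +x "$SHIM_DIR/$tool"
done

# Launch the agent with the shims first on PATH so they win lookup:
#   PATH="$SHIM_DIR:$PATH" <agent command>
```

Because the shims exit non-zero, the agent sees the failure immediately and can self-correct instead of silently using the wrong tool.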
by Squid_Tamer
2/25/2026 at 3:42:52 PM
Permissions scoping.
by simsla
2/25/2026 at 3:45:33 PM
Then they attempt to download the missing tool or write a substitute from scratch. Am I the only one who runs into this??
by esafak