alt.hn

6/9/2026 at 1:27:03 PM

Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

https://arxiv.org/abs/2605.15184

by Anon84

6/9/2026 at 7:04:34 PM

In my research, grep is fine if you don’t care about tokens and you have fewer than 100k files. The direct corpus interaction paper [1] shows a breakdown past this level. In my personal experience, grep plus an agent gets you slightly better relevance than a BM25 search engine. But it requires you to eat tokens.

If you think grep is great, it’s because you’ve been social engineered to organize your content to be findable. We document why something is useful to an agent. We put it in a logical place.

Just organizing content is at least half of building search, agentic or not. It’s one reason Google is successful: we’re all trying to make our content findable by the search engine. It’s not all technology :)

1- https://arxiv.org/abs/2605.05242

by softwaredoug

6/9/2026 at 7:43:02 PM

> If you think grep is great, it’s because you’ve been social engineered to organize your content to be findable. ...

This is such a strange train of thought. How did you get there?

by cpburns2009

6/9/2026 at 8:11:31 PM

I'm not literally saying you were social engineered. I'm saying all the incentives are there for you to organize your content.

Incentives to make things findable are more important to search than any technology.

by softwaredoug

6/9/2026 at 9:25:33 PM

I read that as, "you've learned to insert weird entropy meta-breadcrumbs just for finding"

so if i just index and search then i can stop writing like that?

by nh23423fefe

6/9/2026 at 10:38:10 PM

Long before AI, I remember asking people in code review to add comments specifically to make the code grep-able. Same for privileging key-value mappings over dynamic string concatenation.
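
A minimal sketch of the second point, in Python with made-up handler names (nothing here comes from a real codebase). When the name is assembled at runtime, grepping for `handle_refund` finds only the definition. With an explicit mapping, the same grep also finds the place where it is wired up.

```python
def handle_refund(order_id):
    return f"refunded {order_id}"

def handle_cancel(order_id):
    return f"cancelled {order_id}"

# Hard to grep: the name "handle_refund" never appears at the call site,
# because it is built by string concatenation at runtime.
def dispatch_dynamic(action, order_id):
    return globals()["handle_" + action](order_id)

# Grep-able: every handler name is spelled out in full in one place.
HANDLERS = {
    "refund": handle_refund,
    "cancel": handle_cancel,
}

def dispatch_mapped(action, order_id):
    return HANDLERS[action](order_id)
```

Both dispatchers behave the same; only the second leaves a literal trail for a text search to follow.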

by allan_s

6/9/2026 at 11:10:15 PM

The social engineering thing runs deep. For example, if you grep for “Key” method, chances are the type/class name would stand on the same line. This is the case in Go and, I think, in many other programming languages (ironically, not C).

Lines are a fundamental building block of text and it’s not unreasonable to optimize them.
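
As a rough illustration (a Python stand-in for grep, run over a made-up Go snippet, so the type and method names are hypothetical): because Go puts the receiver type in the method declaration, a line-oriented match on the method name brings the type along with it.

```python
import re

# Hypothetical Go source, used only as text to search.
GO_SOURCE = """\
type Cache struct{}

func (c *Cache) Key() string { return "cache" }

type Session struct{}

func (s *Session) Key() string { return "session" }
"""

def grep(pattern, text):
    """Return the lines of text that match pattern, like a bare grep."""
    rx = re.compile(pattern)
    return [line for line in text.splitlines() if rx.search(line)]

# Match method declarations named Key; each hit also shows its receiver type.
hits = grep(r"\) Key\(", GO_SOURCE)
for line in hits:
    print(line)
```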

by piekvorst

6/9/2026 at 7:09:39 PM

It seems ridiculous that, for example, Copilot running in Visual Studio working on a C# codebase finds stuff in code by grepping around instead of using the Roslyn-driven code symbol and semantic database built into Visual Studio. I'm guessing it's because the people they get to work on AI stuff are AI People who probably only write in Python

by contextfree

6/9/2026 at 7:26:11 PM

It is sort of funny when Copilot hasn’t been integrated with Microsoft’s stuff. But it does make some sense from a business point of view. Make it work with grep, it works everywhere.

by bee_rider

6/9/2026 at 9:45:34 PM

Microsoft was mostly on the Embrace step. They've reached the Extend step with Copilot. They'll eventually Extinguish grep.

It's best not to use Microsoft products.

by inetknght

6/9/2026 at 10:25:28 PM

What does this even mean, how do you extinguish grep

by contextfree

6/9/2026 at 9:21:38 PM

There's a lot more examples of grep usage than Visual code search in the training set.

by SkitterKherpi

6/10/2026 at 4:26:47 AM

Codex does this in VSCode as well.

Compilations break all the time and those symbols either become useless or it’s just quicker to use grep.

by pipeline_peak

6/9/2026 at 5:00:51 PM

Don’t presume this study has anything to do with programming. They measured an agent’s ability to search long conversations, not code.

> We evaluate on a 116-question representative subset of the LongMemEval benchmark (Wu et al., 2025), which tests an agent’s ability to answer questions over long conversations spanning multiple sessions.

by quinncom

6/9/2026 at 6:37:21 PM

I get the sense that I was click-baited by the article's title with the classic trope of "X is all you need". This research is a solid contribution, but it is far from all we need to understand grep vs semantic search in agent retrieval.

by schipperai

6/9/2026 at 4:51:29 PM

I have always used traditional grep to search codebases. It serves me better than an IDE when there’re lots of scattered and frequent queries.

grep’s design is surprisingly winning, exceeding expectations to this day.

by piekvorst

6/9/2026 at 5:48:49 PM

you might be interested in https://github.com/boyter/cs

pretty fast and neat project to search code interactively with a lot of optimizations on finding the right thing

by weaksauce

6/10/2026 at 12:34:15 AM

Came here to post that and you already did. Thank you!

by boyter

6/9/2026 at 3:45:01 PM

Tangential: I have a hook that rewrites grep to rg, but lately I wonder if this is actually wasteful since the model is so biased towards grep. Is there a way to shim/alias it, perhaps?
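
A rewrite hook of this kind might look roughly like the following sketch. The function name is hypothetical, and it assumes the command is a single invocation with `grep` as its first word; pipelines and flags that differ between the two tools are not handled.

```python
import shlex

def rewrite_grep_to_rg(command: str) -> str:
    """Swap a leading `grep` for `rg`, leaving every argument untouched."""
    tokens = shlex.split(command)
    if tokens and tokens[0] == "grep":
        tokens[0] = "rg"
    # shlex.join re-quotes arguments that contain spaces.
    return shlex.join(tokens)
```

Because the arguments pass through unchanged, this is only safe for invocations whose flags mean the same thing in both tools.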

by hmokiguess

6/9/2026 at 4:04:30 PM

My CLI does something close to this:

https://github.com/gitsense/gsc-cli

`gsc grep` is just an alias for `gsc rg`, mostly because agents are much more likely to reach for “grep” than “rg”.

It works pretty well, but it is not a perfect drop-in replacement. `grep` and `ripgrep` differ in a few details, especially around glob/wildcard behaviour and flags. What I found works is to not use `grep` in search examples, and have the CLI spit out an error message for the AI saying this is `ripgrep`, so it needs to use `ripgrep` syntax.

by sdesol

6/9/2026 at 3:57:51 PM

If performance is the concern, ugrep will get you most of the way there relative to gnu grep, and should be fully grep compatible in terms of syntax:

https://github.com/Genivia/ugrep#aliases

Claude Code may ship with ugrep already.

by celrod

6/9/2026 at 3:47:04 PM

Many harnesses are doing this already, "Grep" is the tool name, ripgrep is the implementation

It depends on if it is using Grep the harness tool or Grep from the bash tool

by verdverm

6/9/2026 at 3:53:48 PM

I see it using the Bash tool infrequently though sometimes Grep. I'm on Claude Code for now due to subscription lock-in, been contemplating moving to pi though

by hmokiguess

6/9/2026 at 4:00:18 PM

My experience here (also a Claude user) is that the model uses different tools in different contexts. I see rg more on frontend work and grep more on backend work. I imagine it defaults to the tool it has seen more of in training for the context it's working in, and since for the most part it's six of one, half a dozen of the other, you'll see environment-specific usage of these tools in Claude for now. I imagine it'll eventually standardize, but we're early yet on such things.

If you'd told me a decade ago I'd finally learn some sed in 26 because I'd want to understand what the AI was doing I'd have told you you were crazy . . .

by joelfried

6/9/2026 at 4:55:44 PM

Why do you have subscription lock-in? Even if you pay for a yearly subscription, Anthropic will refund you pro rata if you cancel early.

by Analemma_

6/9/2026 at 4:45:20 PM

I've been on the lookout for any harness that properly secures a protocol to the LLM, but they're all just "here's some tools, hopefully you don't use bash for everything".

by cyanydeez

6/10/2026 at 3:04:41 AM

And they all do. I had to add special instructions to tell Claude Code to prefer its built-in read_file tool, rather than using `sed -n 180,210p` everywhere.
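
For anyone who doesn't read sed: `sed -n 180,210p` prints lines 180 through 210 of its input. A rough Python equivalent, with a function name made up for illustration (this is not Claude Code's tool):

```python
from itertools import islice

def read_line_range(path, start, end):
    """Return lines start..end (1-indexed, inclusive), like `sed -n start,endp`."""
    with open(path, encoding="utf-8") as f:
        # islice is 0-indexed with an exclusive stop, hence start - 1 and end.
        return list(islice(f, start - 1, end))
```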

by fwip

6/9/2026 at 4:04:28 PM

This is a surprising result. With structured inputs like source code, I’d expect grep to outperform semantic search, but natural language’s errors and inconsistencies seem to leave so many cracks for information to fall through.

by gbacon

6/9/2026 at 4:15:39 PM

This paper is based on quality, so I don't think it should be that surprising if you take loops into consideration. What the agent finds in the first pass can help it formulate the next grep if needed.

by sdesol

6/9/2026 at 4:07:40 PM

If you are truly bitter-lesson pilled - give the agent all the tools and let it decide which to use.

- regex (grep)
- hybrid search (bm25+vector)

this X vs Y is uninteresting when the answer can be both.
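
For the hybrid option, one common way to merge a BM25 ranking with a vector ranking is reciprocal rank fusion. A minimal sketch, assuming each retriever already returns an ordered list of document ids (the ids and the constant k=60 are illustrative choices, not taken from the paper):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids; higher fused score ranks first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every doc it returns.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

bm25_hits = ["a", "b", "c"]    # hypothetical keyword ranking
vector_hits = ["c", "a", "d"]  # hypothetical embedding ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Documents that both retrievers return rise to the top, and documents found by only one of them still make the list.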

by jeffchuber

6/9/2026 at 4:15:38 PM

That assumes that the agent knows which one is better. And to bake in which one is better via post-training would require a study like this to establish where each one works well

by pastel8739

6/9/2026 at 4:46:45 PM

I’ve got a custom ultra-high-performance streaming semantic search that I exposed as a tool, and the RL bias in Claude is almost insurmountable without copious and consistent steering. Codex will follow instructions and use the tools I ask it to, but for god’s sake, between Claude asking to take a nap because it’s getting late in the session and it regressing to RL-biased tools like grep, it’s maddening. When I can get it to use my compositional tools, tool calls drop from around 20-50 to 3-4, but it’s almost impossible to steer.

by fnordpiglet

6/9/2026 at 10:39:47 PM

Anthropic is, I believe, fully pursuing the idea that you shouldn't use their model with anything but their own products. They don't care whether it generalizes.

I agree it's very frustrating to use with custom tools/harnesses that can speed up the process for domain specific purposes.

by mediaman

6/9/2026 at 4:36:44 PM

Exactly this, and this tool called qmd is what I use for the hybrid search portion. It also uses local LLMs to provide summaries on your own markdown data too. My agents use both depending on what type of search they are doing, and both provide good results.

https://github.com/tobi/qmd

by bachittle

6/9/2026 at 6:09:15 PM

Both is usually the right answer, since you can use LLMs to do query expansion and effectively increase the recall performance of your retrieval algo
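
A toy sketch of that idea, with a hard-coded synonym table standing in for the LLM call (the table, documents, and function names are all made up): expanding the query before a literal search picks up documents that the original wording misses.

```python
DOCS = {
    1: "I rode my bike to work today",
    2: "The new bicycle needs a tune-up",
    3: "We talked about dinner plans",
}

# Stand-in for an LLM that proposes alternative phrasings of a query.
EXPANSIONS = {"bike": ["bicycle", "cycling"]}

def search(terms, docs):
    """Return sorted ids of docs containing any of the terms (case-insensitive)."""
    return sorted(
        doc_id for doc_id, text in docs.items()
        if any(term.lower() in text.lower() for term in terms)
    )

def expand(query):
    """Return the query plus any known alternative phrasings."""
    return [query] + EXPANSIONS.get(query, [])

plain = search(["bike"], DOCS)            # literal query only
expanded = search(expand("bike"), DOCS)   # query plus expansions
```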

by budududuroiu

6/9/2026 at 4:30:45 PM

it will only use tools it was trained on? what's the benefit of giving it all the tools.

by dominotw

6/10/2026 at 1:52:03 AM

then you are not agi pilled

by jeffchuber

6/9/2026 at 4:11:56 PM

I'm still disappointed that AI can't use ctags. It's used for finding strings and patterns, and it's right there.

by worthless-trash

6/9/2026 at 4:19:17 PM

> I'm still disappointed that ai can't use ctags,

What do you mean by this? Do you mean not automatically build the index?

by sdesol

6/9/2026 at 4:51:35 PM

it inspects a project, finds the ctags files, then goes on to use grep.

by worthless-trash

6/10/2026 at 2:34:03 AM

> grep generally yields higher accuracy

And a lot more tokens, and slower speed. Yes you can get more accuracy if you suck tons more data into context.

But compare this to more advanced code agent methods like Tree Sitter, PageRank, LSP, that build semantic maps to provide more relevant context. Grep alone can't do that

by 0xbadcafebee

6/9/2026 at 3:31:46 PM

I recently watched the new Palantir + Kirkland & Ellis fund formation platform demo, and I was surprised to see how effective the union of structured data was in an agent harness. We're used to dealing with flat files, and here we're comparing basic ways of searching what are essentially long strings, but using Palantir's "Ontology" graph framework, I think Kirkland is going to be able to achieve some exceptional and differentiating outcomes in legal tech. The whole idea assumes that they've got great structured data already, and perhaps that's the real valuable unknown, but giving an agent those tools is super powerful.

I wrote about it[1] and came away with a different view on both Palantir and the future of agentic workflows personally.

[1] sorry, LinkedIn: https://www.linkedin.com/pulse/fund-managements-killer-app-d...

by piker

6/9/2026 at 6:32:46 PM

That was great, thanks for the write-up. It’s rare to get a peek into Palantir’s ontology-forward approach. I’ve certainly been curious.

> But it would make no sense to have an LLM regurgitate an existing form document token-by-token rather than call a piece of 1994 software like Hotdocs to populate some placeholders.

This is a real “oof”, isn’t it. Very difficult to understand what they were going for here. Perhaps they just assumed no one in the intended audience would pick it up. But it certainly is enough of a red flag that it made me go back to the top of your write-up for a re-read, thinking about their whole pipeline in much more sceptical terms.

by darkteflon

6/9/2026 at 7:10:32 PM

Really glad you liked it! Yes, I suspect it was just for show since a plain document popping onto the screen would just be jarring.

Edit: looks like you’re in London, too. Hit me up and let’s connect. My details are in the bio!

by piker

6/10/2026 at 5:53:57 AM

Yeah absolutely, will do! Looks like we have common interests. I’m in Tokyo though, actually. Rest of the team is in London.

by darkteflon

6/9/2026 at 7:13:37 PM

Tables 2 and 3 tell you basically all you need to know. When you use a harness that is tuned towards programming (Codex and Claude Code), grep wins. When you use a neutral harness, vector search wins.

So far every Grep vs RAG discussion I've seen conflates overlapping factors. The most common is simply that a company rebuilt their pipeline from scratch and fixed a bunch of problems. The worst is when they go from one-shot RAG to multi-step Grep and completely miss the fact that multi-step RAG would likely get them similar results.

At the end of the day, the most important thing is knowing the _product features_ your users care about and making sure that's represented in the pipeline.

by SkyPuncher

6/9/2026 at 7:20:40 PM

As far as I know, Claude Code also uses LSP and tree-sitter to find things in your source code.

by ako

6/9/2026 at 8:04:53 PM

If you install LSP. AFAIR their first versions used some kind of tree/structure for easy search, but they found out that grep was better or similar, with fewer complications (they now ship some kind of grep).

by krzyk

6/9/2026 at 5:26:10 PM

This paper oversells on the title. Like, what is chronos, which embedding model was used, which reranker, how was the reranking done, why is chronos much better than claude code?

by stephantul

6/9/2026 at 5:11:04 PM

Is <blank> the only ML paper title?

by liminal

6/9/2026 at 11:21:24 PM

From the article:

> LongMemEval rewards recovering literal witnesses: exact dates, counts, preferences, and spans that often remain stable under tokenization.

Is this saying they chose a benchmark that is biased towards doing well against literal string matching, thus works well with grep, and then (gasp) showed that grep did well, finally declaring "grep is all you need"?

The examples in the benchmark's demo image(1) are all examples you could see grep doing well on. A conversation about bikes, then a query about bike(s) where "bike" is a common token hit. But not stuff like a conversation about a Beethoven sonata, then a question about classical music, where an embedding-based approach would shine.

(1) https://github.com/xiaowu0162/LongMemEval/blob/main/assets/l...
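
To make the contrast concrete, here is a small sketch with two invented conversation snippets (not taken from LongMemEval): a literal pattern finds the bike session, but a question phrased as "classical music" shares no tokens with the Beethoven session and comes back empty.

```python
import re

# Invented stand-ins for two chat sessions.
SESSIONS = [
    "I finally bought a new bike and rode it along the river.",
    "I spent the weekend practicing a Beethoven sonata on the piano.",
]

def literal_hits(pattern, sessions):
    """Indexes of sessions where the regex matches, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [i for i, text in enumerate(sessions) if rx.search(text)]

bike_hits = literal_hits(r"\bbikes?\b", SESSIONS)        # shared token: matches
music_hits = literal_hits(r"classical music", SESSIONS)  # no shared token: empty
```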

by yetanotherjosh

6/9/2026 at 3:23:16 PM

Feels important, but I wish they also had compared against something like MeiliSearch or Algolia.

by yodon

6/9/2026 at 3:48:34 PM

100%, there's even Typesense, open source Algolia, which can do hybrid search and a number of other fancy things

I'm currently working on a markdown kb / search tool for my agents, in part built on TS

by verdverm

6/9/2026 at 10:27:52 PM

If grep were enough, SQLite wouldn't exist.

by _pdp_

6/9/2026 at 4:13:11 PM

I'm curious to see what patterns it's grepping.

by kwillets

6/9/2026 at 3:18:16 PM

Surely 'strings' would be even better?

by sys_64738

6/9/2026 at 4:15:59 PM

This has been posted before, but a dead-simple pattern that helps enormously with steering the model to the right code area is a DESIGN.md that it creates, updates, and references periodically.

by greenavocado

6/9/2026 at 6:07:03 PM

What does it contain?

by nibbleyou
