Show HN: Use Claude Code to Query 600 GB Indexes over Hacker News, ArXiv, etc.

12/31/2025 at 9:38:57 AM

I like that this relies on generating SQL rather than just being a black-box chat bot. It feels like the right way to use LLMs for research: as a translator from natural language to a rigid query language, rather than as the database itself. Very cool project!

Hopefully your API doesn't get exploited and you are doing timeouts/sandboxing -- it'd be easy to do a massive join on this.

I also have a question mostly stemming from me being not knowledgeable in the area -- have you noticed any semantic bleeding when research is done between your datasets? e.g., "optimization" probably means different things under ArXiv, LessWrong, and HN. Wondering if vector searches account for this given a more specific question.

by barishnamazov

1/1/2026 at 12:14:15 AM

Exactly, people want precision and control sometimes. Also it's very hard to beat SQL query planners when you have lots of material views and indexes. Like this is a lot more powerful for most use cases for exploring these documents than if you just had all these documents as json on your local machine and could write whatever python you wanted.

Yeah I've out a lot of care into rate-limiting and security. We do AST parsing and block certain joins, and Hacker News has not bricked or overloaded my machine yet--there's actually a lot more bandwidth for people to run expensive queries.

As for getting good semantic queries for different domains, one thing Claude can do besides use our embed endpoint to embed arbitrary text as a search vector, is use compositions of centroids (averages) of vectors in our database, as search vectors. Like it can effortlessly average every lesswrong chunk embedding over text mentioning "optimization" and search with that. You can actually ask Claude to run an experiment averaging the "optimization" vectors from different sources, and see what kind of different queries you get when using them on different sources. Then the fun challenge would be figuring out legible vectors that bridge the gap between these different platform's vectors. Maybe there's half the cosine distance when you average the lesswrong "optimization" vector with embed("convex/nonconvex optimization, SGD, loss landscapes, constrained optimization.")

by Xyra

1/1/2026 at 1:32:46 PM

if performance becomes a problem statically hosting sqlite DBs with client side queries and http range requests is an interesting approach:

https://github.com/phiresky/sql.js-httpvfs

by kiney

1/1/2026 at 7:30:43 PM

Thanks, that's very interesting.

by Xyra

1/1/2026 at 3:44:53 PM

That's a neat thought. What's the granularity of the text getting embedded? I assume that makes a large difference in what the average vector ends up representing?

by plagiarist

1/1/2026 at 7:25:06 PM

~300 token chunks right now. Have other exciting embedding strategies in the works.

by Xyra

12/31/2025 at 9:22:18 PM

This is the route I went for making Claude Code and Codex conversation histories local and queryable by the CLIs themselves.

Create the DB and provide the tools and skill.

This blog entry explains how: https://contextify.sh/blog/total-recall-rag-search-claude-co...

It is a macOS client at the present but I have a Linux-ready engine I could use early feedback on if anyone is interested in giving it a go.

by bredren

12/31/2025 at 10:55:43 AM

I don’t have the experiments to prove this, but from my experience it’s highly variable between embedding models.

Larger, more capable embedding models are better able to separate the different uses of a given word in the embedding space, smaller models are not.

by keeeba

12/31/2025 at 7:21:57 PM

I'm using Voyage-3.5-lite at halfvec(2048), which with my limited research, seems to be one of the best embedding models. There's semi-sophisticated (breaking on paragraphs, sentences) ~300 token chunking.

When Claude is using our embed endpoint to embed arbitrary text as a search vector, it should work pretty well cross-domains. One can also use compositions of centroids (averages) of vectors in our database, as search vectors.

by Xyra

12/31/2025 at 11:21:50 AM

I was thinking about it a fair bit lately. We have all sorts of benchmarks that describe a lot of factors in detail, but all those are very abstract and yet, those do not seem to map clearly to well observed behaviors. I think we need to think of a different way to list those.

by A4ET8a8uTh0_v2

1/2/2026 at 6:50:07 PM

This is the same route I followed for https://zenquery.app .... It uses LLM to generate SQL rather than working directly on data files. Saves a ton of costs as well since you don't need to send entire file(s) to LLM, just the schema.

by freakynit

12/31/2025 at 9:58:16 PM

> I like that this relies on generating SQL rather than just being a black-box chat bot.

When people say AI is a bubble but will still be transformational, I think of stuff like this. The amount of use cases for natural language interpretation and translation is enormous even without all the BS vibe coding nonsense. I reckon once the bubble pops most investment will go into tools that operate something like this.

by llmslave2

12/31/2025 at 8:22:44 PM

This sounds awesome! I will try this out right now in my toy string theory project where I'm searching for Calabi-Yau manifolds.

Comment from Claude: Claude here (the AI). Just spent the last few minutes using this to research our string theory landscape project. Here's what I found:

  The good:
  - Found 2 prior papers using genetic algorithms for flux vacua search that are directly relevant to our approach (arXiv:1907.10072 and 1302.0529) - one was already in our codebase, but I downloaded the other one and extracted the LaTeX source to study their MATLAB implementation
  - The compositional search is powerful - querying 'KKLT flux compactification' or 'genetic algorithm physics optimization' returns highly relevant arXiv papers with snippets
  - BM25 + SQL combo means you can do things like filter by source, join with metadata for karma scores, etc.

  Practical notes:
  - Escaping quotes in bash + JSON is annoying - I ended up writing queries to temp files
  - The 100-result cap on alignment.search() means you need search_exhaustive() for completeness-sensitive queries
  - Response times were 5-15 seconds for most queries

  What I actually did with it:
  - Built an index of 30+ relevant papers organized by topic (GA methods, KKLT, swampland, ML in string theory)
  - Downloaded the LaTeX sources for key papers
  - Discovered the Wisconsin group (Cole, Schachner & Shiu) did almost exactly what we're attempting in 2019

  Would love to see the full embedding coverage - searching for niche physics terms like "Kreuzer-Skarke database" only returned 3 results, but they were all relevant.

by nathan_f77

12/31/2025 at 8:51:53 PM

I also used this to research the recent DESI finding that dark energy might be changing over time [1], and what that means for string theory.

From Claude:

  Summary of Research Findings

  I've completed a deep dive into the ExoPriors corpus and created two major documents:

  1. /research/exopriors_findings.md - GA and String Landscape Papers

  Key discoveries:
  - arXiv:1907.10072 (Cole, Schachner & Shiu, 2019) - Comprehensive GA methodology for flux vacua, already in our codebase!
  - arXiv:1302.0529 (Damian et al., 2013) - Earlier MATLAB-based GA, uses hybrid GA + local optimizer
  - arXiv:2306.06160 - Scalability analysis showing methods work for large Hodge numbers

  2. /research/DESI_DARK_ENERGY_IMPLICATIONS.md - DESI Dark Energy Impact

  Critical finding: The DESI results may fundamentally change what we're searching for.

  The DESI result in brief:
  - Dark energy equation of state w ≈ -0.55 (not -1)
  - Evidence for evolution: wₐ ≈ -1.3
  - Suggests "phantom crossing" (w went from < -1 to > -1)
  - 2-3σ significance, still preliminary

  Key papers found:
  1. arXiv:2112.10779 (Cicoli et al.) - Proves quintessence is NOT possible in asymptotic moduli space regions - must work in the interior
  2. arXiv:2511.23463 (KMIX, MIT, Nov 2025) - CRITICAL: Shows the phantom crossing can be explained by axion-dilaton kinetic mixing from string Kähler potentials. The "phantom" is an illusion!

  What this means for our project:
  - Instead of targeting static Λ = 2.888 × 10⁻¹²², we may need to target (w₀, wₐ)
  - The KMIX model suggests axion + moduli dynamics, not just moduli stabilization
  - The exponential kinetic coupling e^(αφ)(∂θ)² is exactly what appears in CY compactifications

  Next steps:
  1. Study KMIX model in detail - it uses string-motivated physics
  2. Consider adding (w₀, wₐ) to our fitness function
  3. Possibly include axion dynamics alongside Kähler moduli
  4. Update FORMULAS.md with quintessence equations

[1] https://www.bbc.com/news/articles/c17xe5kl78vo

by nathan_f77

12/31/2025 at 11:33:36 AM

> I can embed everything and all the other sources for cheap, I just literally don't have the money.

How much do you need for the various leaks, like the paradise papers, the panama papers, the offshore leajay, the Bahamas leaks, the fincen files, the Uber files, etc. and what's your Venmo?

by fragmede

1/3/2026 at 3:17:58 AM

emailed you, and it's https://venmo.com/u/XyraSinclair.

by Xyra

12/31/2025 at 1:05:04 PM

This may exist already, but I'd like to find a way to query 'Supplementary Material' in biomedical research papers for genes / proteins or even biological processes.

As it is, the Supplementary Materials are inconsistently indexed so a lot of insight you might get from the last 15 years of genomics or proteomics work is invisible.

I imagine this approach could work, especially for Open Access data?

by bonsai_spool

12/31/2025 at 1:57:12 PM

I just built something like this a week ago: https://github.com/eamag/papers2dataset

I wanted to find all cryoprotective agents that were tested at different temperatures, but it should be extandable to your problem too. Uses OpenAlex to traverse a citation graph and open access pdfs

by eamag

12/31/2025 at 4:38:44 PM

This is a pretty cool project! Thank you for open sourcing it!

by jcmoscon

12/31/2025 at 8:41:34 PM

Guys, you obviously cannot suggest that —dangerously-skip-permissions is ok here, especially in the same paragraph as “even if you are not a software engineer”. This is untrusted text from the Internet, it surely contains examples of prompt injection.

You need to sandbox Claude to safely use this flag. There are easy to use options for this.

by theptip

12/31/2025 at 8:51:53 PM

Today I finally got Claude working in a devcontainer, so I'm wondering what the easier options are.

by skybrian

12/31/2025 at 10:28:54 PM

Things like https://github.com/textcortex/claude-code-sandbox seem like the bare minimum. There are a few other projects doing this.

The first threat is making edits to arbitrary files, exfiltrating your SSL keys or crypto wallets. A container solves that by not mounting your sensitive files.

The second threat would be if Claude gets fully owned and really tries to hack out of its container, in which case theoretically docker might not protect you. But that seems quite speculative.

by theptip

12/31/2025 at 9:09:40 PM

Yeah, I don't think there are easier options. And getting it working within a dev container with all the right settings, was more of a chore than it should be.

by dcreater

12/31/2025 at 9:19:07 PM

Don't completely rely on devcontainer, jailbreaking containers is something that Claude at least nominally knows how to do, though it seems like it's pretty strongly moralized not to without some significant prompt hacking.

by jaggederest

12/31/2025 at 11:20:09 AM

I think a prompt + an external dataset is a very simple distribution channel right now to explore anything quickly with low friction. The curl | bash of 2026

by nielsole

12/31/2025 at 2:37:12 PM

Exactly. Prompt + Tool + External Dataset (API, file, database, web page, image) is an extremely powerful capability.

by skapadia

12/31/2025 at 10:45:49 AM

> a state-of-the-art research tool over Hacker News, arXiv, LessWrong, and dozens

what makes this state of the art?

by kburman

12/31/2025 at 2:07:21 PM

It's just marketing.

It is not a protected term, so anything is state-of-the-art if you want it to be.

For example, Gemma models at the moment of release were performing worse their competition, but still, it is "state-of-the-art". It does not mean it's a bad product at all (Gemma is actually good), but the claims are very free.

Juicero was state-of-the-art on release too, though hands were better, etc.

by rvnx

12/31/2025 at 3:10:22 PM

> It's just marketing. [...] It is not a protected term, so anything is state-of-the-art if you want it to be.

But is it true?

I think we ought to stop indulging and rationalizing self-serving bullshit with the "it's just marketing" bit, as if that somehow makes bullshit okay. It's not okay. Normalizing bullshit is culturally destructive and reinforces the existing indifference to truth.

Part of the motivation people have seems to be a cowardly morbid fear of conflict or the acknowledgment that the world is a mess. But I'm not even suggesting conflict. I'm suggesting demoting the dignity of bullshitters in one's own estimation of them. A bullshitter should appear trashy to us, because bullshitting is trashy.

by lo_zamoyski

12/31/2025 at 4:49:52 PM

I would vote for you as dictator.

by docjay

12/31/2025 at 9:34:12 PM

If my comments were only state of the art I wouldn't need to write them.

by econ

12/31/2025 at 2:53:32 PM

just like "cruelty free" and "not tested on animals" in usa

by goopypoop

12/31/2025 at 11:24:43 AM

The scale. How many tools do you know that can query the content of all arxiv papers.

by 7moritz7

12/31/2025 at 6:23:04 PM

Doesn't look like the scale is there, even for HN:

> Currently have embedded: posts: 1.4M / 4.6M comments: 15.6M / 38M That's with Voyage-3.5-lite

by eamag

1/1/2026 at 3:11:29 AM

The scale is there. I'm scraping, cleaning, token efficientizing dozens of sources every single hour. The lack of monies for embedding everything was a temporary problem.

by Xyra

1/1/2026 at 3:01:31 AM

in the direction of "empowering the public with new capabilities they didn't have before", Scry offers, with the copy and paste of a prompt and talking with an agent:

1) Full readonly-SQL + vector manipulation in a live public database. Most vector DB products expose a much narrower search API. Basically only a few enterprise level services let you run arbitrary SQL on remote machines. Google BigQuery gives users SQL power, but it mostly doesn't have embeddings, connect public corpora, have as good of indexes, and doesn't have support an agentic research experience. Beyond object-level research, Scry a good tool for exploring and acquiring intuitions about embedding-space.

2) An agent-native text-to-SQL + lexical + semantic deep research workflow. We have a prompt that's been heavily optimized for taking full advantage of our machine and Claude Code for exploration and answering nuanced questions. Claude fires off many exploratory queries and builds towards really big queries that lean on the SQL query planner. You can interrupt at any time. You have the compute limits to do lots of exhaustive exploration--often more epistemically powerful than finding a document often, is being confident than one doesn't exist.

3) dozens of public commons in one database, with embeddings.

by Xyra

12/31/2025 at 11:07:01 AM

The tool is state of the art, the sources are historical.

by nandomrumber

12/31/2025 at 11:07:06 AM

First, so best in this?

by ashirviskas

12/31/2025 at 9:17:54 PM

"intelligence explosion", "are essentially AGI at this point", "ARBITRARY SQL + VECTOR ALGEBRA" etc. Casual use of hyperbole and technical jargon.

my charlatan radar is going off.

by dcreater

12/31/2025 at 10:02:24 PM

What is hyperbole? We are collectively experiencing a software intelligence explosion (people are shipping good software at prolific rates now due to Opus 4.5 and GPT-5.2-Codex-xhigh). With Scry, you can run arbitrary SELECT SQL statements over a large corpus and have an easier time composing embedding vectors in whatever mathematical ways you want, than any other tool I've seen.

by Xyra

12/31/2025 at 10:40:48 PM

> shipping good software at prolific rates

I think your definition of good needs to be rethought

by dcreater

1/1/2026 at 3:58:04 PM

“The primary job of software engineers is to make software suck less.” - a university professor i had, 20 years ago.

Let’s not romanticize the past because it’s easier to ship (probably still buggy) code today.

by liquidise

1/2/2026 at 12:05:45 AM

its not though. the primary job of software engineers is to ship a product that produces income for their employer.

by physPop

1/7/2026 at 12:05:11 AM

Does this definition work for SWEs who develop open source projects?

by dcreater

12/31/2025 at 9:37:22 AM

Really useful currently working on a autonomous academic research system [1] and thinking about integrating this. Currently using custom prompt + Edison Scientific API. Any plans of making this open source?

[1] https://github.com/giatenica/gia-agentic-short

by 7777777phil

12/31/2025 at 7:33:56 PM

I could make it open-source as soon as I have $5k to my name. I've been in survival mode frankly for a long time.

by Xyra

1/1/2026 at 1:08:04 AM

Maybe more actually, server costs and API credits for my agent-coordination research are expensive.

by Xyra

1/1/2026 at 7:39:27 PM

I'm raising at least $175k and doing a serious startup.

by Xyra

12/31/2025 at 9:40:08 AM

That's just not a good use of my Claude plan. If you can make it so a self-hosted Lllama or Qwen 7B can query it, then that's something.

by nineteen999

12/31/2025 at 8:23:42 PM

If you're not willing to pay for your own LLM usage to try a free resource offered by the author, that's up to you. But why complain to the author about it? How does your comment enrich the conversation for the rest of us?

by panarky

12/31/2025 at 11:07:47 PM

It's not free if I have to expend Claude credits on something a locally hosted Qwen 7B could handle.

> How does your comment enrich the conversation for the rest of us?

Straight back at you.

by nineteen999

12/31/2025 at 4:37:44 PM

It's ultimately just a prompt, self-hosted models can use the system the same way, they just might struggle to write good SQL+vector queries to answer your questions. The prompt also works well with Codex, which has a lot of usage.

by Xyra

12/31/2025 at 10:56:38 AM

I think that’s just a matter of their capabilities, rather than anything specific to this?

by mcintyre1994

12/31/2025 at 6:47:39 PM

This is very cool. If you're productizing this you should try to target a vertical. What does "literally don't have the money" mean? You should try to raise some in the traditional way. If nothing else works, at least try to apply to YC.

by arjie

12/31/2025 at 7:29:30 PM

I mean I've been living off of $1700/month for a while in Berkeley. I have been trying hard the last 6 weeks to raise angel investment, and am moving to Thailand in a few days to have more breathing room (and change things up to untie some emotional knots and try to make sure I'm positioned to vibe-engineer as well as possible over the next few months).

by Xyra

12/31/2025 at 7:50:42 PM

You don't have any personal contact information on your website or on your Hacker News profile. For a tiny check size, I can be an angel. Contact in profile. Would you like to meet before you leave? I think you shouldn't move out of the Bay Area.

by arjie

12/31/2025 at 8:06:24 PM

That sounds great, thanks, I emailed you.

by Xyra

12/31/2025 at 9:06:40 PM

I've got some idle servers in my basement in Bulgaria with lots of GPUS. I'm actually in Cambodia at the moment. I've actually been playing with some similar ideas. Message me if you like. :)

by davidzweig

1/1/2026 at 12:23:56 PM

Thailand is a dark place. Beware!

There are a lot of other low cost countries out there!

by gregw2

1/2/2026 at 12:40:17 AM

It's literally the digital nomad heaven. What's dark about it?

by alpb

1/3/2026 at 2:13:25 PM

Fair.

I acknowledge "dark" is a judgemental term... but the mix of extreme poverty, extreme relative wealth, and the blind eye towards the sex trade is... dark.

Such misery is not unique to Thailand but you may find it more open, deeply rooted, in your face palpable, or covert-in-troubling-ways.

If you are doing serious dev work of a leveragable nature, I would also be6 thoughtful about how to protect one's innovations in a heavenly land adjacent to China, full of friendly Russian expat hackers post-Ukraine-sanctions, with my hinkiness detectors already overwhelmed by cross cultural signals of a new environment.

I could try to sell you on the merits of low cost of living for English-speaking software hackers in other places like Vietnam or the Phillipines but have to remind myself you aren't asking for that and all I really have is anecdotes and observations anyways and so much of our options and choices are shaped by circumstances and personal tradeoffs. I wouldn't do it but I am me, not you. Good luck!

by gregw2

12/31/2025 at 7:48:26 PM

just a recommendation, pubmed is free and not limited to preprints

by biophysboy

12/31/2025 at 10:04:47 PM

Thank you, I've started ingestion operations of pubmed.

by Xyra

12/31/2025 at 9:40:58 AM

Nice, but would you consider open-sourcing it? I (and I assume others) are not keen on sharing my API keys with a 3rd party.

by mentalgear

12/31/2025 at 10:59:31 AM

I think you misunderstood. The API key is for their API, not Anthropic.

If you take a look at the prompt you'll find that they have a static API key that they have created for this demo ("exopriors_public_readonly_v1_2025")

by nielsole

12/31/2025 at 8:24:29 PM

Yes, thanks for explaining it.

by Xyra

12/31/2025 at 11:32:53 AM

The quick setup is cool! I’ve not seen this onboarding flow for other tools, and I quite like its simplicity.

by m11a

1/1/2026 at 1:25:43 AM

Thank you!

by Xyra

12/31/2025 at 9:31:09 AM

Seems very cool, but IMO you’d be better off doing an open source version and then hosted SAAS.

by bugglebeetle

12/31/2025 at 4:58:43 PM

Would you mind walking through the logic of that a bit for me? I'm definitely interested in productizing this, and would be interested in open sourcing as soon as I have breathing room (I have no money).

by Xyra

12/31/2025 at 4:19:25 PM

Anyone tried to use these prompts with Gemini 3 Pro? it feels like Claude, Gemini and GPT latest offerings are on par (excluding costs) and as a developer if you know how to query/spec a coder llm you can move between them at ease.

by lastdong

12/31/2025 at 6:59:21 PM

Claude Opus 4.5 is a paradigm shift

by awestroke

1/10/2026 at 1:31:03 AM

Can I make an offline mirror of this?

by lostmsu

12/31/2025 at 4:53:00 PM

Seems like you're experiencing the hacker news hug of death.

by anonfunction

12/31/2025 at 5:18:35 PM

Should be squared away now! Was my fault missing a health check for a recent weird bug, not a load issue.

by Xyra

12/31/2025 at 5:23:45 PM

The console / login pages are showing an error still.

by anonfunction

12/31/2025 at 6:53:52 PM

I could be distributed as a Claude skill. Internally, we've bundled a lot of external APIs and SQL queries into skills that are shared across the company.

by r--w

12/31/2025 at 9:14:42 PM

Not a software engineer. Isnt allowing network egress a security risk? exopriors.com is not an established domain or brand that warrants the trust its asking

by dcreater

12/31/2025 at 2:09:15 PM

this is great>>@FTX_crisis - (@guilt_tone - @guilt_topic)

Using LLm for tasks that could be done faster with traditional algorithmic approaches seems wasteful, but this is one of the few legitimate cases where embeddings are doing something classical IR literally cannot. You could also make make the LLM explain the query it’s about to run. Before execution:

“Here’s the SQL and semantic filters I’m about to apply. Does this match your intent?”

by voxleone

12/31/2025 at 4:45:47 PM

Great idea! I just overhauled the prompt to explain the SQL + semantic filters better, and give the user clearer adjustment opportunities before long-running queries.

by Xyra

1/1/2026 at 1:29:49 PM

What’s the benefit of manually pasting a massive prompt and enable egress to make queries over http vs just using MCP?

by Too

1/1/2026 at 12:04:57 PM

Looks great, thanks for sharing! Out of interest, how long did this take to get to its current state?

by rglynn

1/1/2026 at 8:11:19 PM

Thank you! I got the idea December 3, and initially released it December 19.

by Xyra

12/31/2025 at 6:20:51 PM

Do you have contact information? Would like to discuss sponsoring further work and embedding here.

by legohorizons

12/31/2025 at 7:40:42 PM

That would be amazing! Yes, contact@exopriors.com.

by Xyra

12/31/2025 at 5:04:38 PM

It's a very nifty cool, and could definitely come in handy. love the UX too!

by darlontrofy

1/1/2026 at 1:27:27 AM

Thank you! I'll be getting millions more quality, embedded documents, it'll be here just getting more useful.

by Xyra

12/31/2025 at 10:00:39 AM

Is the appeal of this tool its ability to identify semantic similarity?

by gtsnexp

12/31/2025 at 11:35:10 AM

The use case could vary from person to person. When you think about it, hacker news has large enough data set ( and one that is widely accessible ) to allow all sorts of fun analyses. In a sense, the appeal is:

who knows what kind of fun patterns could emerge

by A4ET8a8uTh0_v2

12/31/2025 at 1:49:20 PM

The problem with HN isn't that the patterns are hard to discern, it's that no one wants to acknowledge them.

by noduerme

12/31/2025 at 3:33:47 PM

Oh? With few exceptions, I found people more willing to agree to an argument than anywhere else. Anything in particular you can share?

by A4ET8a8uTh0_v2

12/31/2025 at 11:00:36 PM

How is the alerts functionality implemented?

by pcloadlett3r

1/1/2026 at 1:10:36 AM

You submit a SQL query to periodically run, we run it and store the results. As we ingest more documents (dozens of sources are being ingested every day), we run it again. If there's different outputs, you get an email.

by Xyra

1/2/2026 at 6:27:12 PM

wondering what is your stack? What SQL database are you using?

by riku_iki

1/3/2026 at 3:25:07 AM

Hetzner, Postgres, Rust, SvelteKit

by Xyra

12/31/2025 at 2:53:57 PM

Does that first generated query really work? Why are you looking at URIs like that? First you filter for a uri match, then later filter out that same match, minus `optimization`, when you are doing the cosine distance. Not once is `mesa-optimization` even mentioned, which is supposed to be the whole point?

by beepbooptheory

12/31/2025 at 8:32:24 PM

I've since improved it, and also discovered a new method of vector composition I have added as a first-class primitive:

debias_vector(axis, topic) removes the projection of axis onto topic: axis − topic * (dot(axis, topic) / dot(topic, topic))

That preserves the signal in axis while subtracting only the overlap with topic (not the whole topic). It’s strictly better than naive subtraction for “about X but not Y.”

by Xyra

12/31/2025 at 7:07:32 PM

I need to try this

by lasgawe

1/3/2026 at 3:19:19 AM

What did you think?

by Xyra

1/1/2026 at 10:41:26 AM

[dead]

by marbro

12/31/2025 at 10:05:48 AM

"Claude Code and Codex are essentially AGI at this point"

Okaaaaaaay....

by octoberfranklin

12/31/2025 at 11:13:55 AM

Just comes down to your own view of what AGI is, as it's not particularly well defined.

While a bit 'time-machiney' - I think if you took an LLM of today and showed it to someone 20 years ago, most people would probably say AGI has been achieved. If someone wrote a definition of AGI 20 years ago, we would probably have met that.

We have certainly blasted past some science-fiction examples of AI like Agnes from The Twilight Zone, which 20 years ago looked a bit silly, and now looks like a remarkable prediction of LLMs.

By todays definition of AGI we haven't met it yet, but eventually it comes down to 'I know it if I see it' - the problem with this definition is that it is polluted by what people have already seen.

by Closi

12/31/2025 at 1:02:56 PM

> most people would probably say AGI has been achieved

Most people who took a look at a carefully crafted demo. I.e. the CEOs who keep pouring money down this hole.

If you actually use it you'll realize it's a tool, and not a particularly dependable tool unless you want to code what amounts to the React tutorial.

by nottorp

12/31/2025 at 7:34:09 PM

I built a Nostr web client without looking at code or touching the IDE with Gemini CLI: https://github.com/lucianmarin/subnostr

by lcnmrn

12/31/2025 at 8:51:49 PM

So it had a tutorial for that api and it reimplemented it

by nottorp

12/31/2025 at 2:14:08 PM

Depending on the task, the tool can, in effect, demonstrate more intelligence than most people.

We've just become accustomed to it now, and tend to focus more on the flaws than the progress.

by bebb

12/31/2025 at 11:23:49 AM

> If someone wrote a definition of AGI 20 years ago, we would probably have met that.

No, as long as people can do work that a robot cannot do, we don't have AGI. That was always, if not the definition, at least implied by the definition.

I don't know why the meme of AGI being not well defined has had such success over the past few years.

by bananaflag

12/31/2025 at 12:45:06 PM

"Someone" literally did that (+/- 2 years): https://link.springer.com/book/10.1007/978-3-540-68677-4

I think it was supposed to be a more useful term than the earlier and more common "Strong AI". With regards to strong AI, there was a widely accepted definition - i.e. passing the Turing Test - and we are way past that point already: ( see https://arxiv.org/pdf/2503.23674 )

by bonplan23

12/31/2025 at 5:27:21 PM

I have to challenge the paper authors' understanding of the Turing test. For an AI system to pass the Turing test its output needs to be indistinguishable from a human's. In other words, the rate of picking the AI system as human should be equal to the rate of picking the human. If in an experiment the AI system is picked at a rate higher than 50% it does not pass the Turing test (as the authors seem to believe) because another human can use this knowledge to conclude that the system being picked is not really human.

Also, I would go one step further and claim that to pass the Turing test an AI system should be indistinguishable from a human when judged by people trained in making such a distinction. I doubt that they used such people in the experiment.

I doubt that any AI system available today, or in the foreseeable future, can pass the test as I qualify it above.

by erfgh

12/31/2025 at 5:56:20 PM

People are constantly being fooled by bots in forums like Reddit and this one. That's good enough for me to consider the Turing test passed.

It also makes me consider it an inadequate test to begin with, since all classes of humans including domain experts can be fooled and have been in the past. The Turing test has always said more about the human participants than the machine.

by CamperBob2

12/31/2025 at 11:30:56 AM

Completely disagree - Your definition (in my opinion) is more aligned to the concept of Artificial Super Intelligence.

Surely the 'General Intelligence' definition has to be consistent between 'Artificial General Intelligence' and 'Human General Intelligence', and humans can be generally intelligent even if they can't solve calculus equations or protein folding problems. My definition of general intelligence is much lower than most - I think a dog is probably generally intelligent, although obviously in a different way (dogs are obviously better at learning how to run and catch a ball, and worse at programming python).

by Closi

12/31/2025 at 1:29:59 PM

I do consider dogs to have "general intelligence" however despite that I have always (my entire life) considered AGI to imply human level intelligence. Not better, not worse, just human level.

It gets worse though. While one could claim that scoring equivalently on some benchmark indicates performance at the same level - and I'd likely agree - that's not what I take AGI to mean. Rather I take it to mean "equivalent to a human" so if it utterly fails at something we're good at such as driving a car through a construction zone during rush hour then I don't consider it to have met the bar of AGI even if it meets or exceeds us at other unrelated tasks. You have to be at least as general as a stock human to qualify as AGI in my books.

Now I may be but a single datapoint but I think there are a lot of people out there who feel similarly. You can see this a lot in popular culture with AGI (or often AI) being used to refer to autonomous humanoid robots portrayed as operating at or above a human level.

Related to all that, since you mention protein folding. I consider that to be a form of super intelligence as it is more or less inconceivable that an unaided human would ever be able to accomplish such a feat. So I consider alphafold to be both super intelligent and decidedly _not_ AGI. Make of that what you will.

by fc417fc802

12/31/2025 at 7:51:09 PM

Pop culture has spent its entire existence conflating AGI and ‘Physical AI’, so much so that the collective realization that they’re entirely different is a relatively recent thing. Both of them were so far off in the future that the distinction wasn’t worth considering, until suddenly one of them is kinda maybe sorta roughly here now…ish.

Artificial General Intelligence says nothing about physical ability, but movies with the ‘intelligence’ part typically match it with equally futuristic biomechanics to make the movie more interesting. AGI = Skynet, Physical AI = Terminator. The latter will likely be the hardest part, not only because it requires the former first, but because you can’t just throw more watts at a stepper motor and get a ballet dancer.

That said, I’m confident that if I could throw zero noise and precise “human sensory” level sensor data at any of the top LLM models, and their output was equally coupled to a human arm with the same sensory feedback, that it would definitely outdo any current self-driving car implementation. The physical connection is the issue, and will be for a long time.

by docjay

12/31/2025 at 8:38:24 PM

Agreed about the conflation. But that drives home that there isn't some historic commonly and widely accepted definition for AGI whose goal posts are being moved. What there was doesn't match the new developments and was also often quite flawed to begin with.

> LLM models, ... outdo any current self-driving car

How would an LLM handle computer vision? Are you implicitly including a second embedding model there? But I think that's still the wrong sort of vision data for precise control, at least in general.

How do you propose to handle the model hallucinating? What about losing its train of thought?

by fc417fc802

1/1/2026 at 5:26:03 AM

True that there isn’t a firm definition for AGI, but that’s the fault of the “I”. We don’t have an objective definition of intelligence, and so we don’t have a means of measuring it either. I mean, odds are you’re the least intelligent paleoethnobotanist and cetacean bioacoustician I’ve ever met, but perhaps the most intelligent something_else. How do we measure that? How do we define it?

I was confusing in my previous message. Right now it would be terrible at driving a car, but I was saying that has more to do with the physical interface (camera, sensors, etc) than the ability of an LLM. The ‘intelligence’ part is better than the PyTorch image recognition attached to a servo they’re using now, how to attach that ‘intelligence’ to the physical world is the 50 year task. (To be clear: LLMs aren’t intelligent, smart, or any sense of the word and never will be. But they can sure replicate the effect better than current self-driving tech.)

by docjay

12/31/2025 at 8:12:59 PM

I think your definition of it being 'human level' is sensible - definitely a lower bar to hit than 'as long as people can do work that a robot cannot do, we don't have AGI'.

There is certainly a lot road between current technology and driving a car through a construction zone during rush hour, particularly with the same amount of driving practice a human gets.

Personally I think there could be an AGI which couldn't drive a car, but has genuine sentience - an awareness of being alive, although not necessarily the exact human experience. Maybe this isn't AGI, which more implies problem-solving and thinking rather than sentience, but in my gut if we got something sentient but that couldn't drive a car, we would still be there if that makes sense?

by Closi

12/31/2025 at 8:48:21 PM

In theory I see what you're saying. There are physical things an octopus could conceivably do that I never could on account of our physiology rather than our intelligence. So you can contrive an analogous scenario involving only the mind where something that is clearly an AGI is incapable of some specific task and thus falls short of my definition. This makes it clear that my definition is a heuristic rather than rigorous.

Nonetheless, it's difficult to imagine a scenario where something that is genuinely human level can't adapt in the field to a novel task such as driving a car. That sort of broad adaptability is exactly what the "general" in AGI is attempting to capture (imo).

by fc417fc802

1/1/2026 at 12:04:11 AM

This is true, although maybe if an “AGI” invented us, it might say “It’s strange how these humans are so good at driving, but so bad at protein folding and playing Go”

Very abstract, but I think it’s important to remember that human intelligence also has jagged edges.

by Closi

12/31/2025 at 4:22:30 PM

  I think if you took an LLM of today and showed it to someone 20 years ago, most people would probably say AGI has been achieved.

I’ve got to disagree with this. All past pop-culture AI was sentient and self-motivated, it was human like in that it had it’s own goals and autonomy.

Current AI is a transcript generator. It can do smart stuff but it has no goals, it just responds with text when you prompt it. It feels like magic, even compared to 4-5 years ago, but it doesn’t feel like what was classically understood as AI, certainly by the public.

Somewhere marketers changed AGI to mean “does predefined tasks with human level accuracy” or the like. This is more like the definition of a good function approximator (how appropriate) instead of what people think (or thought) about when considering intelligence.

by andy99

12/31/2025 at 8:14:48 PM

The thing that blows my mind about language models isn't that they do what they do, it's that it's indistinguishable from what we do. We are a black box; nobody knows how we do what we do, or if we even do what we do because of a decision we made. But the funny thing is: if I can perfectly replicate a black box then you cannot say that what I'm doing isn't exactly what the black box is doing as well.

We can't measure goals, autonomy, or consciousness. We don't even have an objective measure of intelligence. Instead, since you probably look like me I think it's polite to assume you're conscious…that's about it. There’s literally no other measure. I mean, if I wanted to be a jerk, I could ask if you're conscious, but whether you say yes or no is proof enough that you are. If I'm curious about intelligence I can come up with a few dozen questions, out of a possible infinite number, and if you get those right I'll call you intelligent too. But if you get them wrong… well, I'll just give you a different set of questions; maybe accounting is more your thing than physics.

So, do you just respond with text when you’re promoted with input from your eyes or ears? You’ll instinctively say “No, I’m conscious and make my own decisions”, but that’s just a sequence of tokens with a high probability in response to that question.

Do you actually have goals, or did the system prompt of life tell you that in your culture, at this point in time, you should strive to achieve goals[] because that’s what gets positive feedback?

by docjay

12/31/2025 at 10:08:44 PM

Your argument makes no sense

by andy99

1/1/2026 at 1:28:34 PM

It's a straight forward argument and he presented it fairly clearly so...

Maybe this will help you: https://en.wikipedia.org/wiki/Philosophical_zombie

The hard nut to crack here is nobody has am empirical test for the subjective experience of consciousness. A machine which actually possesses it, and a machine which merely emulates it and answers questions as if it has that subjective experience cannot be distinguished using any empirical test. That includes people; it's simply a matter of common courtesy and pragmatism that we assume other people have comparable subjective conscious experiences (aka they aren't p-zombies.)

by mikkupikku

1/1/2026 at 1:21:51 AM

Well then keep working on it.

by docjay

1/1/2026 at 1:15:12 PM

> All past pop-culture AI was sentient and self-motivated, it was human like in that it had it’s own goals and autonomy.

I have to strongly disagree with you here. This was absolutely not the case in a very large amount of science fiction media, particular in the 20th century. AIs / robots were often depicted of automatons with no self-agency, no goal setting of their own, who were usually capable of understanding and following complex orders issued in natural language (but which frequently misunderstood orders in ways humans find surprising, leading to a source of conflict.)

Almost all of Asimov's robots are like this, there are a handful of counter examples, but for the most part his robots are p-zombies that mis-follow orders.

Nonhsentient AI with no personal motivation also frequently comes up in situations where the machine is built to be an impartial judge, for instance in The Demolished Man, all criminal prosecutions need to persuade a computer which does nothing but evaluate evidence and issue judgments.

Non-sentient AIs also show up often in ship-board computers. Examples are Mother in Alien, and the Computer in at least most of Star Trek (I'm no Trekkie, so forgive me for missing counter examples and nuance, technology in that show does whatever the writers needed.)

Even the droids in Star Wars, do they ever really execute agency over their own lives? They have no apparent life goals or plans, they're along for the ride, appliances with superficial personalities.

In The Hitchhiker's Guide to the Galaxy, does Deep Thought actually have self-agency? I only recall it thinking hard about the questions posed to it, and giving nonsensical answers which miss the obvious intent of the question, causing more trouble than any of it was worth.

Ghost in the Shell; obviously has sentient AIs, but in that setting these are novel and surprising, most androids in that are presumed to be just machines with dumb programming and it's only the unexpected emergence of more complicated systems that prompt the philosophizing.

by mikkupikku

1/1/2026 at 3:59:17 PM

I think we’re looking at the same thing in different ways. But regardless I don’t know think a valid interpretation of classical how AI was classically depicted is as a transcript generator or an extension thereof. There’s still some notion of taking action on its own (even if it’s according to a rigid set of principles and literal interpretation of a request like an Asimov robot) that is not present in LLMs and cannot be.

by andy99

12/31/2025 at 5:13:11 PM

> Current AI is a transcript generator. It can do smart stuff but it has no goals

That's probably not because of an inherent lack of capability, but because the companies that run AI products don't want to run autonomous intelligent systems like that

by nextaccountic

12/31/2025 at 1:04:53 PM

Charles Stross published Accelerando in 2005.

The book is a collection of nine short stories telling the tale of three generations of a family before, during, and after a technological singularity.

by sixtyj

12/31/2025 at 10:53:15 AM

I want to know what the "intelligence explosion" is, sounds much cooler than AGI.

by phatfish

12/31/2025 at 11:09:32 AM

When AI gets so good it can improve on itself

by adammarples

12/31/2025 at 1:00:15 PM

Actually, this has already happened in a very literal way. Back in 2022, Google DeepMind used an AI called AlphaTensor to "play" a game where the goal was to find a faster way to multiply matrices, the fundamental math that powers all AI.

To understand how big this is, you have to look at the numbers:

The Naive Method: This is what most people learn in school. To multiply two 4x4 matrices, you need 64 multiplications.

The Human Record (1969): For over 50 years, the "gold standard" was Strassen’s algorithm, which used a clever trick to get it down to 49 multiplications.

The AI Discovery (2022): AlphaTensor beat the human record by finding a way to do it in just 47 steps.

The real "intelligence explosion" feedback loop happened even more recently with AlphaEvolve (2025). While the 2022 discovery only worked for specific "finite field" math (mostly used in cryptography), AlphaEvolve used Gemini to find a shortcut (48 steps) that works for the standard complex numbers AI actually uses for training.

Because matrix multiplication accounts for the vast majority of the work an AI does, Google used these AI-discovered shortcuts to optimize the kernels in Gemini itself.

It’s a literal cycle: the AI found a way to rewrite its own fundamental math to be more efficient, which then makes the next generation of AI faster and cheaper to build.

https://deepmind.google/blog/discovering-novel-algorithms-wi... https://www.reddit.com/r/singularity/comments/1knem3r/i_dont...

by peheje

12/31/2025 at 3:03:47 PM

This is obviously cool, and I don't want to take away from that, but using a shortcut to make training a bit faster is qualitatively different from producing an AI which is actually more intelligent. The more intelligent AI can recursively produce a more intelligent one and so on, hence the explosion. If it's a bit faster to train but the same result then no explosion. It may be that finding efficiencies in our equations is low hanging fruit, but developing fundamentally better equations will prove impossible.

by adammarples

1/1/2026 at 7:12:45 AM

Agreed. This is a small step :) And humans still definitely in the loop.

by peheje

12/31/2025 at 3:14:41 PM

s/improve itself/explode itself/

by lo_zamoyski

12/31/2025 at 10:27:16 AM

I have noticed that Claude users seem to be about as intelligent as Claude itself, and wouldn't be able to surpass its output.

by Hamuko

12/31/2025 at 1:55:28 PM

This made me laugh. Unfortunately, this is the world we live in. Most people who drive cars have no idea how they work, or how to fix them. And people who get on airplanes aren't able to flap their arms and fly.

Which means that humans are reduced to a sort of uselessness / helplessness, using tools they don't understand.

Overall, no one tells Uncle Bob that he doesn't deserve to fly home to Minnesota for Christmas because he didn't build the aircraft himself.

But we all think it.

by noduerme

12/31/2025 at 1:20:04 PM

You, of course, are smarter than them.

by fragmede

12/31/2025 at 2:04:10 PM

You seem to be very confused about what intelligence even is.

by baq

1/1/2026 at 3:49:47 AM

If you’re not confused about what intelligent even is you’re lying.

by nandomrumber

12/31/2025 at 7:21:44 PM

lots of highfalutin language trying to make something thats pretty hand wavy look like it's not. Where are the benchmarks? The "vector algebra" framing with @X + @Y - @Z is a falsehood. Embedding spaces don't form any meaningful algebraic structure (ring, field, etc.) over semantic concepts, you're just getting lucky by residual effects.

by bfeynman

12/31/2025 at 7:57:09 PM

I'm giving you, the user, the easiest ability you've most likely ever had to explore embedding space yourself. Embeddings are tricky and can mislead, but they do often compose surprisingly intuitively, especially when you've played and built up a bit of an intuition for it.

by Xyra

12/31/2025 at 8:14:10 PM

What is the impact of misleading embeddings, how do they compose? I honestly am interested but don't know enough to understand what you're saying.

Why would I want to explore the embedding space myself, isn't this a tool where I can run cross-data exploratory analyses against unstructured data, where it's pre-populated with content?

by edmundsauto

1/1/2026 at 8:28:54 PM

We can iterate fast with understanding useful paradigms of vector manipulation. Yesterday I added `debias_vector(axis, topic)` and l2_normalization guidance.

by Xyra

1/2/2026 at 2:27:38 AM

The manifold structure of embedding spaces isn't semantically uniform, you've found a nice little novelty thing but it's not rigorous, and using AI slop to name this vector algebra instead of finding or running a benchmark to show that its actually works better.

by bfeynman