alt.hn

4/2/2026 at 10:42:23 AM

Enabling Codex to Analyze Two Decades of Hacker News Data

https://modolap.com/publication/hn-analysis-1

by ronfriedhaber

4/2/2026 at 1:15:03 PM

I've done this kind of thing many times with codex and sqlite, and it works very well. It's one prompt that looks something like this:

- inspect and understand the downloaded data in directory /path/..., then come up with an sqlite data model for doing detailed analytics and ingest everything into an sqlite db in data.sqlite, and document the model in model.md.

Then you can query the database ad hoc pretty easily with codex prompts (and also generate PDF graphs as needed).

I typically use the highest reasoning level for the initial prompt, and as I get deeper into the data, continuously improve on the model, indexes, etc., and just have codex handle any data migration.
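
A minimal sketch of what such a model and one ad hoc query could look like, using Python's built-in sqlite3 module. The `items` table, its columns, and the helper names `build_db` and `items_per_month` are hypothetical, chosen for illustration; the actual model would be whatever codex derives from the downloaded data.

```python
import sqlite3

# Hypothetical schema for illustration only; not the model from the comment above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    id     INTEGER PRIMARY KEY,
    type   TEXT NOT NULL,     -- 'story' or 'comment'
    author TEXT,
    time   INTEGER NOT NULL,  -- Unix seconds, UTC
    parent INTEGER,
    title  TEXT,
    text   TEXT
);
CREATE INDEX IF NOT EXISTS idx_items_time ON items(time);
"""


def build_db(path, rows):
    """Create the schema at `path` and ingest `rows` (dicts keyed by column name)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO items (id, type, author, time, parent, title, text) "
        "VALUES (:id, :type, :author, :time, :parent, :title, :text)",
        rows,
    )
    conn.commit()
    return conn


def items_per_month(conn):
    """Return (month, type, count) tuples, ordered by month and then type."""
    return conn.execute(
        "SELECT strftime('%Y-%m', time, 'unixepoch') AS month, type, COUNT(*) "
        "FROM items GROUP BY month, type ORDER BY month, type"
    ).fetchall()
```

Passing `":memory:"` as the path is a quick way to try this out without writing data.sqlite to disk.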

by zeroxfe

4/2/2026 at 3:01:07 PM

The “Hacker News - Complete Archive” on Hugging Face[1] recently popped up here. “The data is stored as monthly Parquet files sorted by item ID, making it straightforward to query with DuckDB, load with the datasets library, or process with any tool that reads Parquet.”

Out of curiosity, I tinkered with it using Claude to see trends and patterns (I did find a few embarrassing things about me!).

1. https://huggingface.co/datasets/open-index/hacker-news

by Brajeshwar

4/2/2026 at 11:59:40 AM

I don't quite understand how Modolap differs from just asking AI to use any other OLAP engine? Both your website and the github readme just emphasise that it's idiosyncratic and your personal approach, without explaining what that is or why anyone should care.

by mike_hearn

4/2/2026 at 12:15:04 PM

Appreciate the feedback. I shall certainly revamp the README; it is rather stale.

> "how Modolap differs from just asking AI to use any other OLAP engine"

There presently exist two components, the OLAP query engine and the remote infrastructure service. The service enables systems like Codex (or developers as well) to manage datasets, maintain version control over queries, and offload the computational burden to dedicated machines. This is especially beneficial given the current trend of running agents inside micro-VMs.

In addition, it is designed with AI usage in mind. There is significant value in co-design. One could argue that models can use Polars or DuckDB just as well, and that there is no room for improvement, but I do not think this is true.

by ronfriedhaber

4/2/2026 at 1:40:28 PM

I don't get the value proposition either; your landing page is underdeveloped. Tracking the query history is trivial. Offloading computation could be done with Polars Cloud or MotherDuck. Can you expand on the "manage datasets" part?

by esafak

4/2/2026 at 1:13:40 PM

What room for improvement is there?

by bastawhiz

4/2/2026 at 7:13:58 PM

Could be interesting to chart quality of responses, toxicity/health of conversations, sentiment over time, impact of release of ChatGPT.

(since AI can now answer many questions that might have been topics of conversation; people can use AI to participate; people may be reluctant to participate if AI can data mine everything and link it back to them, etc. similar to Stack Overflow)

by RockyMcNuts

4/2/2026 at 3:17:54 PM

I'm kind of surprised that postgres was dominated by mongodb quite that much back in the day. I remember the mongo fever, but I always thought postgres held reasonable market share. I guess it was other SQL dbs back then; MySQL was still viable.

by sd9

4/2/2026 at 6:05:22 PM

It could be that Postgres was so popular that people didn't really discuss it.

Hyperbolic example: literally every human reading this consumes oxygen nearly every moment of the day, and as such no one talks about how great breathing is.

by tombert

4/2/2026 at 4:23:53 PM

I worked on many projects that wrongly used Mongo instead of an ordinary relational database, and they needed rework over time. It was just hyped in its day, like microservice architecture a few years later.

by hsuduebc2

4/2/2026 at 2:05:07 PM

When searching for references to Go, what does it actually look for? "Go" is a relatively common word, and I hardly see anyone referring to it as Golang.
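
One possible heuristic, sketched under assumptions: this is not how the linked analysis matches Go, just an illustration of why the bare word is hard to count. It accepts "golang" in any case, and a capitalised "Go" only when nearby words suggest the language. The context word lists are arbitrary and would miss plenty of real mentions.

```python
import re

# "golang" in any capitalisation is unambiguous.
GOLANG = re.compile(r"\bgolang\b", re.IGNORECASE)

# Capitalised "Go" only counts with programming-flavoured context around it.
# The context words below are an arbitrary, illustrative choice.
GO_CONTEXT = re.compile(
    r"\b(?:in|with|using|learning|writing)\s+Go\b(?!\s+to\b)"
    r"|\bGo\s+(?:code|programs?|programmers?|programming|compiler|modules?"
    r"|generics|developers?|1\.\d+)\b"
    r"|\bGo's\b"
)


def mentions_go(text):
    """Return True if `text` looks like it refers to the Go language."""
    return bool(GOLANG.search(text) or GO_CONTEXT.search(text))
```

A matcher like this trades recall for precision: "Go to the settings page" is rejected, but so is a bare "Go is great".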

by voidUpdate

4/2/2026 at 3:08:53 PM

That last chart showing the average comment length shows a clear downtrend, especially in recent months. I wonder why that is.
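
For anyone wanting to reproduce that kind of chart, a rough sketch of the underlying aggregation follows. It assumes comments are available as (Unix timestamp, text) pairs and measures length in characters; the article may well measure it differently, and `avg_length_by_month` is a made-up helper name.

```python
from collections import defaultdict
from datetime import datetime, timezone


def avg_length_by_month(comments):
    """Map 'YYYY-MM' (UTC) to the mean comment length in characters.

    `comments` is an iterable of (unix_time, text) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # month -> [total characters, comment count]
    for ts, text in comments:
        month = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        totals[month][0] += len(text)
        totals[month][1] += 1
    return {month: chars / count for month, (chars, count) in sorted(totals.items())}
```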

by hakrgrl

4/2/2026 at 3:09:01 PM

I noticed some topics and comments that were usually in violation of HN guidelines are no longer flagged, and discourse decays into reddit-like jabs and echo chambers. Only a small percent, but still, more than the previous 0% I was accustomed to.

Would be interesting to see how many comments violate the guidelines over time. https://news.ycombinator.com/newsguidelines.html

by hakrgrl

4/2/2026 at 4:27:00 PM

I really love codex. The price/value comparison to claude code is, at least in my opinion, much better.

by hsuduebc2

4/2/2026 at 9:39:27 PM

There's no doubt that $20 to OpenAI gets you more than $20 to Anthropic; the question is who's better. The bus to work could be free versus a car, but there are cases where spending the extra money is simply worth it.

by fragmede

4/3/2026 at 2:31:50 AM

Well, sure. No doubt about that. I just think that most people, especially non-tech people vibe coding apps, are simply burning money with the latest open claw saga.

You simply do not need claude for simple tasks or simple scripts.

by hsuduebc2

4/2/2026 at 2:38:13 PM

Do not estimate/plot DAUs/MAUs, it's not a pretty picture :'(.

by moralestapia

4/2/2026 at 3:18:07 PM

Why do you say that?

by hakrgrl

4/2/2026 at 1:04:33 PM

HN data is open? Under what conditions is it distributed?

by throwaway290

4/2/2026 at 1:14:07 PM

There's an API link at the bottom of every page.

by bastawhiz

4/3/2026 at 3:54:39 AM

And? I'm talking about the entire dataset. The API is irrelevant unless you scrape it, and because HN rate limits actual users I'm sure it doesn't like scraping.

by throwaway290

4/2/2026 at 4:26:09 PM

There are even datasets created by users over time and shared publicly.

by hsuduebc2

4/3/2026 at 3:55:26 AM

Is this one of them? It sounded like it's some sort of official dataset.

by throwaway290

4/2/2026 at 5:48:54 PM

no conditions! what, you didn't know your content would be free for LITERALLY EVERYONE when you made your account? and that you can't delete your comments? and it's a free for all? well, that's on you buddy, not HN, definitely not HN in any way

by GeoAtreides

4/2/2026 at 2:18:32 PM

5% of all comments mention Claude Code?

Am I reading that right?

by xnorswap

4/2/2026 at 4:03:02 PM

I don't think it's a percentage of all comments. I think it's either the percent of articles (topics) posted to the site that are about Claude Code, or maybe a percentage of topics where at least one person mentioned Claude Code in the comments.

The latter seems easier to achieve. To borrow from another internet rule of thumb: "As an online discussion grows longer, the probability of someone mentioning Claude approaches one."
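
The two readings can give very different numbers on the same data. A small sketch, assuming comments come as (story_id, text) pairs and using a plain case-insensitive substring match; the helper name `mention_rates` is made up, and this is not necessarily how the article computed its figure.

```python
def mention_rates(comments, needle="claude code"):
    """Return (share of comments mentioning `needle`,
    share of threads with at least one such comment).

    `comments` is an iterable of (story_id, text) pairs.
    """
    total = 0
    hits = 0
    stories = set()
    stories_hit = set()
    for story_id, text in comments:
        total += 1
        stories.add(story_id)
        if needle in text.lower():
            hits += 1
            stories_hit.add(story_id)
    comment_rate = hits / total if total else 0.0
    thread_rate = len(stories_hit) / len(stories) if stories else 0.0
    return comment_rate, thread_rate
```

With two matching comments out of ten, spread over two of three threads, the comment-level rate is 20% while the thread-level rate is about 67%.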

by SyneRyder

4/2/2026 at 4:15:02 PM

I think that latter interpretation must be right; it seems possible, if high.

Neither raw comments nor submissions seems credible.

by xnorswap

4/2/2026 at 2:27:21 PM

Well now it's 5.00001%.

by epaga

4/2/2026 at 4:49:18 PM

Nobody who actually codes in that language ever calls it 'Golang'.

by sam-bee

4/2/2026 at 6:56:42 PM

I write Go for my day job and I refer to it as Golang all the time, depending on context. What are you getting at with this comment?

by Zambyte
