alt.hn

12/12/2025 at 1:25:19 PM

SQLite JSON at full index speed using generated columns

https://www.dbpro.app/blog/sqlite-json-virtual-columns-indexing

by upmostly

12/12/2025 at 7:59:44 PM

It is also possible to encode JSON documents directly as a serialized B-tree. Then you can construct iterators on it directly, and query internal fields at indexed speeds. It is still a serialized document (possible to send over a network), though now you don't need to do any parsing, since the document itself is already indexed. It is called the Lite³ format.

Disclaimer: I am working on this.

https://github.com/fastserial/lite3

by eliasdejong

12/12/2025 at 9:21:00 PM

This is super cool! I've always liked Rkyv (https://rkyv.org) but it requires Rust which can be a big lift for a small project. I see this supports binary data (`lite3_val_bytes`) which is great!

by conradev

12/12/2025 at 9:32:51 PM

Thank you. Having a native bytes type is non-negotiable for any performance intensive application that cannot afford the overhead of base64 encoding. And yes, Rkyv also implements this idea of indexing serialized data. The main differences are:

1) Rkyv uses a binary tree vs Lite³'s B-tree (B-trees are more cache- and space-efficient).

2) Rkyv is immutable once serialized. Lite³ allows for arbitrary mutations on serialized data.

3) Rkyv is Rust only. Lite³ is a 9.3 kB C library free of dependencies.

4) Rkyv as a custom binary format is not directly compatible with other formats. Lite³ can be directly converted to/from JSON.

I have not benchmarked Lite³ against Rust libraries, though it would be an interesting experiment.

by eliasdejong

12/12/2025 at 11:54:48 PM

That second point is huge – Rkyv does have some support for in-place mutation, but it is quite limited!

If you added support for running jq natively, that would be very cool. Lite³ brings the B-trees, jq brings the query parser and bytecode, combined, you get SQLite :P

by conradev

12/13/2025 at 12:23:11 AM

Yes, in fact the name Lite³ was chosen because it is lighter than SQLite.

I thought about implementing something like jq or JSON query, and this is very possible. It is like sending a mini-database that can be queried thousands of times faster than any JSON library is able to parse it.

One interesting effect of being a zero-copy format is that the 'parsing speed' can exceed the memory bandwidth of the CPU, since to fulfill a query you do not actually need to parse the entire dataset. You only walk the branches of the tree that are actually required.

I've talked to some other people that have also shown interest in this idea. There doesn't really seem to exist a good schemaless single-file format that supports advanced queries. There is only SQLite and maybe HDF5.

by eliasdejong

12/14/2025 at 1:50:49 AM

I would very much love that. I like `jq` itself as a standard, even though I don't know how well it maps. Areas where I'd want to use Lite³:

- Microcontrollers. I find myself reaching for https://github.com/siara-cc/sqlite_micro_logger_c/tree/maste... because SQLite is just too big

- Shared memory regions/mapped files. Use it to share state between processes. Could you make mutations across processes/threads lock-free?

- Caching GPU-friendly data (e.g. an image cache). I'm not sure if the current API surface/structure is page alignment friendly

by conradev

12/14/2025 at 11:48:01 AM

In general jq maps very well to any hierarchical data structure. One of the maintainers has made 'fq', which supports BSON, MsgPack, Protobuf, CBOR, and even media files (png, jpg, mp4, etc.).

SQLite when compiled for size is 590 kB. But I think a full jq database implementation based on Lite³ would be possible under 100 kB.

Lock-free shared state relies on algorithms that can make clever use of atomic instructions. But you should not have threads write to the same memory regions, because the hardware only allows for 1 core to have a cacheline in a writeable state. If another core attempts a write to the same line, this will immediately invalidate all the other copies. Under high contention the coherency penalty becomes so large that throughput falls through the floor. So basically the algorithms need to do most of the work in separate memory regions, then occasionally coordinate by 'committing' their work via a spinlock or similar.

Lite³ implements some settings for node alignment, but not for user data. It would be possible to create a bytes type with extra alignment guarantees.

by eliasdejong

12/13/2025 at 2:05:47 AM

Building a jq around something like Lite^3 or JSONB is a very appealing thought.

by cryptonector

12/13/2025 at 3:18:56 AM

1) when did they downgrade? I've stared for hours at that particular code...

2) no you just don't get to move data freely.

3) I don't believe JSON has any place in a system that needs C because it can't handle Rust.

4) JSON can't handle non-tree structures; it's also very limited in expressivity. Rkyv is more of a code gen, akin to ASN.1.

Happy benchmarking, feel free to use the rkyv benchmark tooling and ensure you have enough link time optimization going on.

by namibj

12/13/2025 at 1:57:38 AM

This is pretty cool.

How does Lite^3 compare to PG's JSONB? PG's JSONB is also a serialized, indexed data structure. One of the key things about JSONB is that for arrays (and hence objects) it encodes first their lengths, then the values, but every so many elements (32 is the default IIRC) it encodes an offset; the reason for this design is that when they encoded only offsets, the result did not compress well (and if you think about it, it will be obvious why). The price they pay for this design is that finding the offset to the nth element's value requires first finding the offset of the last entry before n that has an offset, then adding up the lengths of the entries in between. This way you get a tunable parameter for trading off speed for compressibility.

EDIT: Ok, I've looked at the format. Some comments:

- Updating in place is cool but you need to clear unused replaced data in case it's sensitive, and then unless you re-encode you will use up more and more space -- once in a while you need a "vacuum". Though vacuuming a Lite^3 document is quite simple: just traverse the data structure and write a new version, and naturally it will be vacuumed.

- On the whole I like Lite^3 quite a bit. Very clever.

- JSONB is also indexed as encoded, but IIUC it's not in-place updateable (unless the new items are the same length as the old) without re-encoding. Though I can imagine a way to tombstone old values and replace them with offsets into appended data, then the result would also need a "vacuum" once in a while.

- I'm curious about compressibility. I suspect not having long runs of pointers (offsets) helps, but still I suspect JSONB is more compressible.

I love the topic of serialization formats, and I've been thinking for some time about ASN.1 compilers (since I maintain one). I've wanted to implement a flatbuffers / JSONB style codec for ASN.1 borrowing ideas from OER. You've given me something to think about! When you have a schema (e.g., an ASN.1 module) you don't really need a B-tree -- the encoded data, if it's encoded in a convenient way, is the B-tree already, but accessing the encoded data by traversal path rather than decoding into nice in-memory structures sure would be a major improvement in codec performance!

by cryptonector

12/13/2025 at 12:15:34 PM

The main difference between Lite³ and JSONB is that JSONB is not a standalone portable format, and therefore is not suitable for external interchange. Its purpose is to be an indexable representation of JSON inside a Postgres database. But sending it as standalone messages to arbitrary consumers does not really make sense. JSONB can only be interpreted in a Postgres context. This is different from for example BSON, which can be read and constructed as a standalone format without Mongo.

Another difference is that JSONB is immutable. Suppose you need to replace one specific value inside an object or array. With JSONB, you would rewrite the entire JSONB document as a result of this, even if it is several megabytes large. If you are performing frequent updates inside JSONB documents, this will cause severe write amplification. Despite the fact that offsets are grouped in chunks of 32, Postgres still rewrites the entire document. This is the case for all current Postgres versions.
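As a rough illustration (hypothetical table 'events' with a jsonb 'data' column), even a single-key update goes through something like:

    -- jsonb_set builds a whole new jsonb value; Postgres then writes a new
    -- copy of the row (and its TOASTed data), even though only one key changed
    UPDATE events
    SET data = jsonb_set(data, '{status}', '"done"')
    WHERE id = 42;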

On the other hand, Lite³ supports replacing individual values, where ONLY the changed value needs updating. For this to work, you need separate offsets. Postgres makes a tradeoff where they get some benefits in size, but as a result become completely read-only. This is the case in general for most types of compression.

Also JSONB is not suited to storing binary data. The user must use a separate bytea column. Lite³ directly implements a native bytes type.

JSONB was designed to sacrifice mutability in favor of read performance, but despite this, I still expect Lite³ to exceed it at read performance. Of course it is hard to back this up without benchmarks, but there are several reasons:

1) JSONB performs runtime string comparison loops to find keys. Lite³ uses fixed-size hash digest comparisons, where the hashes are computed at compile time.

2) JSONB must do 'walking back' because of the 32-grouped offset scheme.

3) Lite³ has none of the database overhead.

Again, the two formats serve different purposes; this is just comparing the raw byte layouts.

by eliasdejong

12/14/2025 at 2:45:15 AM

Why not add this approach to postgres as a "JSONL3" type?

It'd be nice to update postgres JSON values without the big write amplification.

by nh2

12/13/2025 at 6:08:52 PM

Thank you for your thoughtful response.

I agree that Lite³ is almost certainly better than JSONB on every score except compressibility, but when Lite³ is your database format then that doesn't matter (you can always compress large string/blob values if need be). Compressibility might matter for interchange however, but again, if your messages are huge chances are there are compressible strings in them, or if they're not huge then you probably don't care to compress.

by cryptonector

12/13/2025 at 3:23:20 AM

Rkyv is basically the last thing you mentioned already. It's essentially a code gen for deriving serialized structures that can be accessed for reads with the exact same API and a functionally almost identical ABI (but not quite; in the differences lies much of the special sauce).

by namibj

12/12/2025 at 9:35:05 PM

Would love a Rust implementation of this.

by the_duke

12/13/2025 at 5:46:03 AM

Sorry, but who are you? Your accounts have no history.

by gritzko

12/12/2025 at 6:07:07 PM

I love SQLite, and I'm in no way trying to devalue it; the author's method is an excellent approach to getting analytical speed out of SQLite. But I've been loving DuckDB for similar analytical workloads, as it is built for such tasks. Like SQLite, DuckDB reads from a single file, and it processes large data sets at extreme speeds. I work on my MacBook M2, I've been dealing with about 20 million records, and it works fast, very fast.

Loading data into DuckDB is super easy, I was surprised:

SELECT avg(sale_price), count(DISTINCT customer_id) FROM '/my-data-lake/sales/2024/*.json';

and you can also load into a JSON-typed column and use Postgres-style syntax: col->>'$.key'
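For example, something along these lines (a sketch using the DuckDB JSON extension; worth double-checking the function names against the docs):

    -- keep each document as a raw JSON value, one row per object
    CREATE TABLE sales_docs AS
      SELECT json AS doc
      FROM read_json_objects('/my-data-lake/sales/2024/*.json');

    -- Postgres-style extraction operators work on the JSON column
    SELECT avg(CAST(doc->>'$.sale_price' AS DOUBLE)) AS avg_price,
           count(DISTINCT doc->>'$.customer_id')     AS customers
    FROM sales_docs;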

by srameshc

12/12/2025 at 7:03:32 PM

DuckDB is super fast for analytic tasks, especially when you use it with a visual EDA tool like pygwalker. It lets you handle millions of data points for visuals and EDA in seconds.

But I would say comparing DuckDB and SQLite is a little unfair. I would still use SQLite to build systems in most cases, and DuckDB only for analytics. You can hardly make a smooth deployment if your app contains DuckDB across a lot of platforms.

by loa_observer

12/12/2025 at 7:57:15 PM

Depending on the size and needs of the distributed system or application, I'm really excited about Postgres + pg_lake. Postgres has blown my mind at how well it does concurrent writes, at least for the types of things I build/support for my org, and the pg_lake extension then adds the ability to honestly work like a data-lake-style analytics engine. It intuitively switches whether a transaction goes down the normal query path or uses DuckDB, which brings giga-aggregation type queries to massive datasets.

Someone should smush SQLite + DuckDB together and do that kind of switching depending on query type.

by trueno

12/12/2025 at 6:21:07 PM

Whoa. Is that first query building an index of random filesystem json files on the fly?

by mikepurvis

12/12/2025 at 6:57:58 PM

It's not an index, it's just (probably parallel) file reads

That being said, it would be trivial to tweak the above script into two steps, one reading data into a DuckDB database table, and the second one reading from that table.
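Roughly like this (paths and column names assumed from the comment above):

    -- step 1: materialise the JSON files into a DuckDB table once
    CREATE TABLE sales AS
      SELECT * FROM '/my-data-lake/sales/2024/*.json';

    -- step 2: run queries against the table instead of re-reading the files
    SELECT avg(sale_price), count(DISTINCT customer_id) FROM sales;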

by NortySpock

12/12/2025 at 6:40:24 PM

Can we all agree to never store datasets uncompressed? DuckDB supports reading many compression formats.

by lame_lexem

12/12/2025 at 7:38:17 PM

How much impact do the various compression formats have on query performance?

by hawk_

12/12/2025 at 2:44:24 PM

I thought this was common practice, generated columns for JSON performance. I've even used this (although it was in Postgres) to maintain foreign key constraints where the key is buried in a JSON column. What we were doing was slightly cursed but it worked perfectly.
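Roughly the shape of it (illustrative names, not the real schema; the usual restrictions on referential actions for generated columns apply):

    -- orders.data is jsonb containing something like {"customer_id": "..."}
    ALTER TABLE orders
      ADD COLUMN customer_id text
        GENERATED ALWAYS AS (data->>'customer_id') STORED;

    -- the foreign key then rides on the generated column
    ALTER TABLE orders
      ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id);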

by jelder

12/12/2025 at 4:18:16 PM

If you're using postgres, couldn't you just create an index on the field inside the JSONB column directly? What advantage are you getting from extracting it to a separate column?

  CREATE INDEX idx_status_gin
  ON my_table
  USING gin ((data->'status'));
ref: https://www.crunchydata.com/blog/indexing-jsonb-in-postgres

by craftkiller

12/12/2025 at 4:23:26 PM

That works for lookups but not for foreign key constraints.

by jelder

12/12/2025 at 4:33:25 PM

Ah, makes sense. Thanks!

by craftkiller

12/12/2025 at 5:07:06 PM

..and it does not make "certain queries easier" (quote from the article).

by cies

12/12/2025 at 6:47:35 PM

You only need gin if you want to index the entire jsonb. For a specific attribute, you can use the default (btree) which I'm guessing is faster.
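E.g. a sketch, assuming a jsonb column called data:

    -- btree expression index on one specific attribute (text form)
    CREATE INDEX idx_my_table_status ON my_table ((data->>'status'));

    -- queries written against the same expression can use it
    SELECT * FROM my_table WHERE data->>'status' = 'active';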

by morshu9001

12/12/2025 at 8:11:09 PM

Yes, as far as indices go, GIN indices are very expensive especially on modification. They're worthwhile in cases where you want to do arbitrary querying on JSON data, but you definitely don't want to overuse them.

If you can get away with a regular index on either a generated column or an expression, then you absolutely should.

by a-priori

12/12/2025 at 3:01:20 PM

It works until you realize some of these usages would've been better as individual key/value rows.

For example, if you want to store settings as JSON, you first have to parse it through e.g. Zod and hope it doesn't fail due to schema changes (or write migrations and hope those succeed).

Meanwhile, a simple key/value row just works fine, and you can even do partial fetches/updates.
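Something like this (SQLite sketch, hypothetical names):

    CREATE TABLE settings (
      key   TEXT PRIMARY KEY,
      value TEXT NOT NULL
    ) WITHOUT ROWID;

    -- partial fetch / partial update of a single setting
    SELECT value FROM settings WHERE key = 'theme';
    UPDATE settings SET value = 'dark' WHERE key = 'theme';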

by ramon156

12/12/2025 at 4:05:18 PM

EAV data models are kinda cursed in their own right, too, though.

by mickeyp

12/12/2025 at 6:51:50 PM

Doesn't sound very cursed, standard normalized relations for things that need it and jsonb for the big bags of attributes you don't care to split apart

by morshu9001

12/12/2025 at 2:53:51 PM

It is. I'd wondered whether STORED is necessary; this example uses VIRTUAL.
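For reference, a sketch of both flavours against a table shaped like the article's (column names made up):

    -- VIRTUAL: computed on read, takes no storage, can still be indexed
    -- (ALTER TABLE can only add VIRTUAL generated columns)
    ALTER TABLE events ADD COLUMN event_type TEXT
      GENERATED ALWAYS AS (json_extract(data, '$.type')) VIRTUAL;

    -- STORED: computed on write and kept in the row;
    -- has to be declared in CREATE TABLE
    CREATE TABLE events_stored (
      id   INTEGER PRIMARY KEY,
      data TEXT,
      event_type TEXT GENERATED ALWAYS AS (json_extract(data, '$.type')) STORED
    );

    -- either kind can back an ordinary index
    CREATE INDEX idx_events_type ON events(event_type);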

by sigwinch

12/12/2025 at 3:44:27 PM

This is the typical practice for most index types in SingleStore as well, except for the Multi-Value Hash Index, which is defined over a JSON or BSON path.

by jasonthorsness

12/12/2025 at 5:42:29 PM

Hilariously, I discovered this very technique a couple weeks ago when Claude Code presented it out of the blue as an option with an implemented example when I was trying to find some optimizations for something I'm working on. It turned out to be a really smart and performant choice, one I simply wasn't aware of because I hadn't really kept up with new SQLite features the last few years at all.

Lesson learned: even if you know your tools well, periodically go check out updated docs and see what's new, you might be surprised at what you find!

by kevinsync

12/12/2025 at 10:43:47 PM

Rereading TFM can be quite illuminating.

by daotoad

12/12/2025 at 3:52:42 PM

As others mention, you can create indexes directly against the JSON without projecting into a computed column... though the computed column has the added benefit of making certain queries easier.

That said, this is pretty much what you had to do with MS-SQL's limited JSON support before 2025 (v17). Glad I double checked, since I wasn't even aware they had added the JSON type in 2025.
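The pre-2025 pattern was something like this (illustrative names):

    -- SQL Server: computed column over the JSON text, then a normal index on it
    ALTER TABLE dbo.events
      ADD event_type AS JSON_VALUE(data, '$.type');

    CREATE INDEX IX_events_event_type ON dbo.events (event_type);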

by tracker1

12/12/2025 at 6:10:21 PM

Exclusively using computed columns, and never directly querying the JSON, does have the advantage of making it impossible to accidentally write an unindexed query.

by advisedwang

12/13/2025 at 12:59:15 AM

I did hear about it at a local DBA conference but didn't think it was a big deal

by selimthegrim

12/12/2025 at 2:14:52 PM

Interesting, but can't you use "Index On Expression" <https://sqlite.org/expridx.html>?

I.e. something like this: CREATE INDEX idx_events_type ON events(json_extract(data, '$.type'))?

I guess the caveat here is that a slight change in JSON path syntax (can't think of any right now) can cause SQLite to not use this index, while with explicitly specified virtual generated columns you're guaranteed to use the index.

by Lex-2008

12/12/2025 at 2:24:08 PM

Yeah, you can use index on expression and views to ensure the expression matches, like https://github.com/fsaintjacques/recordlite . The view + index approach decouples the convenience of having a column for a given expression from the need to materialise the column for performance.
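A minimal version of that idea (table and column names made up):

    -- the expression index does the work...
    CREATE INDEX idx_events_type
      ON events(json_extract(data, '$.type'));

    -- ...and the view pins the exact expression so queries can't drift from it
    CREATE VIEW events_v AS
      SELECT id,
             data,
             json_extract(data, '$.type') AS event_type
      FROM events;

    SELECT * FROM events_v WHERE event_type = 'click';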

by pkhuong

12/12/2025 at 4:12:02 PM

> slight change in json path syntax (can't think of any right now) can cause SQLite to not use this index

It's pretty fragile...

    --  Just changing the quoting
    select * from events where json_extract(data, "$.type") = 'click';

    -- Changing the syntax
    select * from events where data -> '$.type' = 'click';
Basically, anything that alters the text of the expression in the WHERE clause will do it.

by fny

12/12/2025 at 6:18:56 PM

TIL. Are MySQL and Postgres this fragile too?

by johnmaguire

12/12/2025 at 2:40:05 PM

Yes, that’s the simpler and faster solution.

You need to ensure your queries match your index, but when isn’t that true :)

by paulddraper

12/12/2025 at 4:50:35 PM

> but when isn’t that true

When you write another query against that index a few weeks later and forget about the caveat, that slight change in the WHERE clause means the query ignores the index.

by 0x457

12/12/2025 at 4:03:11 PM

From the linked page:

> The ability to index expressions was added to SQLite with version 3.9.0 (2015-10-14).

So this is a relatively new addition to SQLite.

by WilcoKruijer

12/12/2025 at 4:12:30 PM

I'm not sure 2015 counts as new, but that's the same release that first introduced the JSON extension. There isn't a version of SQLite with JSON expressions but without indexes on expressions. Also, the JSON extension wasn't enabled by default until 2022, so most people using SQLite with JSON have a version much newer than 2015.

by debugnik

12/12/2025 at 4:22:09 PM

I initially misread "2015" as "2025", too... But no, it has been part of SQLite for ten years already!

by Lex-2008

12/13/2025 at 2:14:11 AM

Opening an article on HN, seeing one of my comments quoted at the top, and then finding out the whole article is about that one comment: that's a first!

> So, thanks bambax!

You're most welcome! And yes, SQLite is awesome!!

by bambax

12/12/2025 at 3:12:39 PM

I wish devs would normalize their data rather than shove everything into a JSON(B) column, especially when there is a consistent schema across records.

It's much harder to set up proper indexes and enforce constraints, and it adds overhead every time you actually want to use the data.

by ellisv

12/12/2025 at 3:29:28 PM

JSON columns shine when

* The data does not map well to database tables, e.g. when it's tree structures (of course that could be represented as many table rows too, but it's complicated and may be slower when you always need to operate on the whole tree anyway)

* your programming language has better types and programming facilities than SQL offers; for example in our Haskell+TypeScript code base, we can conveniently serialise large nested data structures with 100s of types into JSON, without having to think about how to represent those trees as tables.

by nh2

12/12/2025 at 5:10:48 PM

You do need some fancy in-house way to migrate old JSONs to new JSON in case you want to evolve the (implicit) JSON schema.

I find this one of the hardest parts of using JSON, and the main reason why I'd rather put it in proper columns. Once I go JSON, I need a fair bit of code to deal with migrations (either doing them as schema migrations, or some way to do them at read/write time).

by cies

12/12/2025 at 6:05:17 PM

Yes, that's what we do: Migrations with proper sum types and exhaustiveness checking.

by nh2

12/12/2025 at 7:41:53 PM

Since OP is using Haskell, the actual code most likely won't really touch the JSON type, but the actual domain type. This makes migrations super easy to write. Of course they could have written a fancy in-house way to do that, or just used the safe-copy library, which solves this problem and has been around for almost two decades. In particular it solves the "nested version control" problem of data structures containing other data structures with varying versions.

by kccqzy

12/12/2025 at 3:47:30 PM

I find that JSON(B) works best when you have a collection of data with different or variant concrete types of data that aren't 1:1 matches. Ex: the actual transaction result if you have different payment processors (paypal, amazon, google, apple-pay, etc)... you don't necessarily want/care about having N different tables for a clean mapping (along with the overhead of a join) to pull the transaction details in the original format(s).

Another example is a classifieds website, where your extra details for a Dress are going to be quite a bit different than the details for a Car or Watch. But, again, you don't necessarily want to inflate the table structure for a fully normalized flow.

If you're using a concretely typed service language it can help. C# does a decent job here. But even then, mixing in Zod with Hono and OpenAPI isn't exactly difficult on the JS/TS front.

by tracker1

12/12/2025 at 9:54:08 PM

Yeah, document formats (jsonb) are excellent for apps that interface with the messy real world: e-commerce, government systems, anything involving forms, payments, etc.

Trying to map everything in a relational way puts you in a world of pain.

by dzonga

12/12/2025 at 3:22:25 PM

For very simple JSON data whose schema never changes, I agree.

But the more complex it is, the more complex the relational representation becomes. JSON responses from some APIs could easily require 8 new tables to store the data in, with lots of arbitrary new primary keys and lots of foreign key constraints, and your queries will be full of JOINs that need proper indexing set up...

Oftentimes it's just not worth it, especially if your queries are relatively simple, but you still need to store the full JSON in case you need the data in the future.

Obviously storing JSON in a relational database feels a bit like a Frankenstein monster. But at the end of the day, it's really just about what's simplest to maintain and provides the necessary performance.

And the whole point of the article is how easy it is to set up indexes on JSON.

by crazygringo

12/12/2025 at 3:53:46 PM

When a data tree is tightly coupled (like a complex sample of nested data with some arrays from a sensor) and the entire tree is treated like a single thing by writes, the JSON column just keeps things easier. Reads can be accelerated with indexes as demonstrated here.

by jasonthorsness

12/12/2025 at 4:00:19 PM

I fully agree that's wrong (I can't imagine the overhead of some larger tables I have if that had happened). That said, people often want weird customizations in medium-sized tables that would set you on the path to annoying 100-column tables if you couldn't express those customizations in a "simple" JSON column (that is more or less polymorphic).

A typical example is a price-setting product I work on... there are price ranges that are universal (and DB columns reflect that part), but they all have weird custom requests for pricing, like rebates on the 3rd weekend after X-mas (but only if the customer is related to Uncle Rudolph who picks his nose).

by whizzter

12/12/2025 at 5:23:55 PM

But if you have to model those custom pricing structures anyway, the question is what you gain by not reflecting them in the database schema.

There's no reason to put all those extra fields in the same table that contains the universal pricing information.

by fauigerzigerk

12/15/2025 at 9:16:53 AM

A lot of unnecessary complexity/overhead for a minor, seldom-touched part of a much larger, already complex system?

I'll give a comparison.

JSON

- We have some frontend logic/view (that can be feature-flagged per customer) to manage updating the data, which otherwise mostly tags along as a dumb "blob" (auto-expanded to a regular part of the JSON object maps/arrays at the API boundary, making frontend work easier: objects on the frontend, "blobs" on the backend/DB)

- Inspecting specific cases (most of the time it's just null data) is just copying out and formatting the special data.

- If push comes to shove, all modern databases support JSON queries, so you can pick out specifics IF needed (has happened once or twice with larger customers over the years).

- We read and apply the rules when calculating prices with a "plugin system"

DB Schema (extra tables)

- Now you have to wade through lots of customer-specific tables just to find the tables that take most of the work-time (customer specifics are seldom what needs work once set up). We already have some older customer-specific stuff from the early days (I'm happy that it hasn't happened much lately).

- Those _very_ few times you actually need to inspect the specific data by query, you might win on this (but as mentioned above, JSON queries have always solved it).

- Loading the universal info now needs to query X extra tables (even when 90%-95% of the data has no special cases).

- Adding new operations on prices, like copying etc., now needs logic for each customer-specific table to make it tag along properly.

- Modelled "properly", this reaches the API layer as well

- Frontend specialization is still needed

- Calculating prices still needs its customization.

I don't really see how my life would have been better managing all the extra side-effects of bending the code to suit these weird customer requests (some of whom aren't customers anymore) when 90-95% of the time it isn't used, and it's seldom touched with mature customers.

I do believe in the rule of 3: if the same thing pops up three times, I consider whether it needs to be graduated to more "systematic" code. Too often, when you abstract after seeing something only twice, it never appears again, leaving you with an abstraction to maintain.

JSON columns, like entity-attribute-value tables or goto statements, all have real downsides and shouldn't be plonked in without a reason, but hell if I'd want to work with overly complex schemas/models because people started putting special cases into core pieces of code just because they heard that a technique was bad.

by whizzter

12/12/2025 at 3:15:18 PM

Normalisation brings its own overhead though.

by konart

12/12/2025 at 6:31:41 PM

If you replace JSON with XML in this model it is exactly what the "document store" databases from the 90s and 00s were doing -- parsing at insert and update time, then touching only indexes at query time. It is indeed cool that sqlite does this out of the box.

by verytrivial

12/12/2025 at 3:12:28 PM

I was looking for a way to index a JSON column that contains a JSON array, like a list of tags. AFAIK this method won't work for that; you'll either need to use FTS or a separate "tag" table that you index.
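The tag-table route can be fed straight from the JSON with json_each; a sketch, assuming a hypothetical notes(id, data) table where data holds {"tags": [...]}:

    CREATE TABLE note_tags (
      note_id INTEGER NOT NULL REFERENCES notes(id) ON DELETE CASCADE,
      tag     TEXT NOT NULL,
      PRIMARY KEY (note_id, tag)
    );
    CREATE INDEX idx_note_tags_tag ON note_tags(tag);
    -- (the cascade only applies with PRAGMA foreign_keys = ON)

    -- backfill from the JSON array
    INSERT INTO note_tags (note_id, tag)
      SELECT notes.id, je.value
      FROM notes, json_each(notes.data, '$.tags') AS je;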

by AlexErrant

12/12/2025 at 5:33:42 PM

Yeah, SQLite doesn't have any true array datatype. I think you could probably do it with a virtual table, but that would be adding a native extension, and it would have to pack its own index.

by MyOutfitIsVague

12/12/2025 at 5:22:38 PM

You can use triggers to keep the tag table synchronized automatically.
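A sketch, assuming a notes(id, data) table and a note_tags(note_id, tag) side table like the one sketched above:

    CREATE TRIGGER notes_tags_ai AFTER INSERT ON notes BEGIN
      INSERT INTO note_tags (note_id, tag)
        SELECT NEW.id, je.value FROM json_each(NEW.data, '$.tags') AS je;
    END;

    CREATE TRIGGER notes_tags_au AFTER UPDATE OF data ON notes BEGIN
      DELETE FROM note_tags WHERE note_id = NEW.id;
      INSERT INTO note_tags (note_id, tag)
        SELECT NEW.id, je.value FROM json_each(NEW.data, '$.tags') AS je;
    END;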

by rini17

12/12/2025 at 2:43:42 PM

Tiny bug report: I couldn't edit text in those SQL editor widgets from my iPhone, and I couldn't scroll them to see text that extended past the width of the page either.

by simonw

12/12/2025 at 2:54:38 PM

The examples also needed a “drop table if exists” so they could be run more than once without errors.

by hamburglar

12/12/2025 at 3:05:39 PM

Great catch, I'll add that now!

by upmostly

12/12/2025 at 3:02:33 PM

Thanks Simon! Looking into that now. Big fan. Hope you enjoyed the post.

Edit: This should now be fixed for you.

by upmostly

12/12/2025 at 8:22:43 PM

The fact that the DB is portable is amazing. I use it for all my projects now but didn't know about this JSON feature.

by rrmdp

12/12/2025 at 4:57:06 PM

It says full speed, but no benchmarks were performed to verify if performance was really equivalent.

by Seattle3503

12/12/2025 at 3:53:45 PM

I've been coding a lot of small apps recently, and going from local JSON file storage to SQLite has been a very natural path of progression as the data's order of magnitude ramps up. A fully performant database which still feels as simple as opening and reading from a plain JSON file. The trick you describe in the article is actually an unexpected performance buffer that'll come in handy when I start hitting the next bottleneck :) Thank you

by pawelduda

12/12/2025 at 4:41:05 PM

Would this be a good fit for migrating from mongo --> sqlite? A task I am dreading

by focusgroup0

12/12/2025 at 5:15:23 PM

Just curious, why do you want to migrate from mongo (document database server) to sqlite (relational database library)?

That migration would be making two changes: document-based -> relational, and server -> library.

Have you considered migrating to Postgres instead? By using another DB server you won't need to change your application as much.

by zffr

12/12/2025 at 7:20:12 PM

Thanks for the feedback. The document model in mongo was slopped together by a junior engineer, so perhaps an unorthodox approach. It is basically flat, and already used in a pseudo-relational manner via in-app join to the existing sqlite store. This blog post inspired me to think, what if we just chucked all the json from mongo into sqlite and used the generated indices? Then we can gradually "strangler fig" endpoint by endpoint

by focusgroup0

12/13/2025 at 8:55:22 AM

This sounds roughly on-track, but I agree with GP; Postgres would probably be better (also has great JSON(B) support).

by rglynn

12/12/2025 at 2:24:39 PM

Very cool article. To really drill it home, I would have loved to see how the query plan changes. It _looks_ like it should Just Work(tm) but my brain refuses to believe that it's able to use those new indexes so flawlessly
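For reference, EXPLAIN QUERY PLAN is the way to check this (sketched against an events table with a generated event_type column; output paraphrased):

    -- without the index: full table scan
    EXPLAIN QUERY PLAN
    SELECT * FROM events WHERE event_type = 'click';
    -- SCAN events

    CREATE INDEX idx_events_type ON events(event_type);

    -- with the index: indexed search
    EXPLAIN QUERY PLAN
    SELECT * FROM events WHERE event_type = 'click';
    -- SEARCH events USING INDEX idx_events_type (event_type=?)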

by mcluck

12/12/2025 at 3:15:44 PM

In the 2nd section you're using a CREATE TABLE plus three separate ALTER TABLE calls to add the virtual columns. In the 3rd section you're using a single CREATE TABLE with the virtual columns included from the get go.

Why?

by meindnoch

12/12/2025 at 3:26:37 PM

I think the intent is to separate the virtual column creation out when it’s introduced in order to highlight that it’s a very lightweight operation. When moving onto the 3rd example, the existence of the virtual columns is just a given.

by hamburglar

12/12/2025 at 5:42:06 PM

Depending on the amount of inserts, it might be more efficient to create all the indexes in one go. I think this is certainly true for normal columns.

But I suspect with JSON the overhead of parsing it each time might make it more efficient to update all the indices with every insert.

Then again, it's probably quicker still to insert the raw SQL into a temporary table in memory and then insert all of the new rows into the indexed table as a single query.

by ralferoo

12/12/2025 at 3:34:12 PM

In 2 they show how to add virtual columns to an existing table, in 3 how to add indexes to existing virtual columns so they are pre-cooked. Like a cooking show.

by hiccuphippo

12/12/2025 at 3:37:42 PM

>In 2 they show how to add virtual columns to an existing table

No, in section 2 the table is created afresh. All 3 sections start with a CREATE TABLE.

by meindnoch

12/12/2025 at 3:49:54 PM

Yes, it seems each section has its own independent database so you have to create everything on each of them.

by hiccuphippo

12/12/2025 at 3:37:25 PM

Literally exactly as I meant it. I watch a lot of cooking shows, too, so this analogy holds up.

by upmostly

12/12/2025 at 3:25:58 PM

Generated columns are pretty great, but what I would really love is a Postgres-style gin index, which dramatically speeds up json queries for unanticipated keys.

by moregrist

12/12/2025 at 7:47:54 PM

Great article with clear instructions - could be quite useful if I need to do stuff with storing JSON in SQLite in the future.

by oars

12/13/2025 at 12:57:30 AM

I love jsonb support in sqlite.

Particularly with drizzle, it means I can use sqlite on device with expo-sqlite, and store our data format in a single field, with very little syntax, and the schema and queries all become fully type safe.

Also being able to use the same light orm abstraction server side with bun:sqlite is huge.

by zackify

12/12/2025 at 6:38:16 PM

I've been using this trick for a while, and it actually got me to do quite a bit without an ORM (just hacking a sane models.py with a few stable wrappers and calling it a day)

by rcarmo

12/12/2025 at 6:55:56 PM

json columns pretty much obviated the need for ORMs. It used to be that you'd sometimes have a deep nested thing you really only ever query all at once rather than in pieces, so you'd use an ORM to automate that, but now you can just shove it into json. And then use regular SQL for the relations you actually care about.

by morshu9001

12/12/2025 at 2:58:48 PM

For smaller datasets (100s of thousands of rows) I don’t see why you wouldn’t just use json columns with generated column/index where needed

by bushbaba

12/12/2025 at 4:40:45 PM

Regardless of the number of rows, it doesn't really matter; there are useful cases where you might be consuming JSON directly, so instead of parsing it out into a schema for your database, why not just keep it raw and utilize the tools of the database?

It's a feature, not a replacement.

by bilekas

12/12/2025 at 2:40:16 PM

IIRC, Vertica had/has a similar feature.

by mring33621

12/12/2025 at 6:04:31 PM

It's been around for quite a while, but DB people hate to explain where they got an idea. For all I know Vertica got it from somewhere else; I think Postgres got jsonb around the same time.

by kwillets

12/12/2025 at 3:13:41 PM

Now there’s a name I haven’t heard in 10 years. (I’m only tenuously connected to the kinds of teams that use/would have used that, so it doesn’t mean much.)

by xp84

12/12/2025 at 3:18:20 PM

What a neat trick, I love SQLite as well.

by N_Lens

12/12/2025 at 3:06:24 PM

My understanding is Snowflake works kinda like that behind the scenes right?

by baq

12/12/2025 at 6:08:32 PM

Your website looks like supermemory.ai. BTW, it's pretty cool.

by javantanna

12/12/2025 at 3:52:25 PM

MongoDB is dead, long live MongoDB

by pipe01

12/12/2025 at 6:31:01 PM

Can I do this with PocketBase?

by stacktraceyo

12/12/2025 at 7:12:41 PM

LOL, what are the odds: I posted in `Show HN` about Marmot today https://github.com/maxpert/marmot/releases/tag/v2.2.0 and in my head I was thinking the exact same thing for supporting MySQL's JSON datatype. At some level I am starting to feel I might as well expose a full MongoDB-compatible protocol that lets you talk to tables as collections, solving this problem once and for all!

But this technique I guess is very common now.

by maxpert

12/12/2025 at 5:29:09 PM

Dude what? This is incredible knowledge. I had been fearing this exact problem for so long, but there is an elegant out of the box solution. Thank you!!

by jb_rad

12/12/2025 at 4:53:43 PM

> We've got an embedded SQLite-in-the-browser component on our blog

What?

by groundzeros2015