12/12/2025 at 7:59:44 PM
It is also possible to encode JSON documents directly as a serialized B-tree. Then you can construct iterators on it directly, and query internal fields at indexed speeds. It is still a serialized document (possible to send over a network), though now you don't need to do any parsing, since the document itself is already indexed. It is called the Lite³ format.
Disclaimer: I am working on this.
by eliasdejong
12/12/2025 at 9:21:00 PM
This is super cool! I've always liked Rkyv (https://rkyv.org), but it requires Rust, which can be a big lift for a small project. I see this supports binary data (`lite3_val_bytes`), which is great!
by conradev
12/12/2025 at 9:32:51 PM
Thank you. Having a native bytes type is non-negotiable for any performance-intensive application that cannot afford the overhead of base64 encoding. And yes, Rkyv also implements this idea of indexing serialized data. The main differences are:
1) Rkyv uses a binary tree vs. Lite³'s B-tree (B-trees are more cache- and space-efficient).
2) Rkyv is immutable once serialized. Lite³ allows for arbitrary mutations on serialized data.
3) Rkyv is Rust only. Lite³ is a 9.3 kB C library free of dependencies.
4) Rkyv as a custom binary format is not directly compatible with other formats. Lite³ can be directly converted to/from JSON.
I have not benchmarked Lite³ against Rust libraries, though it would be an interesting experiment.
by eliasdejong
12/12/2025 at 11:54:48 PM
That second point is huge! Rkyv does support in-place mutation, but only in a very limited way.
If you added support for running jq natively, that would be very cool. Lite³ brings the B-trees, jq brings the query parser and bytecode; combined, you get SQLite :P
by conradev
12/13/2025 at 12:23:11 AM
Yes, in fact the name Lite³ was chosen because it is lighter than SQLite.
I thought about implementing something like jq or JSON query, and this is very possible. It is like sending a mini-database that can be queried at speeds thousands of times faster than any JSON library is able to parse.
One interesting effect of being a zero-copy format is that the 'parsing speed' can exceed the memory bandwidth of the CPU, since to fulfill a query you do not actually need to parse the entire dataset. You only walk the branches of the tree that are actually required.
I've talked to some other people that have also shown interest in this idea. There doesn't really seem to exist a good schemaless single-file format that supports advanced queries. There is only SQLite and maybe HDF5.
by eliasdejong
12/14/2025 at 1:50:49 AM
I would very much love that. I like `jq` itself as a standard, even though I don't know how well it maps. Areas where I'd want to use Lite³:
- Microcontrollers. I find myself reaching for https://github.com/siara-cc/sqlite_micro_logger_c/tree/maste... because SQLite is just too big
- Shared memory regions/mapped files. Use it to share state between processes. Could you make mutations across processes/threads lock-free?
- Caching GPU-friendly data (e.g., an image cache). I'm not sure if the current API surface/structure is page-alignment-friendly
by conradev
12/14/2025 at 11:48:01 AM
In general jq maps very well to any hierarchical data structure. One of the maintainers has made 'fq', which supports BSON, MsgPack, Protobuf, CBOR and even media files (PNG, JPG, MP4, etc.).
SQLite when compiled for size is 590 kB. But I think a full jq database implementation based on Lite³ would be possible under 100 kB.
Lock-free shared state relies on algorithms that can make clever use of atomic instructions. But you should not have threads write to the same memory regions, because the hardware only allows one core at a time to hold a cacheline in a writable state. If another core attempts a write to the same line, this immediately invalidates all the other copies. Under high contention the coherency penalty becomes so large that throughput falls through the floor. So basically the algorithms need to do most of the work in separate memory regions, then occasionally coordinate by 'committing' their work via a spinlock or similar.
Lite³ implements some settings for node alignment, but not for user data. It would be possible to create a bytes type with extra alignment guarantees.
by eliasdejong
12/13/2025 at 2:05:47 AM
Building a jq around something like Lite^3 or JSONB is a very appealing thought.
by cryptonector
12/13/2025 at 3:18:56 AM
1) when did they downgrade? I've stared for hours at that particular code...
2) no, you just don't get to move data freely.
3) I don't believe JSON has any place in a system that needs C because it can't handle Rust.
4) JSON can't handle non-tree structures, and it is also very limited in expressivity. Rkyv is more of a code gen akin to ASN.1
Happy benchmarking! Feel free to use the rkyv benchmark tooling, and make sure you have enough link-time optimization going on.
by namibj
12/13/2025 at 1:57:38 AM
This is pretty cool.
How does Lite^3 compare to PG's JSONB? PG's JSONB is also a serialized, indexed data structure. One of the key things about JSONB is that for arrays (and so objects) it first encodes the elements' lengths, then the values, but every so many elements (32 is the default IIRC) it encodes an offset instead. The reason for this design is that when they encoded only offsets, the result did not compress well (and if you think about it, it will be obvious why). The price they pay for this design is that finding the offset to the nth element's value requires first finding the offset of the last entry before n that has an offset, then adding all the lengths of the entries in between. This way you get a tunable parameter for trading off speed for compressibility.
EDIT: Ok, I've looked at the format. Some comments:
- Updating in place is cool, but you need to clear unused replaced data in case it's sensitive, and unless you re-encode, you will use up more and more space; once in a while you need a "vacuum". Though vacuuming a Lite^3 document is quite simple: just traverse the data structure and write a new version, and naturally it will be vacuumed.
- On the whole I like Lite^3 quite a bit. Very clever.
- JSONB is also indexed as encoded, but IIUC it's not in-place updateable (unless the new items are the same length as the old) without re-encoding. Though I can imagine a way to tombstone old values and replace them with offsets into appended data, then the result would also need a "vacuum" once in a while.
- I'm curious about compressibility. I suspect not having long runs of pointers (offsets) helps, but still I suspect JSONB is more compressible.
I love the topic of serialization formats, and I've been thinking for some time about ASN.1 compilers (since I maintain one). I've wanted to implement a flatbuffers / JSONB style codec for ASN.1 borrowing ideas from OER. You've given me something to think about! When you have a schema (e.g., an ASN.1 module) you don't really need a B-tree -- the encoded data, if it's encoded in a convenient way, is the B-tree already, but accessing the encoded data by traversal path rather than decoding into nice in-memory structures sure would be a major improvement in codec performance!
by cryptonector
12/13/2025 at 12:15:34 PM
The main difference between Lite³ and JSONB is that JSONB is not a standalone portable format, and therefore is not suitable for external interchange. Its purpose is to be an indexable representation of JSON inside a Postgres database. But sending it as standalone messages to arbitrary consumers does not really make sense. JSONB can only be interpreted in a Postgres context. This is different from, for example, BSON, which can be read and constructed as a standalone format without Mongo.
Another difference is that JSONB is immutable. Suppose you need to replace one specific value inside an object or array. With JSONB, you would rewrite the entire JSONB document as a result of this, even if it is several megabytes large. If you are performing frequent updates inside JSONB documents, this will cause severe write amplification. Despite the fact that offsets are grouped in chunks of 32, Postgres still rewrites the entire document. This is the case for all current Postgres versions.
On the other hand, Lite³ supports replacing individual values, where ONLY the changed value needs to be updated. For this to work, you need separate offsets. Postgres makes a tradeoff where they get some benefits in size, but as a result become completely read-only. This is the case in general for most types of compression.
Also JSONB is not suited to storing binary data. The user must use a separate bytea column. Lite³ directly implements a native bytes type.
JSONB was designed to sacrifice mutability in favor of read performance, but despite this, I still expect Lite³ to exceed it at read performance. Of course it is hard to back this up without benchmarks, but there are several reasons:
1) JSONB performs runtime string comparison loops to find keys. Lite³ compares fixed-size hash digests, where the hashes are computed at compile time.
2) JSONB must do 'walking back' because of the 32-grouped offset scheme.
3) Lite³ has none of the database overhead.
Again, the two formats serve different purposes; this comparison concerns only the raw byte layouts.
by eliasdejong
12/14/2025 at 2:45:15 AM
Why not add this approach to postgres as a "JSONL3" type?
It'd be nice to update postgres JSON values without the big write amplification.
by nh2
12/13/2025 at 6:08:52 PM
Thank you for your thoughtful response.
I agree that Lite³ is almost certainly better than JSONB on every score except compressibility, but when Lite³ is your database format then that doesn't matter (you can always compress large string/blob values if need be). Compressibility might matter for interchange however, but again, if your messages are huge chances are there are compressible strings in them, or if they're not huge then you probably don't care to compress.
by cryptonector
12/13/2025 at 3:23:20 AM
Rkyv is basically the last thing you mentioned already? It's basically a code gen for deriving serialized structures that can be accessed for read with the exact same API and functionally almost identical (but not quite; in the differences lies much of the special sauce) ABI.
by namibj
12/12/2025 at 9:35:05 PM
Would love a Rust implementation of this.
by the_duke
12/13/2025 at 5:46:03 AM
Sorry, but who are you? Your accounts have no history.
by gritzko