alt.hn

4/15/2026 at 12:57:27 PM

Binary Encodings for JSON and Variant

https://jincongho.com/posts/designing-binary-encodings-for-json-and-variant/

by jincongho

4/18/2026 at 4:03:43 PM

At work, I wrote a C++20 data binding library. It works by running visitors over a data model that binds to the application state. My comment comes from a different set of trade-offs driven by memory constraints.

I've implemented a bunch of serialization visitors. For the structured formats, most (JSON, YAML, CBOR with indefinite lengths) use an output iterator and can stream out one character/byte at a time, which is useful when your target is an MCU with 640 KiB of SRAM and you need to send large REST API responses.
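A minimal sketch of the output-iterator approach, for illustration only. The helper name `write_json_string` is hypothetical and not taken from the library described above; it escapes a string and emits it one character at a time, so the caller decides whether the sink is a buffer, a socket, or a UART.

```cpp
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical helper: writes `s` as a JSON string literal to any output
// iterator, one character at a time, without an intermediate buffer.
template <typename OutputIt>
OutputIt write_json_string(OutputIt out, std::string_view s) {
    static constexpr char hex[] = "0123456789abcdef";
    *out++ = '"';
    for (unsigned char c : s) {
        switch (c) {
            case '"':  *out++ = '\\'; *out++ = '"';  break;
            case '\\': *out++ = '\\'; *out++ = '\\'; break;
            case '\n': *out++ = '\\'; *out++ = 'n';  break;
            case '\r': *out++ = '\\'; *out++ = 'r';  break;
            case '\t': *out++ = '\\'; *out++ = 't';  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters use the \u00XX form.
                    *out++ = '\\'; *out++ = 'u'; *out++ = '0'; *out++ = '0';
                    *out++ = hex[c >> 4];
                    *out++ = hex[c & 0x0F];
                } else {
                    *out++ = static_cast<char>(c);
                }
        }
    }
    *out++ = '"';
    return out;
}
```

With `std::back_inserter(some_string)` this fills a buffer; on a small target the same function could be handed an iterator whose assignment pushes each byte straight to the transport.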

And there's the BSON serializer, which writes to a byte buffer because it uses tag-length-value and I need to backtrack to patch in the lengths after serializing the values. This means the entire document has to be serialized into memory before I can do anything with it. It also has some annoying quirks, like array indices being encoded as base-10 strings.
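A rough sketch of the back-patching described above, assuming hypothetical helpers (`put_i32`, `patch_i32`, `encode_int32_array`) that are not from any real library. It encodes a list of int32 values as a BSON array document: a 4-byte placeholder is written first, and the real length is patched in once the elements and the terminator are in the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Append a little-endian int32, the byte order BSON uses.
void put_i32(Buffer& buf, std::int32_t v) {
    const auto u = static_cast<std::uint32_t>(v);
    for (int i = 0; i < 4; ++i) {
        buf.push_back(static_cast<std::uint8_t>(u >> (8 * i)));
    }
}

// Overwrite four bytes at `pos` with a little-endian int32.
void patch_i32(Buffer& buf, std::size_t pos, std::int32_t v) {
    const auto u = static_cast<std::uint32_t>(v);
    for (int i = 0; i < 4; ++i) {
        buf[pos + i] = static_cast<std::uint8_t>(u >> (8 * i));
    }
}

// Hypothetical helper: encode int32 values as a BSON array document.
Buffer encode_int32_array(const std::vector<std::int32_t>& values) {
    Buffer buf;
    const std::size_t start = buf.size();
    put_i32(buf, 0);  // placeholder: the total length is not known yet

    std::size_t index = 0;
    for (std::int32_t v : values) {
        buf.push_back(0x10);  // element type: int32
        // The key is the array index written as a base-10 string.
        for (char c : std::to_string(index++)) {
            buf.push_back(static_cast<std::uint8_t>(c));
        }
        buf.push_back(0x00);  // NUL terminator of the key
        put_i32(buf, v);
    }
    buf.push_back(0x00);  // document terminator

    // Backtrack and patch in the length, which includes the length field itself.
    patch_i32(buf, start, static_cast<std::int32_t>(buf.size() - start));
    return buf;
}
```

Because the first four bytes depend on everything that follows, nothing can be sent until the whole document is in the buffer, which is the streaming limitation the comment points out.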

There are also other trade-offs when dealing with JSON vs. its binary encodings. Strings in JSON may contain escape sequences that require decoding. If a string has them, you can't return a view into the document; you need to allocate a string to hold the decoded value. In BSON or CBOR (excluding indefinite-length strings), on the other hand, strings are not escaped and you can return a std::string_view straight from the document (and even a const char* for BSON, as it embeds a NUL terminator).
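To illustrate the view-versus-allocation split, here is a small sketch. The names `JsonString` and `decode_json_string` are made up for this example, and only a few simple escapes are handled (no `\uXXXX`): the function returns a view when the raw text has no backslash and an owned copy otherwise.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <variant>

// Either a zero-copy view into the source document or an owned, decoded copy.
using JsonString = std::variant<std::string_view, std::string>;

// Hypothetical helper: `raw` is the text between the quotes of a JSON string.
JsonString decode_json_string(std::string_view raw) {
    if (raw.find('\\') == std::string_view::npos) {
        return raw;  // no escapes: the bytes in the document are the value
    }
    std::string out;  // escapes present: the decoded value needs its own storage
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        char c = raw[i];
        if (c == '\\' && i + 1 < raw.size()) {
            switch (raw[++i]) {
                case 'n': c = '\n'; break;
                case 'r': c = '\r'; break;
                case 't': c = '\t'; break;
                default:  c = raw[i]; break;  // covers \" \\ and \/ (sketch only)
            }
        }
        out.push_back(c);
    }
    return out;
}
```

A length-prefixed binary string never takes the second branch, so a parser for such a format can always hand back the view.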

Some encodings like CBOR are also more expressive than JSON, allowing for example any value type to be used for map keys and not just strings.
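As a small illustration of non-string map keys, the sketch below encodes a map from unsigned integers to short text strings as CBOR. The function `encode_small_map` is hypothetical, and it assumes fewer than 24 entries, keys below 24, and strings shorter than 24 bytes so that every CBOR header fits in a single byte.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <utility>
#include <vector>

using Entry = std::pair<unsigned, std::string_view>;

// Hypothetical helper: encode {uint: text} as a definite-length CBOR map.
// Only the one-byte header forms are supported (see the assertions).
std::vector<std::uint8_t> encode_small_map(const std::vector<Entry>& entries) {
    assert(entries.size() < 24);
    std::vector<std::uint8_t> out;
    // Major type 5 (map): 0xA0 plus the number of key/value pairs.
    out.push_back(static_cast<std::uint8_t>(0xA0 | entries.size()));
    for (const auto& [key, text] : entries) {
        assert(key < 24 && text.size() < 24);
        // Major type 0 (unsigned int): small values are their own encoding.
        out.push_back(static_cast<std::uint8_t>(key));
        // Major type 3 (text string): 0x60 plus the byte length, then the bytes.
        out.push_back(static_cast<std::uint8_t>(0x60 | text.size()));
        out.insert(out.end(), text.begin(), text.end());
    }
    return out;
}
```

The map `{1: "a", 2: "bc"}` has no direct JSON equivalent, since JSON object keys must be strings.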

by boricj

4/18/2026 at 10:37:32 PM

The Parquet file format writes its metadata, including length info, after all the data, in the footer. It seemed counterintuitive when I first looked at it, but it seems smart now that I think about it. I haven't had to make trade-offs for memory constraints, but being able to stream output is definitely easier!
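The footer idea can be sketched with a toy container. This is not the real Parquet layout; the class `FooterWriter` and its format (chunks, then one 32-bit length per chunk, then a 32-bit chunk count) are invented for illustration.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical footer-indexed container:
//   [chunk 0][chunk 1]...[u32 length per chunk][u32 chunk count]
class FooterWriter {
public:
    // Chunks go out immediately; only their lengths are remembered.
    void add_chunk(std::string_view data) {
        out_.append(data);
        lengths_.push_back(static_cast<std::uint32_t>(data.size()));
    }

    // The footer is written last, once every length is known.
    std::string finish() {
        for (std::uint32_t len : lengths_) {
            put_u32(len);
        }
        put_u32(static_cast<std::uint32_t>(lengths_.size()));
        return std::move(out_);
    }

private:
    // Append a little-endian 32-bit value.
    void put_u32(std::uint32_t v) {
        for (int i = 0; i < 4; ++i) {
            out_.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
        }
    }

    std::string out_;  // stands in for a file or socket
    std::vector<std::uint32_t> lengths_;
};
```

Nothing already written ever has to be revisited, so the writer can stream; the cost moves to the reader, which has to start from the end of the file to find the index.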

Interesting point about the difference in escape characters. I stored the length and the decoded value, so it's ready for a string view. But when I need them back as a JSON string, I need to encode them again :)

by jincongho

4/18/2026 at 3:35:53 PM

As the author stated, it really depends on what you intend to use it for.

Fast internal scanning isn't free: it requires pre-indexing, which adds more data and gives up the ability to build the document incrementally on the encoding end.

Small transfer size and fast (full) decoding are both possible with a single binary format, but unfortunately designers keep falling into the trap of adding extra features that make their formats incompatible with JSON. That's why I wrote https://github.com/kstenerud/bonjson/

by kstenerud
