alt.hn

5/10/2026 at 10:14:20 AM

Fc, a lossless compressor for floating-point streams

https://github.com/xtellect/fc

by enduku

5/13/2026 at 5:59:53 AM

It splits the input into adaptively-sized blocks (quanta), runs a competition between many specialized codecs on each block, and emits the smallest result.
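
A toy sketch of that per-block tournament, with two stand-in codecs (not fc's actual ones): each candidate encodes the block, and whichever output is smallest wins.

```python
import struct
import zlib

def enc_raw(xs):
    # candidate 1: store the doubles verbatim (the fallback)
    return b"R" + struct.pack(f"<{len(xs)}d", *xs)

def enc_xor_prev(xs):
    # candidate 2: XOR each value's IEEE-754 bit pattern with the
    # previous one, then let a generic coder squeeze the sparse residue
    bits = [struct.unpack("<Q", struct.pack("<d", x))[0] for x in xs]
    prev, out = 0, []
    for b in bits:
        out.append(b ^ prev)
        prev = b
    payload = b"".join(struct.pack("<Q", v) for v in out)
    return b"X" + zlib.compress(payload, 9)

def compress_block(xs):
    # the "tournament": run every candidate and keep the smallest result
    return min((enc_raw(xs), enc_xor_prev(xs)), key=len)

# a smooth ramp: XOR-with-previous leaves mostly zero bytes, so the
# XOR codec should win; noisy data would fall back to raw storage
block = [0.1 * i for i in range(512)]
winner = compress_block(block)
print(winner[:1], len(winner), len(enc_raw(block)))
```

fc runs this competition per adaptively-sized block, so different regions of one stream can end up with different winners.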

This is, for lack of a better term, a "metacompressor", but it will be interesting to see which of the choices end up dominating; in my past experience with metacompression, one algorithm is usually consistently ahead.

by userbinator

5/13/2026 at 6:49:52 AM

I’ve never heard of a metacompressor before, what others exist?

by apodik

5/13/2026 at 10:51:21 AM

I’m using that term loosely here. In this case it means "try several representations/codecs for a block and store the winner." Similar ideas show up in columnar formats choosing encodings per column/page, in OpenZL selectors (as other commenters pointed out here), and in shuffle/transpose + backend-compressor pipelines. fc’s version is much narrower: a tournament among f64-specific modes per block.

by enduku

5/13/2026 at 8:07:54 AM

I think the idea is that the compressor is "meta" in the sense that it directs compressors, as GP mentions, by selecting whichever is actually producing the best results; so it's not just one compressor but a series of supported ones plugged in and used adaptively (controlled at a "meta" level).

Floating point data is a mess to compress, but I think the idea here is to apply different transforms (and perhaps back-end codecs) on data and see if one fits the data so perfectly that you magically get a lot of compression.

Say you have audio with a sawtooth: it's a linear gradient, but if the peaks are "random" values like 1.245 and pi, the mantissa bits of the interpolation range will look fairly random to a classic compressor. This compressor, by contrast, can test for linear (or near-linear) gradient spans, store the gradient, and dump out the "difference" bits for a regular compressor.
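
A quick illustration of that point (my own sketch, not fc's predictor): a second-order linear predictor guesses x[i] = 2*x[i-1] - x[i-2], which is exact for a linear span up to rounding, so the leftover residue is only a handful of low-order bits even though the raw mantissas look like noise.

```python
import math
import struct

def bits(x):
    # raw IEEE-754 bit pattern of a double, as an unsigned 64-bit integer
    return struct.unpack("<Q", struct.pack("<d", x))[0]

# interpolate between "random" endpoints like 1.245 and pi; byte-by-byte
# the mantissas look like noise, but the span itself is linear
ramp = [1.245 + i * (math.pi - 1.245) / 99 for i in range(100)]

# residue of the 2nd-order predictor, measured as a bit-pattern distance:
# for a linear span it stays within a few ULPs of zero
resid = [abs(bits(ramp[i]) - bits(2.0 * ramp[i - 1] - ramp[i - 2]))
         for i in range(2, len(ramp))]
print(max(resid))
```

So the compressor can store the gradient once and emit only these tiny residues, instead of 64 "random-looking" bits per sample.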

Or take 3D coordinates for 3D models (non-stripified): plenty of repeating 8-byte doubles that will look like garbage to a classic compressor and not help it much. Building a float-aware dictionary and using that would easily bring the data down by quite a few percent.

(I don't agree with GP: one method might win out for certain workloads, but the idea here seems to be a pluggable utility that can help a wide range of developers with something "for free".)

by whizzter

5/13/2026 at 11:33:08 AM

A lossy compressor might also be useful for common floating point apps. The simplest compressor ever would just chop off a number of bits from the mantissa.
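
For concreteness, here is that simplest-possible lossy scheme as a sketch (my own, not fc's): zero out the low mantissa bits, so the tail of each double compresses to almost nothing downstream.

```python
import struct

def truncate_mantissa(x, keep):
    # keep only the top `keep` of the 52 mantissa bits and zero the rest;
    # the crudest lossy "compressor" for doubles, since the zeroed tail
    # then costs a generic compressor almost nothing to store
    b = struct.unpack("<Q", struct.pack("<d", x))[0]
    mask = ~((1 << (52 - keep)) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", b & mask))[0]

x = 3.141592653589793
# keeping 20 mantissa bits preserves roughly 6 significant decimal digits
print(truncate_mantissa(x, 20))
```

The operation is idempotent, so re-compressing already-truncated data loses nothing further.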

by childintime

5/13/2026 at 7:46:08 AM

> "fc is a lossless compressor for streams of IEEE-754 64-bit doubles."

The new OpenZL SDDL2 (Simple Data Description Language) supports several different floating-point types. It would be worthwhile to contribute some of the fc project's experience to OpenZL. The types OpenZL currently supports:

  | Type           | Size    | Endian |
  |----------------|---------|--------|
  | `Int8`         | 1 byte  | N/A    |
  | `UInt8`        | 1 byte  | N/A    |
  | `Int16LE/BE`   | 2 bytes | Yes    |
  | `UInt16LE/BE`  | 2 bytes | Yes    |
  | `Int32LE/BE`   | 4 bytes | Yes    |
  | `UInt32LE/BE`  | 4 bytes | Yes    |
  | `Int64LE/BE`   | 8 bytes | Yes    |
  | `UInt64LE/BE`  | 8 bytes | Yes    |
  | `Float16LE/BE` | 2 bytes | Yes    |
  | `Float32LE/BE` | 4 bytes | Yes    |
  | `Float64LE/BE` | 8 bytes | Yes    |
  | `BFloat16LE/BE`| 2 bytes | Yes    |
  | `Bytes(n)`     | n bytes | N/A    |
Some links:

- https://github.com/facebook/openzl/releases/tag/v0.2.0

- https://openzl.org/getting-started/introduction/

- https://openzl.org/sddl/sddl2-announcement/

- https://openzl.org/sddl/core-concepts/

by pella

5/13/2026 at 10:46:47 AM

Thanks, this looks super relevant. I think the transferable part is the per-block selector over predictors, strides, deltas, exponent/mantissa-ish structure, byte transpose, fallback raw/LZ, etc. SDDL2 looks like a natural place to try some of that.

by enduku

5/13/2026 at 2:21:00 AM

The question is, how close can OpenZL come? (This is from the same people who develop zstd, but aimed at structured data in a generic way.)

by loeg

5/13/2026 at 11:04:02 AM

I need to add it to the benchmark. My expectation is that OpenZL should be strong when the enclosing format is known and SDDL can separate typed fields cleanly. Running both on the same f64 arrays will give some information.

by enduku

5/13/2026 at 1:07:50 AM

I see you have ALP, but have you tried Chimp128 or Arrow's byte stream split?

by Scaevolus

5/13/2026 at 10:15:40 AM

The most interesting section, How It Works, could really use a bit more detail.

by abcd_f

5/13/2026 at 10:53:23 AM

Agreed. Will work on that :)

by enduku

5/13/2026 at 1:51:30 AM

Another library in this space is pcodec; I'd appreciate a comparison of the two.

by KerrickStaley

5/13/2026 at 10:59:50 AM

Agreed; pcodec is probably one of the most relevant comparisons. I will add pcodec to the benchmark.

by enduku

5/10/2026 at 10:14:20 AM

I built "fc", a C library for compressing streams of 64-bit floating-point values without quantization.

It is not trying to replace zstd or lz4. The idea is narrower: take blocks of doubles, try a set of float-specific predictors/transforms/coders, and emit whichever representation is smallest for that block.

It is aimed at time-series, scientific, simulation, and analytics data where the numbers often have structure: smooth curves, repeated values, fixed increments, periodic signals, predictable deltas, or low-entropy mantissas.

The API is intentionally small: "fc_enc", "fc_dec", a config struct, and a few counters to inspect which modes won. Decode is parallel and meant to be fast; encode spends more CPU searching for a better representation.

Current caveats: x86-64 only for now, tuned for IEEE-754 doubles, research-grade rather than production-hardened.

Repo: https://github.com/xtellect/fc

by enduku

5/13/2026 at 11:02:13 AM

> rather than production-hardened.

Please run it through your preferred AI once or twice with instructions to look for bugs. The version of fc in the main branch has at least a few memory-safety bugs that attacker-controlled inputs could exploit.

I'd link a chat history but the tool I used has that feature blocked for some weird reason, and the locals round these parts don't take kindly to copy-pasted AI content...

by jiggawatts

5/13/2026 at 11:22:51 AM

Thank you. Fuzz safety is definitely on my list. Current focus is to broaden the benchmarks, predictors, and preprocessors and see what sticks.

by enduku

5/11/2026 at 2:11:50 PM

Does it assume the floats come from photos or sound or something?

by gus_massa

5/12/2026 at 8:47:57 PM

It is intended to be mainly source-agnostic (will try to add custom source predictors too). The idea is to treat the input as an ordered stream of doubles and look for numeric structure like repeats, smooth deltas, fixed increments, or low-entropy bits. The present target is scientific/time-series/simulation/analytics data, not photos or sound.

by enduku

5/13/2026 at 4:25:46 AM

isn't sound a time series? I guess it's not usually 64-bit doubles.

by tingletech

5/13/2026 at 2:29:16 AM

What do you mean by decode is parallel?

by snissn

5/13/2026 at 6:32:28 AM

It splits the input into blocks which are encoded separately, so the decoder can fire up multiple threads to decode multiple blocks in parallel.
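
The independent-blocks idea can be sketched like this (a toy using zlib blocks, not fc's actual format): because no block depends on another, decoding is embarrassingly parallel.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# eight blocks, each compressed independently of the others
blocks = [zlib.compress(bytes([i]) * 4096) for i in range(8)]

def decode(blob):
    # decoding one block needs no state from any other block
    return zlib.decompress(blob)

# independent blocks => the decoder can fan out across threads
# (zlib releases the GIL, so the threads genuinely overlap)
with ThreadPoolExecutor() as pool:
    parts = list(pool.map(decode, blocks))

print(len(b"".join(parts)))  # 8 * 4096 = 32768
```

The trade-off is per-block header overhead and losing cross-block redundancy, which is why the block sizing is adaptive.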

https://github.com/xtellect/fc#how-it-works

by magicalhippo