alt.hn

5/13/2026 at 8:57:06 PM

Content-defined chunking added to Bazel

https://www.buildbuddy.io/blog/content-defined-chunking/

by siggi

5/17/2026 at 9:21:34 AM

I have done extensive research on CDC, and it almost never works out because most utilities don't create compressed archives in an "rsyncable" format (rsync does CDC). I actually saved a lot of storage with restic when I changed my backups of certain things so that files were stored in uncompressed archives, sorted in a stable order. I know Syncthing eventually removed CDC and just went with constant-size blocks.

Bazel, on the other hand, is completely in control of this, and it makes perfect sense to do this at that point -- and it seems to be a relatively efficient implementation too, really nice to see!

by dotwaffle
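
For readers wondering what "uncompressed, and sorted in a stable order" might look like in practice, here is a minimal Python sketch. It is not dotwaffle's actual backup setup; the helper name `stable_tar` and the choice to zero out timestamps and ownership are assumptions made for illustration. The idea is that identical inputs always serialize to identical bytes, so a chunking deduplicator such as restic sees the same content on every run.

```python
import io
import tarfile


def stable_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar whose bytes depend only on names and contents.

    `stable_tar` is a hypothetical helper for illustration, not part of restic.
    """
    buf = io.BytesIO()
    # mode="w" writes a plain, uncompressed tar stream.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # stable member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            # Assumption: fixed metadata, so timestamps and owners never change the bytes.
            info.mtime = 0
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

With this layout, changing one file only changes the bytes around that member, whereas compressing the whole archive would typically alter everything after the first changed byte.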

5/17/2026 at 2:41:56 AM

This is something I'm very interested in implementing for Docker builds. I've tested CDC on the final image outputs: it results in smaller outputs but requires tuning the trade-off between space saved and request count when pulling. For the build cache it might be even more advantageous.

by a_t48

5/17/2026 at 11:56:54 AM

Isn't that rather difficult given the `.tar.gz` layers?

by stabbles

5/17/2026 at 4:12:04 PM

I have a custom pull client/registry/builder that uses a different format, but can output standard OCI if needed.

by a_t48

5/17/2026 at 12:09:42 PM

It also supports .tar but that's probably not very commonly used.

by tracnar

5/17/2026 at 12:18:52 PM

In theory eStargz layers should be amenable to CDC.

by auscompgeek

5/17/2026 at 4:14:57 PM

It feels that way, but eStargz is still only addressable as a single layer, or a range within one.

by a_t48

5/16/2026 at 11:09:54 PM

Doesn't this mean that malicious inputs can deliberately cause super tiny or super huge chunks?

by londons_explore

5/17/2026 at 7:59:59 AM

Bazel caches tend to have a size limit.

You need to trust your build execution machine anyway. They have your source code and you will be executing the artifacts that they produce!

by rienbdj

5/16/2026 at 11:53:20 PM

The same is true without CDC, and you can configure a maximum size.

by ramchip
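
To illustrate the point about configurable size limits, here is a rough Python sketch of content-defined chunking with minimum and maximum chunk sizes. It uses a simplified Gear-style rolling hash and is not the Bazel or BuildBuddy implementation; the function name `chunk`, the table seed, the mask, and the default sizes are all assumptions chosen for illustration.

```python
import random

# Assumption: a 6-bit mask, so a cut point is expected roughly every 64 bytes
# once the minimum size has been reached. Real systems use far larger sizes.
MASK = (1 << 6) - 1

# Fixed pseudo-random table mapping each byte value to a 32-bit number.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes, min_size: int = 16, max_size: int = 256):
    """Yield chunks of `data`, cutting where the rolling hash matches MASK.

    Cut points are suppressed below `min_size` and forced at `max_size`,
    so crafted input cannot produce arbitrarily tiny or huge chunks.
    """
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF  # simplified Gear-style update
        length = i - start + 1
        if length < min_size:
            continue  # too short to cut, whatever the hash says
        if (h & MASK) == 0 or length >= max_size:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]  # the final chunk may be shorter than min_size
```

Whatever the input looks like, every chunk except possibly the last falls between `min_size` and `max_size`, so an attacker can at worst push chunks toward one of those two bounds.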