4/10/2026 at 1:55:05 PM
In short: deduplication efforts frustrated by per-inode hardlink limits, and a solution compatible with different file systems.
by replooda
4/10/2026 at 2:21:42 PM
The real problem is they aren't deduplicating at the filesystem level like sane people do.
by UltraSane
4/10/2026 at 3:42:35 PM
From the article:

> [W]e shipped an optimization. Detect duplicate files by their content hash, use hardlinks instead of downloading each copy.
by otterley
4/10/2026 at 3:54:16 PM
I meant TRANSPARENT filesystem-level dedupe. They are doing it at the application level. Filesystem-level dedupe makes it impossible to store the same file more than once and doesn't consume hardlinks for the references. It is really awesome.
by UltraSane
4/10/2026 at 3:59:33 PM
Filesystem/file-level dedupe is for suckers. =D

If the greatest filesystem in the world were a living being, it would be our God. That filesystem, of course, is ZFS.

Handles this correctly:
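A minimal sketch of what that looks like in practice (the pool name `tank` is hypothetical; this requires root and an existing ZFS pool):

```shell
# Turn on block-level deduplication for the (hypothetical) pool "tank".
zfs set dedup=on tank

# Write identical content under two different names:
cp big.iso /tank/copy1.iso
cp big.iso /tank/copy2.iso

# The pool stores the shared blocks only once; the ratio reflects the savings:
zpool get dedupratio tank
```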
by mmh0000
4/10/2026 at 4:01:37 PM
I was talking about block-level dedupe.
by UltraSane
4/10/2026 at 4:08:14 PM
I thought you might be. I just wanted to mention ZFS.
Have I mentioned how great ZFS is yet?
by mmh0000
4/11/2026 at 8:30:26 AM
It's not as good as ed: https://www.gnu.org/fun/jokes/ed-msg.html
by vmilner
4/10/2026 at 6:19:58 PM
ZFS is great! However, it's too complicated for most Linux server use cases (especially with just one block device attached); it's not the default (root filesystem); and it's not supported on at least one major enterprise Linux distro family.
by otterley
4/10/2026 at 8:07:56 PM
Filesystem dedupe is expensive because it requires another hash calculation that cannot be shared with application-level hashing, it is a relatively rare OS/filesystem feature, it doesn't play nice with backups (because files will be duplicated), and it doesn't scale across boxes.

A simpler solution is application-level dedupe that doesn't require fs-specific features. Simple scales and wins. And plays nice with backups.

Hash = sha256 of file, and abs filename = {{aa}}/{{bb}}/{{cc}}/{{d}} where
aa = the hash's 2 most significant hex digits
bb = the next 2 hex digits
cc = the next 2 hex digits after that
d = the remaining hex digits
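The layout above can be sketched in Python (the `store/` root and the helper name are illustrative):

```python
import hashlib
from pathlib import Path

def content_path(data: bytes, root: Path = Path("store")) -> Path:
    """Map file content to a fan-out path {aa}/{bb}/{cc}/{d} from its SHA-256."""
    h = hashlib.sha256(data).hexdigest()  # 64 lowercase hex digits
    return root / h[0:2] / h[2:4] / h[4:6] / h[6:]
```

Identical content always maps to the same path, so a second store of the same bytes can be skipped or hardlinked, and the two-digit fan-out keeps any single directory to at most 256 subdirectories.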
by burnt-resistor
4/10/2026 at 11:28:54 PM
For ZFS, at least, `zfs send` is the backup solution. And it performs incremental backups with the `-i` argument.
by otterley
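The incremental flow mentioned above looks roughly like this (dataset, snapshot, and host names are hypothetical; requires root and ZFS on both ends):

```shell
# Take a snapshot and send the full stream to a backup host:
zfs snapshot tank/data@monday
zfs send tank/data@monday | ssh backuphost zfs receive backup/data

# Later, -i sends only the blocks that changed between the two snapshots:
zfs snapshot tank/data@tuesday
zfs send -i tank/data@monday tank/data@tuesday | ssh backuphost zfs receive backup/data
```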
4/11/2026 at 1:58:01 AM
`zfs send` is really awesome when combined with dedupe and incremental sends.
by UltraSane
4/10/2026 at 8:23:44 PM
All good backup software should be able to do deduped incremental backups at the block level. I'm used to Veeam and Commvault.
by UltraSane