3/14/2026 at 8:26:43 PM
My experience does not match theirs when compressing text and code:
> bzip might be suboptimal as a general-purpose compression format, but it’s great for text and code. One might even say the b in bzip stands for “best”.
I've just checked again with a 1GB SQL file. `bzip2 -9` shrinks it to 83MB. `zstd -19 --long` to 52MB.
Others have compressed the Linux kernel and found that bzip2's is about 15% larger than zstd's.
by idoubtit
3/15/2026 at 9:04:20 AM
What you are seeing here is probably the effect of window size. BZip has to perform the BWT strictly block-wise and is quite memory-hungry, so `bzip2 -9` uses a block size (its effective window) of 900KB, if I recall correctly. Dictionary-based algorithms are more flexible in this regard, and can gain a substantial advantage on very large and repetitive files. The article kind of forgets to mention this. Not that BZip isn't remarkably efficient for its simplicity, but it's not without limitations.
by mppm
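A rough way to see the block-size effect described above, as a minimal sketch using only Python's standard `bz2` and `lzma` modules. Here lzma stands in for a large-window dictionary compressor; zstd is not used. The input is synthetic and assumed purely for illustration: 1,000,000 pseudo-random bytes repeated twice, so the only redundancy sits 1,000,000 bytes back, which is farther than a 900KB block can reach. `randbytes` needs Python 3.9 or newer.

```python
import bz2
import lzma
import random

# Synthetic input (an assumption for illustration, not real-world data):
# 1,000,000 pseudo-random bytes, followed by an exact copy of themselves.
chunk = random.Random(0).randbytes(1_000_000)
data = chunk * 2

# Level 9 selects 900k blocks; each block is compressed independently.
bz2_size = len(bz2.compress(data, 9))

# Default preset; its dictionary is several megabytes, so it can
# reference the first copy while encoding the second.
lzma_size = len(lzma.compress(data))

print(f"bzip2 -9: {bz2_size} bytes")
print(f"lzma:     {lzma_size} bytes")
```

On input like this, bzip2 cannot exploit the repetition at all, while the dictionary compressor should need roughly half the space. Real files are far less extreme, so treat this as a demonstration of the mechanism and not as a benchmark.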
3/15/2026 at 2:40:53 AM
I suspect the reason for the difference here may be the specific use case and what it implies for file size? The author's use case is Lua files to run in Minecraft, and I strongly suspect their example file at 327KB is very much closer to "typical" for that use case than a 1GB SQL file.
It wouldn't surprise me at all that "more modern" compression techniques work better on larger files. It also wouldn't surprise me too much if there was no such thing as a 1GB file when bzip was originally written. According to Wikipedia, bzip2 is almost 30 years old: "Initial releases 18 July 1996". And there are mentions of the preceding bzip (without the 2), which must have been even earlier than that. In the mid/late 90s I was flying round-the-world trips with a dozen or so 380 or 500MB hard drives in my luggage to screw into our colo boxen in Singapore, London and San Francisco (because our office only had 56k adsl internet).
by bigiain
3/15/2026 at 11:59:51 AM
For large files, you can often obtain much higher compression ratios by using a preprocessing method, e.g. lrzip (which invokes internal or external standard compressors after preprocessing the input to find long-range similarities).
For instance, "lrzip -b", which uses bzip2 for compression, typically achieves much higher compression ratios on big files than using either xz or zstd alone. Of course, you can also use lrzip with xz or zstd, with various parameters, but among the many existing possibilities you must find an optimum compromise between compression ratio and compression/decompression times.
by adrian_b
3/15/2026 at 4:31:02 AM
> Others have compressed the Linux kernel and found that bzip2's is about 15% larger than zstd's
I compressed kernel 6.19.8 with zstd -19 --long and bzip3 (default settings). The latter compressed better and was about 8x faster.
by ac29
3/14/2026 at 8:55:41 PM
bzip is old and slow.
It was surpassed long ago by lzma and zstd.
But back in roughly the 00s, it was the best standard for compression, because the competition was DEFLATE/gzip.
by cogman10
3/14/2026 at 9:14:59 PM
Also potentially relevant: in the 00s, the performance gap between gzip and bzip2 wasn't quite as wide - gzip has benefited far more from modern CPU optimizations - and slow networks / small disks made a higher compression ratio more valuable.
by duskwuff
3/14/2026 at 9:14:59 PM
Even then, there were better options in the Windows world (RAR/ACE/etc.). Also, bzip2 was considered slow even when it was new.
by yyyk
3/14/2026 at 10:04:51 PM
RAR/ACE/etc. used continuous compression - all files were concatenated and compressed as if they were one single large file, much like what is done with .tar.bz2. Bzip on Windows did not do that; there was no equivalent of .tar.bz2 on Windows.
You can bzip2 -9 the files in some source code directory and tar the resulting .bz2 files. This would be more or less equivalent to creating a ZIP archive with a BWT compression method. Then you can compare the result with tar-ing the same source directory and running bzip2 -9 on the resulting .tar.
The continuous mode in RAR was something back then, exactly because RAR had a long LZ77 window and compressed files as a continuous stream.
by thesz
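The comparison suggested above can be approximated without tar, as a minimal sketch with Python's standard `bz2` module. The "files" are hypothetical, generated in memory to mimic a small source tree with shared boilerplate, and tar header overhead is ignored, so the numbers only show the direction of the effect.

```python
import bz2

# Hypothetical stand-in for a source directory: 200 small files that
# share a common header and differ only in a few lines.
header = "-- generated module\nlocal util = require('util')\n" * 5
files = [
    (header + f"local function handler_{i}(x)\n  return util.scale(x, {i})\nend\n").encode()
    for i in range(200)
]

# Per-file: every file becomes its own bzip2 stream
# (roughly "bzip2 -9 each file, then tar the .bz2 files").
per_file_total = sum(len(bz2.compress(f, 9)) for f in files)

# Solid: concatenate first, then compress once
# (roughly "tar the directory, then bzip2 -9 the .tar").
solid_total = len(bz2.compress(b"".join(files), 9))

print(f"per-file: {per_file_total} bytes")
print(f"solid:    {solid_total} bytes")
```

The solid variant wins here because the compressor sees the shared boilerplate across files and pays the stream overhead only once.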
3/14/2026 at 10:38:26 PM
> continuous compression
'Solid compression' (as WinRAR calls it) is still optional with RAR. I recall the default is 'off'. At the time, that mode was still pretty good compared to bzip2.
by yyyk
3/15/2026 at 2:07:45 AM
Is an SQL file text or code?
by eviks
3/14/2026 at 9:49:51 PM
And here I got the best compression out of xz for SQL.
by 8n4vidtmkvmk