alt.hn

6/16/2026 at 4:21:04 PM

Can gzip be a language model?

https://nathan.rs/posts/gzip-lm/

by nathan-barry

6/16/2026 at 4:30:19 PM

LLMs are very good at lossless compression when paired with arithmetic coding. But I didn't know it was possible to go in the reverse direction and do language modeling with a compressor. The quality isn't great, but I'm surprised it worked! Other compression algorithms (like PPMd) use variable-order n-gram context models under the hood and should do much better, although that's less interesting since they already contain a basic language model internally.
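
For anyone curious what "language modeling via a compressor" can look like, here is a minimal sketch. It is not necessarily the method used in the linked post. It assumes the simplest scoring rule: append each candidate continuation to the context, compress the result with zlib (the same DEFLATE algorithm gzip uses), and prefer whichever candidate yields the smallest output. The names `compressed_size`, `score_candidates`, and `generate` are made up for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib (DEFLATE) compressed form of data."""
    return len(zlib.compress(data, 9))


def score_candidates(context: bytes, candidates: list[bytes]) -> dict[bytes, int]:
    """Map each candidate to the compressed size of context + candidate.

    A smaller size means the compressor found the continuation more
    predictable given the context, so it acts as a crude likelihood proxy.
    """
    return {c: compressed_size(context + c) for c in candidates}


def generate(context: bytes, candidates: list[bytes], steps: int) -> bytes:
    """Greedy decoding: repeatedly append the cheapest candidate.

    Ties are broken by the order of the candidates list.
    """
    out = context
    for _ in range(steps):
        scores = score_candidates(out, candidates)
        best = min(candidates, key=lambda c: scores[c])
        out += best
    return out


if __name__ == "__main__":
    context = b"the cat sat on the mat. " * 20
    candidates = [b"the cat", b"zqxjv kw"]
    print(score_candidates(context, candidates))
    print(generate(context, candidates, 1)[-7:])
```

Compressed sizes only change in whole bytes, so single-character candidates often tie. Scoring longer candidates (words or short phrases) gives the compressor more to distinguish, which is one reason a sketch like this produces rough output.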

by nathan-barry

6/16/2026 at 4:43:23 PM

[flagged]

by chinallm_ai