6/5/2026 at 6:14:49 PM
I like the technique described here around distillation to recover from quantization, but I don't understand why we keep performing lossy compression on LLMs and then measuring the effects with benchmarks that were nearly saturated before post-training. You could erase the gains from literally half the compute going into some of these recent models and barely make a dent in MMLU-Pro and GPQA-D.
by BoorishBears