KVarN: Native vLLM backend for KV-cache quantization by Huawei

6/4/2026 at 3:54:56 PM

Better performance than TQ and better quality than FP16?

Am I reading this right??

by throwa356262

6/4/2026 at 5:04:42 PM

It's not better quality: 59.3% vs 59.4% fp16 on AIME 25

by qeternity

6/5/2026 at 12:42:54 AM

0.1% is within margin of error. Depending on the performance boost, it might be worthwhile taking a minuscule quality hit.
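To put a rough number on "within margin of error", here is a minimal sketch using the binomial standard error of an accuracy score. It assumes AIME 25 is scored over 30 problems (two 15-problem exams). The thread does not say how many runs were sampled per problem, so the `runs_per_problem` values are hypothetical. Treating repeated runs on the same problem as independent is also a simplification that makes the error bar look smaller than it really is.

```python
import math


def binomial_standard_error(p: float, n: int) -> float:
    """Standard error of an accuracy estimate p measured over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


# Scores quoted in the thread.
P_FP16 = 0.594
P_KVARN = 0.593

# Assumption: 30 problems in the benchmark (not stated in the thread).
N_PROBLEMS = 30

gap = P_FP16 - P_KVARN

# Hypothetical numbers of sampled runs per problem.
for runs_per_problem in (1, 8, 64):
    n = N_PROBLEMS * runs_per_problem
    se = binomial_standard_error(P_FP16, n)
    print(f"runs={runs_per_problem:3d}  n={n:5d}  SE={se:.4f}  gap/SE={gap / se:.2f}")
```

Under these assumptions the 0.1-point gap stays below one standard error even at 64 runs per problem, so the two scores cannot be told apart from this benchmark alone.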

by sheepscreek

6/6/2026 at 11:07:00 AM

I think it very much is worth it!

But the point was that quality didn't magically increase.

by qeternity

6/4/2026 at 9:33:54 PM

Any divergence from full precision (even if the benchmark score is better) is error.

by electroglyph

6/5/2026 at 4:32:20 AM

Just pretend that it is the next step update when training. You didn’t train your model to step=inf, I hope?

by 7e

6/4/2026 at 5:02:26 PM

Faster than FP16, not better quality, I guess.

by thefox96

6/4/2026 at 3:53:48 PM

Why is this not a PR for vLLM?

by v3ss0n

6/4/2026 at 4:00:19 PM

It's the output of a research paper; the authors are not trying to build up vLLM, and they probably have no incentive to do so. You can submit a PR, though! It's easier now while the divergence is low, so don't wait. Since there are six authors, I bet you could get help with the inevitable review chores if you just take the step of creating the PR.

edit: It might not be clear that it is based on vLLM 0.22, which is the current version: https://github.com/huawei-csl/KVarN/commit/d6290e99098d7426d.... All you have to do is create a diff off it; it's fairly straightforward.

by esafak

6/4/2026 at 4:14:14 PM

And AI can help here: pointing an AI at this paper and saying "make a vLLM PR from this paper" tends to work surprisingly well, even if you need to nudge it a little along the way.

by jmalicki

6/4/2026 at 8:09:07 PM

Last I heard, vLLM was backed by a company that has raised $150m in seed funding. I'm sure they've got the resources to port it.

by woadwarrior01

6/4/2026 at 11:05:37 PM

Why is this not a PR for llama.cpp?

by electronsoup

6/4/2026 at 5:28:33 PM

It should be easy to do, btw.

by thefox96

6/5/2026 at 3:16:23 PM

... and it's on llama.cpp thanks to this guy! https://www.reddit.com/r/LocalLLaMA/comments/1txlhxu/i_imple...

by lukasc-ch

6/5/2026 at 3:18:02 PM

This is awesome! Let's give them some stars:

- https://github.com/huawei-csl/KVarN (original repo, vLLM implementation)
- https://github.com/Anbeeld/beellama.cpp (llama.cpp implementation + awesome evals)

by lukasc-ch

6/4/2026 at 9:58:09 PM

Far ahead (yao yao ling xian)

by 0xjeffro