4/11/2026
at
6:51:23 PM
Disappointing on the NPU. I have found it's a point where industry wide improvement is necessary. People talk tokens/sec, model sizes, what formats are supported... But I rarely see an objective accuracy comparison. I repeatedly see that AI models are resilient to errors and reduced precision which is what allows the 1 bit quantization and whatnot.But at a certain point I guess it just breaks? And they need an objective "I gave these tokens, I got out those tokens". But I guess that would need an objective gold standard ground truth that's maybe hard to come by.
by Neywiny
4/11/2026
at
10:09:40 PM
just try to find some benchmark top_k, temp, etc parameters for llama.cpp. There's no consistent framing of any of these things. Temp should be effectively 0 so it's atleast deterministic in it's random probabilities.
by cyanydeez
4/12/2026
at
2:58:36 AM
Right. There are countless parameters and seeds and whatnots to tweak. But theoretically if all the inputs are the same the outputs should be within Epsilon of a known good. I wouldn't even mandate temperature or any other parameter be a specific value, just that it's the same. That way you can make sure even the pseudorandom processes are the same, so long as nothing pulls from a hardware rng or something like that. Which seems reasonable for them to do so idk maybe an "insecure rng" mode
by Neywiny