3/1/2026 at 1:19:35 AM
> But like humans — and unlike computer programs — they do not produce the exact same results every time they are used. This is fundamental to the way that LLMs operate: based on the "weights" derived from their training data, they calculate the likelihood of possible next words to output, then randomly select one (in proportion to its likelihood).

This is emphatically not fundamental to LLMs! Yes, the next token is selected randomly; but "randomly" could mean "chosen using an RNG with a fixed seed." Indeed, many APIs used to support a "temperature" parameter that, when set to 0, would result in fully deterministic output. These parameters were slowly removed or made non-functional, though, and the reason has never been entirely clear to me. My current guess is that it is some combination of A) 99% of users don't care, B) perfect determinism would require not just a seeded RNG, but also fixing a bunch of data races that are currently benign, and C) deterministic output might be exploitable in undesirable ways, or lead to bad PR somehow.
by nemo1618
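To make the seeded-RNG point concrete, here is a minimal Python sketch (an illustration, not any provider's API). It assumes the model's logits for each step are already fixed numbers, so it deliberately ignores the floating-point and batching issues raised in the replies. The function names and the toy logits are hypothetical. Temperature 0 is treated as greedy argmax, which uses no randomness at all; any other temperature scales the logits, applies a softmax, and draws one index in proportion to its probability from a privately seeded `random.Random`.

```python
import math
import random

def sample_next_token(logits, temperature, rng):
    """Pick a token index from raw scores (logits).

    temperature == 0 means greedy decoding (argmax, first index wins ties),
    which needs no RNG. Otherwise the logits are divided by the temperature,
    turned into probabilities with a softmax, and one index is drawn in
    proportion to its probability using the supplied RNG.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def generate(logits_per_step, temperature, seed):
    """Sample one token per step from a fixed (hypothetical) list of logit vectors."""
    rng = random.Random(seed)  # private RNG: same seed -> same sequence of draws
    return [sample_next_token(l, temperature, rng) for l in logits_per_step]

# Hypothetical logits for a 4-token vocabulary over 5 steps.
LOGITS = [
    [2.0, 1.0, 0.5, 0.1],
    [0.3, 0.2, 1.5, 1.4],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 3.0, 0.0, 0.0],
    [0.5, 0.4, 0.6, 0.7],
]

run_a = generate(LOGITS, temperature=1.0, seed=42)
run_b = generate(LOGITS, temperature=1.0, seed=42)
print(run_a == run_b)                             # True
print(generate(LOGITS, temperature=0, seed=123))  # [0, 2, 0, 1, 3]
```

With the seed and the logits held fixed, two sampled runs give identical token lists; the open question in the thread is whether the logits themselves stay fixed in production.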
3/1/2026 at 1:32:18 AM
Deterministic output is incompatible with batching, which in turn is critical to high utilization on GPUs, which in turn is necessary to keep costs low.
by pavpanchekha
3/1/2026 at 8:42:00 AM
Batching doesn't mean the computation suddenly becomes non-deterministic. Ideally, it just means you perform the same computation on multiple token streams in the batch simultaneously, without the values interacting with each other. Vectorization, basically.

Batching leads to cross-contamination in practice because of things like MoE load-balancing within the batch, or supporting different batch sizes with different kernels that have different numerical behavior. But a careful implementation could avoid such issues while still benefiting from the higher efficiency of batching.
by yorwba
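The "different kernels with different numerical behavior" point comes down to floating-point addition not being associative. The sketch below is a toy illustration under stated assumptions: plain Python loops stand in for the reductions inside GPU matmul and attention kernels, and `pick_chunk_size` is a hypothetical heuristic standing in for a kernel choice that depends on batch size. The same row of numbers gets a different sum depending on how many other rows share its batch, while a version that fixes the reduction split per row does not.

```python
def seq_sum(values):
    """Plain left-to-right float accumulation. Avoids the built-in sum(),
    which uses compensated summation for floats on Python 3.12+."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, chunk_size):
    """Sum each fixed-size chunk, then sum the partial results: a toy
    stand-in for a parallel reduction split across threads."""
    partials = [seq_sum(values[i:i + chunk_size])
                for i in range(0, len(values), chunk_size)]
    return seq_sum(partials)

def pick_chunk_size(batch_size):
    """Hypothetical kernel heuristic: the reduction split depends on batch size."""
    return 1 if batch_size == 1 else 2

def row_sums_batch_dependent(batch):
    chunk = pick_chunk_size(len(batch))
    return [chunked_sum(row, chunk) for row in batch]

def row_sums_batch_invariant(batch, chunk=2):
    # Same split for every row, no matter how many rows share the batch.
    return [chunked_sum(row, chunk) for row in batch]

ROW = [1e16, 1.0, -1e16, 1.0]   # exact sum is 2.0
OTHER = [3.0, 4.0, 5.0, 6.0]    # some other user's request in the same batch

print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
print(row_sums_batch_dependent([ROW]))         # [1.0]
print(row_sums_batch_dependent([ROW, OTHER]))  # [0.0, 18.0]
print(row_sums_batch_invariant([ROW])[0] ==
      row_sums_batch_invariant([ROW, OTHER])[0])  # True
```

Neither chunked result equals the exact 2.0; the point is that the batch-invariant version is at least consistent, which is what reproducibility needs.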
3/1/2026 at 5:18:18 AM
> This is emphatically not fundamental to LLMs! Yes, the next token is selected randomly; but "randomly" could mean "chosen using an RNG with a fixed seed."

This. Thanks for saying that, because now I don't need to read the article, since if the author doesn't even get that, I'm not interested in the rest.
by valenterry
3/1/2026 at 3:37:46 AM
LLMs are, fundamentally, compressed lookup tables that map input -> input + next token. Or, if you like, input -> input + list of possible next tokens with probabilities.
by jrmg
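A toy version of this framing, in Python: a dictionary maps each full context to a list of (token, probability) pairs, and one decoding step turns `input` into `input + next token`. The table, its probabilities, and the `<eos>` marker are made up for illustration, and the step simply takes the most likely candidate. A real model stores no such table; its weights compute the distribution for any context on demand, which is the sense in which the "table" is compressed.

```python
from typing import Dict, List, Tuple

Context = Tuple[str, ...]

# Hypothetical toy table with made-up probabilities.
TABLE: Dict[Context, List[Tuple[str, float]]] = {
    ("the",): [("cat", 0.6), ("dog", 0.4)],
    ("the", "cat"): [("sat", 0.7), ("ran", 0.3)],
    ("the", "dog"): [("ran", 0.8), ("sat", 0.2)],
    ("the", "cat", "sat"): [("<eos>", 1.0)],
    ("the", "cat", "ran"): [("<eos>", 1.0)],
    ("the", "dog", "ran"): [("<eos>", 1.0)],
    ("the", "dog", "sat"): [("<eos>", 1.0)],
}

def next_token_distribution(context: Context) -> List[Tuple[str, float]]:
    """input -> list of possible next tokens with probabilities."""
    return TABLE[context]

def step(context: Context) -> Context:
    """input -> input + next token, choosing the most likely candidate."""
    token, _prob = max(next_token_distribution(context), key=lambda tp: tp[1])
    return context + (token,)

def generate(context: Context, max_steps: int = 10) -> Context:
    """Apply `step` repeatedly until an end-of-sequence token appears."""
    for _ in range(max_steps):
        if context[-1] == "<eos>":
            break
        context = step(context)
    return context

print(generate(("the",)))  # ('the', 'cat', 'sat', '<eos>')
```

Swapping the `max` in `step` for a weighted random draw turns the same loop into sampling, which is where the seeding question upthread comes in.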
3/1/2026 at 4:05:16 AM
The temperature parameters largely went away when we moved towards reasoning models, which output lots of reasoning tokens before you get to the actual output tokens. I don’t know if it was found that reasoning works better with a higher temperature, or that having separate temperatures for reasoning vs. output wasn’t practical, but that’s my observation of the timing, anyway. And to the other commenter’s point, even a temperature of 0 is not deterministic if the batches are not invariant, which they’re not in production workloads.
by willj
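Why temperature 0 still isn't reproducible can be shown in a few lines. In this hypothetical Python sketch, two tokens have logits that are equal in exact arithmetic, but one logit is a sum whose floating-point result depends on the order of addition (standing in for a reduction whose order varies with batch composition). Greedy decoding then picks a different token, and every later step would condition on a different prefix. The contribution values are made up, and ties go to the lower index.

```python
def left_to_right(xs):
    """Accumulate in the given order."""
    total = 0.0
    for x in xs:
        total += x
    return total

def right_to_left(xs):
    """Accumulate the same numbers in reverse order."""
    total = 0.0
    for x in reversed(xs):
        total += x
    return total

def greedy(logits):
    """Temperature-0 decoding: index of the largest logit (lowest index wins ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical per-token contributions; both logits equal 0.6 in exact arithmetic.
CONTRIBUTIONS = [
    [0.6],            # token 0
    [0.1, 0.2, 0.3],  # token 1
]

def decode(reduce_fn):
    """Compute each token's logit with `reduce_fn`, then pick greedily."""
    logits = [reduce_fn(parts) for parts in CONTRIBUTIONS]
    return logits, greedy(logits)

print(decode(left_to_right))  # ([0.6, 0.6000000000000001], 1)
print(decode(right_to_left))  # ([0.6, 0.6], 0)
```

A one-ulp difference only matters when the top candidates are nearly tied, but over thousands of generated tokens such near-ties are common enough that outputs diverge.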