3/19/2026 at 2:42:37 PM
Hmm, that's interesting. The code as written only has one branch, the if statement (well, two, counting the while loop exit clause as well). My mental model of the branch predictor was that for each branch, the CPU maintained some internal state like "probably taken/not taken" or "indeterminate", and it "learned" by executing the branch many times. But that's clearly not right, because apparently the specific data it's branching off matters too? Like, "test memory location X, and branch at location Y", and it remembers both the specific memory location and which specific branch branches off of it? That's really impressive, I didn't think branch predictors worked like that.
Or does it learn the exact pattern? "After the pattern ...0101101011000 (each 0/1 representing the branch not taken/taken), it's probably 1 next time"?
by OskarS
3/19/2026 at 3:34:48 PM
Your mental model is close. Predictors generally work by having some sort of table of predictions and indexing into that table (usually using some sort of hashing) to obtain the predictions. The simplest thing to do is use the address of the branch instruction as the index into the table. That way, each branch instruction maps onto a (not necessarily unique) entry in the table. Those entries will usually be a two-bit saturating counter that predicts either taken, not taken, or unknown.
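That simplest scheme can be sketched in a few lines of Python. This is an illustration of the idea, not any real CPU's implementation; the table size and starting counter value are arbitrary choices here.

```python
TABLE_SIZE = 1024  # power of two, arbitrary for this sketch

class BimodalPredictor:
    """Table of two-bit saturating counters indexed by branch address."""

    def __init__(self):
        # Counters range 0..3; start at 1 ("weakly not taken").
        self.table = [1] * TABLE_SIZE

    def predict(self, branch_pc):
        # Counter of 2 or 3 means "predict taken".
        return self.table[branch_pc % TABLE_SIZE] >= 2

    def update(self, branch_pc, taken):
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        i = branch_pc % TABLE_SIZE
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

Note that two branches whose addresses collide modulo the table size share a counter, which is why the mapping is "not necessarily unique".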
But you can add additional information to the key. For example, a gselect predictor maintains a shift register with the outcome of the last M branches. Then it combines that shift register along with the address of the branch instruction to index into the table: https://people.cs.pitt.edu/~childers/CS2410/slides/lect-bran... (page 9). That means that the same branch instruction will map to multiple entries of the table, depending on the pattern of branches in the shift register. So you can get different predictions for the same branch depending on what else has happened.
That, for example, lets you predict small-iteration loops. Say you have a loop inside a loop, where the inner loop iterates 4 times. So you’ll have a taken branch (back to the loop header) three times but then a not-taken branch on the fourth. If you track that in the branch history shift register, you might get something like this (with 1s being taken branches):
11101110
If you use this to index into a large enough branch table, the table entries corresponding to the shift register ending in “0111” will have a prediction that the branch will be not taken (i.e. the next outcome will be a 0) while the table entries corresponding to the shift register ending in say “1110” will have a prediction that the next branch will be taken.
So the basic principle of having a big table of branch predictions can be extended in many ways by using various information to index into the table.
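A gselect-style predictor in the same spirit can also be sketched briefly. Again this is illustrative only, with made-up sizes; real designs differ in how they combine the address and history (concatenation here, but XOR and fancier hashes are common).

```python
HISTORY_BITS = 4
TABLE_SIZE = 1 << 10

class GselectPredictor:
    """Two-bit counters indexed by branch address + global branch history."""

    def __init__(self):
        self.table = [1] * TABLE_SIZE  # two-bit saturating counters
        self.history = 0               # outcomes of the last HISTORY_BITS branches

    def _index(self, branch_pc):
        # Concatenate low address bits with the history shift register,
        # so the same branch maps to different entries per history pattern.
        return ((branch_pc << HISTORY_BITS) | self.history) % TABLE_SIZE

    def predict(self, branch_pc):
        return self.table[self._index(branch_pc)] >= 2

    def update(self, branch_pc, taken):
        i = self._index(branch_pc)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)
        # Shift the newest outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)
```

If you feed this the 1-1-1-0 pattern of the 4-iteration inner loop above, each of the four history states ends up with its own counter, and after a couple of passes the predictor gets every iteration right, including the final not-taken branch.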
by rayiner
3/20/2026 at 10:23:54 AM
Yeah, the "two-bit saturating counter" thing is pretty much exactly how I thought it worked (which would be terrible for the example in the article), but I had no idea that it also kept track of the branch history, and how that maps to different branch predictor entries. Thanks for the link, that's really fascinating!
by OskarS
3/20/2026 at 4:55:23 AM
It seems like that would struggle with detecting how many layers of branching to pay attention to. Imagine the two nested loops surrounded by a randomized one. Wouldn't that implementation keep hitting patterns it hadn't seen before? Obviously that must be a solved problem; I'd be curious to know what the solution is.
by fc417fc802
3/20/2026 at 9:28:12 AM
Might be, but what real code does that?
by PunchyHamster
3/20/2026 at 11:36:45 AM
It even gets much more sophisticated than that. Even the first Ryzen had something perceptron-based (yes, a neural network!), and there are several predictors plus logic to pick the best one for a branch.
by ahartmetz
3/19/2026 at 5:14:31 PM
Check out [1]: it has the most thorough description of branch prediction I've ever seen (chapter 3), across a lot of historical and current CPUs. It is mostly empirical, so you do have to take it with a grain of salt sometimes (the author acknowledges this). Supposedly the branch prediction on modern AMD CPUs is far more sophisticated, based on [2] (a citation pulled from [1]).
by jcalvinowens
3/20/2026 at 5:06:30 AM
> My mental model of the branch predictor was that for each branch, the CPU maintained some internal state like "probably taken/not taken" or "indeterminate", and it "learned" by executing the branch many times.

I always figured the algorithm was much simpler, it would just use the same branch as last execution — should work fairly well.
Didn’t realize it used the input value as well, which to me makes no sense — the whole point is to avoid having to inspect the value. This article raises more questions than answers, I’m intrigued now.
by stingraycharles
3/20/2026 at 4:29:34 PM
It does not and cannot use the input value. The branch predictor runs like a dozen cycles before execution, so it generally cannot see values in registers. It also runs 3-4 cycles before decode, so it cannot even use any bits from the instruction being executed.⁰ Branch predictors strictly use branch history, but this is more complex than just looking at the probability of a single branch being taken: there are things like tables that maintain all branches over the past tens of thousands of cycles and try to cover common patterns.

⁰ This is why the first prediction is always "don't branch": the first time code executes, the predictor has literally no information at all. Every now and then people ask for hint bits on branches, but, er, how are you planning to do that when the instruction with the branch hasn't arrived from L1 by the time the prediction is due?
by Tuna-Fish
3/20/2026 at 12:51:27 PM
> I always figured the algorithm was much simpler, it would just use the same branch as last execution — should work fairly well.

Sure, that would work significantly better than no predictor at all. But you'd agree that a better predictor would work better, right? The missing detail might be how expensive mispredicted branches are compared to other costs. If you can go from 50% accuracy to 90% accuracy, it wouldn't be surprising to more than double your performance.
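A back-of-envelope check of that claim, with assumed numbers (a 20-cycle misprediction penalty, one branch per five instructions, one cycle per instruction when prediction is perfect — all illustrative, not measured from any specific CPU):

```python
def cycles(n_insts, miss_rate, branch_freq=0.2, penalty=20):
    """Total cycles: one per instruction, plus a penalty per mispredicted branch."""
    branches = n_insts * branch_freq
    return n_insts + branches * miss_rate * penalty

base = cycles(1000, miss_rate=0.5)   # coin-flip predictor: 3000 cycles
good = cycles(1000, miss_rate=0.1)   # 90%-accurate predictor: 1400 cycles
print(base / good)                   # speedup: about 2.14x
```

So with those assumed numbers, going from 50% to 90% accuracy more than doubles throughput; the real answer depends heavily on the actual penalty and branch density of the workload.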
> Didn’t realize it used the input value as well, which to me makes no sense — the whole point is to avoid having to inspect the value.
It doesn't, and can't for the reasons you hint at. The reason branch prediction is necessary is that the value often isn't available yet when the branch is taken. Was there something in the article that implied the opposite?
--
I wonder if Daniel's tricksy approach using a random number generator to simulate a complex pattern is misleading people here.
One of the main benefits of branch prediction is predicting the end of a loop, particularly, a loop within a loop. In assembly, a loop is just a comparison at the end and a branch back to the beginning. Assume you had a loop that always executes 8 times, or some other small fixed value. Also assume there is some reason you can't unroll that loop, and that loop is inside another loop that executes millions of times. It's a real boost to performance if you can consistently predict the end of the inner loop.
If you predicted just on the last time the loop closing branch was taken, you'd always miss the ending. But if you can remember a pattern that is longer than 8, you can always get it right. This is obviously valuable. The bigger question is how much more valuable it is to predict a loop (where "loop" might actually be a complex execution pattern across multiple branches) that is thousands long rather than just 8. But quantifying how long this pattern can be on different processors is part of the groundwork for analyzing this.
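The failure mode of the "same as last time" strategy on that fixed-length inner loop is easy to simulate (an 8-iteration loop here, i.e. the loop branch is taken 7 times, then not taken once):

```python
def last_outcome_mispredicts(pattern, reps):
    """Count mispredictions of a predictor that just repeats the last outcome."""
    last = True  # assume the first prediction is "taken"
    misses = 0
    for outcome in pattern * reps:
        if last != outcome:
            misses += 1
        last = outcome
    return misses

inner = [True] * 7 + [False]  # 8-iteration inner loop branch
print(last_outcome_mispredicts(inner, 1000))  # 1999: ~2 misses per outer iteration
```

It misses twice per pass through the inner loop: once at the loop exit, and once again when the loop restarts. A history-based predictor with enough history bits gets all of them right after warm-up, which is exactly the point made above.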
by nkurz
3/20/2026 at 11:24:09 AM
Agreed. I wonder if this silicon is designed for this benchmark and, if not, how useful it is with real code. I would be surprised if this silicon area could not be better utilized for something else.
by pyrolistical
3/20/2026 at 4:34:18 PM
No, branch predictors are really important and even small improvements in them are extremely valuable on real loads. Improving branch prediction is both a power and a performance optimization.
by Tuna-Fish
3/20/2026 at 5:45:54 PM
Relevant video: https://youtu.be/nczJ58WvtYo — How Branch Prediction Works in CPUs, Matt Godbolt with Computerphile
by throawayonthe
3/20/2026 at 8:43:52 AM
The idea here is about maintaining a "path history"! When looking up a register that tracks the "local" history of outcomes for a particular branch, you want to have a hash function that captures enough context to distinguish the different situations where that branch might be encountered.
Apart from folding a long "global history" of recent outcomes and mixing in the current program counter, I think many modern machines also mix in the target addresses of recently-taken branches.
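One way to picture that mixing, as a rough sketch (the fold width, index size, and combining function are all made up here; real designs use more carefully chosen hashes):

```python
INDEX_BITS = 12  # size of the prediction table index, arbitrary

def fold(value, bits):
    """XOR-fold an arbitrarily wide value down to `bits` bits."""
    out = 0
    while value:
        out ^= value & ((1 << bits) - 1)
        value >>= bits
    return out

def predictor_index(pc, global_history, recent_targets):
    """Mix the PC, a long global history, and recent branch targets."""
    h = fold(global_history, INDEX_BITS)
    for t in recent_targets:          # mix in targets of recently-taken branches
        h ^= fold(t, INDEX_BITS)
    return (h ^ fold(pc, INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
```

The payoff is that the same branch reached along two different paths (different recent targets) lands on different table entries, even if the taken/not-taken history bits happen to match.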
by eigenform
3/19/2026 at 3:06:20 PM
There are many branch prediction algorithms out there. They range from fun architecture papers that try to use machine learning, to static predictors that don’t adapt to prior outcomes at all.
by LPisGood
3/19/2026 at 3:04:56 PM
Typical branch predictors can both learn patterns (even very long patterns) and use branch history (the probability of a branch being taken depends on the path taken to reach that branch). They don't normally look at data other than branch addresses (and targets for indirect branches).
by gpderetta
3/19/2026 at 3:12:34 PM
They can't. The data that would be needed isn't available at the time the prediction is made.
by jeffbee
3/19/2026 at 3:21:58 PM
Yeah, otherwise you wouldn't need to predict anything.
by 1718627440