2/10/2026 at 3:17:33 PM
Implementing rotate through carry like that was a really bad decision IMO: rotates are almost never by more than one bit at a time, left or right, and that case could be handled much more efficiently than with the constant-time code, which is only faster when the count is greater than 6.

Is the full microcode available anywhere?
by rep_lodsb
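To make the trade-off concrete, here is a minimal C sketch of two ways to compute an 8-bit RCL (rotate left through carry). It assumes x86-style semantics: the count is masked to 5 bits, and an 8-bit operand rotates through 9 bits (8 data bits plus CF). The type and function names (`rcl_result`, `rcl8_iterative`, `rcl8_wide`) are hypothetical. This is not the actual microcode discussed here. It only contrasts a loop that costs one step per bit with a formulation that does the whole rotate in one step.

```c
#include <stdint.h>

/* Hypothetical result type for illustration: rotated value plus new carry flag. */
typedef struct {
    uint8_t value;
    int carry; /* 0 or 1 */
} rcl_result;

/* Iterative RCL: one bit per step, so cost grows with the count. */
static rcl_result rcl8_iterative(uint8_t value, int carry, unsigned count) {
    count = (count & 0x1F) % 9;            /* mask to 5 bits, then rotate within 9 bits */
    for (unsigned i = 0; i < count; i++) {
        int out = value >> 7;                     /* bit leaving the top */
        value = (uint8_t)((value << 1) | carry);  /* old carry enters bit 0 */
        carry = out;                              /* bit that left becomes CF */
    }
    rcl_result r = { value, carry };
    return r;
}

/* Single-step RCL: treat CF:value as one 9-bit word and rotate it at once. */
static rcl_result rcl8_wide(uint8_t value, int carry, unsigned count) {
    count = (count & 0x1F) % 9;
    uint32_t word = ((uint32_t)carry << 8) | value;  /* CF sits in bit 8 */
    /* For count == 0, word >> 9 is 0, so the word is returned unchanged. */
    uint32_t rotated = ((word << count) | (word >> (9 - count))) & 0x1FF;
    rcl_result r = { (uint8_t)(rotated & 0xFF), (int)(rotated >> 8) };
    return r;
}
```

Both functions return the same result for every input. The iterative one does `count` loop iterations, while the wide one does a fixed amount of work. Where the crossover lies in real hardware depends on the microcode, which this sketch does not model.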
2/10/2026 at 3:40:20 PM
I haven't published it yet as there are still some rough edges to clear up, but if you email me (andrew@reenigne.org) I'll send you the current work-in-progress (the same one that nand2mario is working from).
by ajenner
2/10/2026 at 3:31:17 PM
Since the shifter is also used for bit tests, the assumption that most uses are 1-bit shifts might not hold. Perhaps they did the analysis and it made sense.
by kjs3
2/10/2026 at 4:07:02 PM
There are separate opcodes for shift/rotate by 1, by CL, or by an immediate operand. Those are decoded to separate microcode entry points, so they could at least have optimized the "RCL/RCR x,1" case.

And the microcode for bit tests has to be different anyway.
by rep_lodsb
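The point about separate opcodes can be illustrated with the x86 group-2 encodings. Opcodes D0/D1 take a count of 1, D2/D3 take the count from CL, and C0/C1 take an imm8 count (80186 and later). The ModRM reg field selects the operation, with /2 being RCL and /3 being RCR. The C sketch below is a hypothetical decoder fragment, not real microcode. The names `count_source`, `classify_group2`, `is_rcl_or_rcr` and `could_use_one_bit_fast_path` are invented for illustration. It shows that the "by 1" form can be identified from the opcode alone, before any count is read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Where the shift/rotate count comes from, as encoded in the opcode byte. */
typedef enum { COUNT_ONE, COUNT_CL, COUNT_IMM8, COUNT_NOT_GROUP2 } count_source;

/* Classify an x86 group-2 (shift/rotate) opcode by its count source. */
static count_source classify_group2(uint8_t opcode) {
    switch (opcode) {
    case 0xD0: case 0xD1: return COUNT_ONE;   /* r/m, 1    */
    case 0xD2: case 0xD3: return COUNT_CL;    /* r/m, CL   */
    case 0xC0: case 0xC1: return COUNT_IMM8;  /* r/m, imm8 */
    default:              return COUNT_NOT_GROUP2;
    }
}

/* The ModRM reg field (bits 5..3) selects the operation: 2 = RCL, 3 = RCR. */
static bool is_rcl_or_rcr(uint8_t modrm) {
    uint8_t reg = (modrm >> 3) & 7;
    return reg == 2 || reg == 3;
}

/* Hypothetical policy: the "by 1" form has its own opcode (and, per the thread,
   its own microcode entry point), so a one-bit RCL/RCR could be sent to a short
   dedicated sequence without examining a runtime count. */
static bool could_use_one_bit_fast_path(uint8_t opcode, uint8_t modrm) {
    return classify_group2(opcode) == COUNT_ONE && is_rcl_or_rcr(modrm);
}
```

For example, `D1 D0` (RCL AX,1 in 16-bit code) qualifies, while `D3 D0` (RCL AX,CL) does not, because its count is only known at run time.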
2/12/2026 at 6:26:49 AM
Except that there are tremendous advantages to constant-time execution, not least protection from timing attacks and information leakage (which admittedly were less of a concern back then). Sure, you could make that one instruction run faster for counts below 6, but it isn't worth the transistor budget, particularly if you pipeline the execution into stages. Variable timing makes optimization far more complex...
by cbsmith