alt.hn

6/13/2026 at 4:49:42 PM

The adder at the heart of Intel's 8087 floating-point chip

https://www.righto.com/2026/06/intel-8087-adder-reverse-engineered.html

by pwg

6/13/2026 at 5:00:03 PM

Author here for your 8087 questions. I find adders and ALUs interesting because they are key to the performance of a system and every system implements them differently.

by kens

6/13/2026 at 8:06:46 PM

Do you know about how many transistors are needed to implement the adder (or the FPU as a whole)? And how it scales with the width of the numbers (16 bit, 32 bit, etc)?

I've been curious about transistor counts for floating point units for a while, but it's hard to find information about them.

by mitthrowaway2

6/13/2026 at 8:33:50 PM

I count approximately 2014 transistors (including pull-ups) for the 69-bit adder. Each block of four bits takes approximately 117 transistors.
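As a rough back-of-the-envelope check on those numbers, here is a small Python sketch. It assumes the transistor count scales linearly with width at about 117 transistors per 4-bit block, which is an assumption for illustration only; `estimate_adder_transistors` is a made-up name, and other adder architectures would scale differently.

```python
# Hypothetical estimate: assumes uniform 4-bit blocks and linear scaling.
TRANSISTORS_PER_4BIT_BLOCK = 117  # figure quoted in the comment above

def estimate_adder_transistors(bits: int) -> float:
    """Estimate transistor count for an adder of the given width."""
    return bits * TRANSISTORS_PER_4BIT_BLOCK / 4

for width in (16, 32, 64, 69):
    print(width, round(estimate_adder_transistors(width)))
```

For 69 bits this lands close to the counted figure of about 2014, so the per-block number and the total are consistent with each other.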

by kens

6/13/2026 at 5:31:31 PM

No immediate questions, but happy to have some great weekend reading. A quick pass through finds one of the best and clearest explainers I've seen. Thanks for this and all the materials you produce.

by sebgan

6/13/2026 at 6:56:23 PM

Any idea how much adder designs changed on modern CPUs compared to back then? I mean there's only so much you can optimize in those, I think...

by Aardwolf

6/13/2026 at 8:22:53 PM

Even by the time of the Pentium, they had moved to much more complicated adders like Kogge-Stone. I wrote about it here: https://www.righto.com/2025/01/pentium-carry-lookahead-rever...
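For readers who haven't met it, here is a rough Python sketch of the textbook Kogge-Stone parallel-prefix scheme. It is not taken from the Pentium's actual circuitry; the function name and the `width` parameter are illustrative. Each loop iteration stands for one level of prefix logic, so the carries are ready after about log2(width) levels rather than after a bit-by-bit ripple.

```python
def kogge_stone_add(a: int, b: int, width: int):
    """Add two width-bit integers; return (sum, carry_out)."""
    mask = (1 << width) - 1
    generate = a & b & mask        # bit i creates a carry by itself
    propagate = (a ^ b) & mask     # bit i passes an incoming carry along
    half_sum = propagate           # kept for the final sum

    distance = 1
    while distance < width:
        # Combine each group with the group `distance` bits below it.
        generate = (generate | (propagate & (generate << distance))) & mask
        propagate = (propagate & (propagate << distance)) & mask
        distance *= 2

    # generate bit i is now the carry out of bits 0..i.
    carries_in = (generate << 1) & mask
    carry_out = (generate >> (width - 1)) & 1
    return half_sum ^ carries_in, carry_out
```

The cost is the wiring: every level needs connections that span 1, 2, 4, ... bit positions.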

by kens

6/13/2026 at 11:35:17 PM

Do you have anything on those TRW floating point chips that used to titillate junior engineers in trade mag advertisements before that?

by B1FF_PSUVM

6/13/2026 at 9:45:01 PM

There's a surprising amount of optimization possible in them. You can improve the latency of them substantially at the cost of a lot more transistors.

by rcxdude

6/13/2026 at 11:30:07 PM

For example, an adder's total delay depends on a carry chain. If you have N 4-bit slices, the last slice has to wait for the carry to propagate through all N-1 previous slices.

But if you duplicate all your slices, you can have the results for both carry = 0 and carry = 1 inputs. Then just switch which one is correct - total time 1 add plus N-1 switches.

Just for double (and change) the hardware. Cheap.
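A minimal Python sketch of that duplicate-and-select idea (usually called a carry-select adder), assuming 4-bit slices. The names `add_slice` and `carry_select_add` are made up for illustration, and the loop only models the logic, not the timing: in hardware both candidate sums for every slice would be computed at the same time, and only the selection would happen in sequence.

```python
def add_slice(a: int, b: int, carry_in: int, width: int = 4):
    """Add two width-bit slices; return (sum, carry_out)."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a: int, b: int, n_slices: int, width: int = 4):
    """Add two (n_slices * width)-bit integers; return (sum, carry_out)."""
    mask = (1 << width) - 1
    carry = 0
    result = 0
    for i in range(n_slices):
        slice_a = (a >> (i * width)) & mask
        slice_b = (b >> (i * width)) & mask
        # Both candidates exist before the real carry is known.
        sum0, carry0 = add_slice(slice_a, slice_b, 0, width)
        sum1, carry1 = add_slice(slice_a, slice_b, 1, width)
        # The "switch": pick the candidate matching the actual carry.
        chosen_sum, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= chosen_sum << (i * width)
    return result, carry

print(carry_select_add(0x1234, 0x0FED, n_slices=4))
```

The lowest slice always has a known carry-in, so a real design would not need to duplicate it.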

by B1FF_PSUVM

6/14/2026 at 4:53:05 AM

I believe that every single adder architecture we now use was known by the 1980s. The "optimization" is matching the theory to the engineering of the day.

The reason you don't use prefix adders in 1980 is that you can't possibly route them because you don't have enough metal. So instead, you use chunks of Manchester carry chain because the "tapping internal nodes" that everybody cites allows you to route nodes in diffusion and polysilicon instead of having to use metal.

Of course, THAT only works because you have 5V (or more) and can connect lots of transistors in series and still have them work. As your voltage falls you can't connect as many transistors in series, so you switch to architectures that prefer active gates over passthroughs and long chains.

So, as your available metal layers, supply voltage, transistor speed, threshold voltages, capacitive load and power dissipation all shift over the engineering landscape, your "optimization" shifts with it.
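To make the Manchester carry chain mentioned above a bit more concrete, here is a logic-level Python sketch. It models only the generate/propagate/kill behaviour of each bit, not the pass transistors, precharging, or voltage effects; `manchester_add` is an illustrative name, not anything from the 8087.

```python
def manchester_add(a: int, b: int, width: int, carry_in: int = 0):
    """Add two width-bit integers; return (sum, carry_out)."""
    carry = carry_in
    result = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        generate = bit_a & bit_b    # both inputs 1: force the carry to 1
        propagate = bit_a ^ bit_b   # inputs differ: pass the carry through
        result |= (propagate ^ carry) << i
        # When propagating, the carry travels down the chain unchanged
        # (the pass-transistor path). Otherwise the bit generates a 1 or,
        # with both inputs 0, kills the carry.
        carry = carry if propagate else generate
    return result, carry

print(manchester_add(5, 3, width=4))
```

The long series path appears when many adjacent bits propagate, which is where the supply-voltage limit on transistors in series comes in.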

by bsder

6/13/2026 at 7:43:47 PM

> take two clock cycles to complete an addition.

How does the clocking work exactly? The circuit is fed A and B and up down up down clock and then the output appears? How does the consumer (circuit) know when to read the result? Is there a "result is ready" flag? How long does the result stay stable? One full clock cycle? So many questions...

by m1333

6/13/2026 at 8:14:11 PM

The adder is not clocked. You can see from the diagrams that there are no clock inputs. The clock cycles comment is more an expression of the length of time that it takes before all of the carry rippling and whatnot settles down.

by JdeBP

6/13/2026 at 8:39:41 PM

In more detail, the microcode engine normally executes one micro-instruction per cycle. For addition, the engine is blocked for one extra cycle to give the result time to percolate through the adder.

There is some complicated timing within a clock cycle with slightly delayed clocks and whatnot, for instance, to precharge the carry lines at the beginning of the operation. The 8087 is mostly synchronous with the clock, but they "cheat" in many places.

by kens

6/14/2026 at 5:23:50 AM

great post … thanks for all the work

personally I would like to see a compare and contrast between the Intel 8087 (built around a full width adder), 287 and the Weitek 1167 (built around a full width MAC and barrel shifter)

as you note, all these parts were pushing the transistor limits of their day

PS. and the Inmos T800 had a log shifter … so a compromise between those extremes

by librasteve

6/13/2026 at 8:02:54 PM

It is interesting that over the years people have produced synthesizable RTL HDL for the 8086/8088 and later, with varying degrees of fidelity, but no-one seems to have produced similar for the 8087.

by JdeBP

6/13/2026 at 8:10:20 PM

The ROM used different sized transistors to store two bits per transistor. That's pure analog territory, which most HDLs don't touch.

by colejohnson66

6/13/2026 at 8:21:56 PM

AIUI, the 8087 was essentially at the extreme cutting edge of what was possible to produce with the technology of the time, and even Intel at the time was largely treating it as a likely-to-fail project.

by jcranmer

6/13/2026 at 8:43:55 PM

That's not really an explanation of why the people who have made synthesizable 8086/8088 processors haven't done the same thing for the 8087, because modern FPGAs aren't limited by the cutting edge of 1980 technology. (-:

My educated guess is that primarily no-one has needed this, and secondarily it's hard. They're running software that can do all of its floating point in software anyway, so they just don't need an 8087 on an FPGA. And floating point on an FPGA uses a lot of area if one takes the easy route of just emulating the external behaviour, rather than the much harder task of emulating the clever microarchitecture that reduces it all to just 1 adder.

by JdeBP

6/13/2026 at 9:49:05 PM

A lot of applications where an embedded x86 core makes sense don't have a huge need for FP maths.

by userbinator

6/13/2026 at 7:44:34 PM

/* It's a bummer that there is addition but no vipition. */

by nine_k

6/13/2026 at 7:53:07 PM

I knew a guy who bred snakes but could never really get much out of his adders.

Turns out what he needed to do was saw up some tree trunks to make rough platforms for them, and they bred like crazy.

Adders can multiply really efficiently with log tables.

by ErroneousBosh

6/13/2026 at 8:13:18 PM

slow clap

by inigyou

6/13/2026 at 10:29:03 PM

(It's an old, old joke.)

by Sharlin

6/13/2026 at 11:12:13 PM

and my slow clap processor made it into this thing

by inigyou

6/13/2026 at 7:26:13 PM

Do you have any insights on how power was delivered to these circuits? Maybe it's done in the metal layers that were dissolved? Also, is it correct that there is no on-die capacitance surrounding these circuits?

Thanks for the great article.

by throwaway152321

6/13/2026 at 8:48:20 PM

The 8087 has one metal layer, which makes power distribution more challenging. You want to keep power distribution in the metal, so for the most part the pattern is two interdigitated trees for power and ground. There are a few places where the lines need to cross, which is accomplished with a short polysilicon connection underneath. The two clock lines are also kept in metal whenever possible.

The die photo at the start of the article shows some of the power distribution (the thick white lines around the edge and through the die). I have a close-up shot of the adder's metal layer in the article, showing the thick power and ground metal lines that run next to the adder.

As far as capacitors, there are some capacitors for specific things, but no decoupling capacitors. I think the capacitors are mostly to tweak the timing, if a signal needs to be delayed slightly.

by kens
