alt.hn

6/9/2026 at 3:39:32 AM

Flat Datacenter Networks at Scale at Amazon

https://perspectives.mvdirona.com/2026/06/flat-datacenter-networks-at-scale/

by tanelpoder

6/9/2026 at 8:37:32 PM

Oh man, James Hamilton blog posts, I love these things! (Edit: for more concrete details, the arXiv paper linked from the blog post is here: https://arxiv.org/pdf/2604.15261 and the amazon.science link gives a higher-level view of the details: https://www.amazon.science/blog/how-flat-is-replacing-fat-in... )

> The results were striking: compared to traditional fat-tree networks, RNG (Resilient Network Graphs) uses 69% fewer routers, delivers 33% higher throughput, cuts network power by 40%, and lowers operating costs by 27. In early 2026, RNG became the default design for most newly built Amazon data centers globally.

> For cabling, they developed the ShuffleBox—a passive optical device whose internal wiring combined with randomized ShuffleBox-to-ShuffleBox cabling yields “quasi-random” graphs that behave like truly random graphs.

This is pretty incredible: randomly laid-out networks that have better properties on average...

I'm really curious about the long tail of performance though. What is the worst case scenario here? And are there some better case scenarios? Uniformity in Clos networks is pretty great, but many loads don't need uniformity, and if these RNG-based networks have non-uniformity, perhaps that has operational characteristics that can be helpful or harmful.
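
For anyone who wants to poke at the tail question, here is a rough sketch that builds small random switch graphs and reports the mean and worst-case hop counts across a few seeds. Everything in it is an assumption for illustration (the greedy Jellyfish-style wiring, the function names, 64 switches with 6 inter-switch ports each). It is not how RNG or the ShuffleBox is actually built, and hop count is only one of the metrics that matter.

```python
import random
from collections import deque


def random_switch_graph(num_switches, ports, rng):
    """Greedily wire switches at random until no legal link remains.

    Hypothetical helper: each switch gets at most `ports` links, with no
    self-links and no duplicate links. A few ports may stay unused.
    """
    adj = {s: set() for s in range(num_switches)}
    free = list(adj)  # switches that still have an unused port
    while True:
        candidates = [
            (a, b)
            for i, a in enumerate(free)
            for b in free[i + 1:]
            if b not in adj[a]
        ]
        if not candidates:
            break
        a, b = rng.choice(candidates)
        adj[a].add(b)
        adj[b].add(a)
        free = [s for s in free if len(adj[s]) < ports]
    return adj


def hop_counts(adj, src):
    """Breadth-first search: hops from `src` to every reachable switch."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def hop_stats(adj):
    """Return (mean hops, worst-case hops) over all reachable pairs."""
    hops = [
        h
        for s in adj
        for t, h in hop_counts(adj, s).items()
        if t != s
    ]
    return sum(hops) / len(hops), max(hops)


if __name__ == "__main__":
    # Different seeds give different wirings; compare how much the
    # worst case moves relative to the mean.
    for seed in range(5):
        graph = random_switch_graph(64, 6, random.Random(seed))
        mean, worst = hop_stats(graph)
        print(f"seed={seed} mean_hops={mean:.2f} worst_hops={worst}")
```

Running it over many seeds and looking at the spread of `worst_hops` gives a crude feel for the variance, though real answers need the traffic and failure models from the paper.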

by epistasis

6/9/2026 at 9:51:50 PM

"Performance guarantees are stochastic rather than deterministic. The worst case performance (for metrics such as number of hops and oversubscription) is known, but for RNG our models are stochastic (i.e., the worst case performance is known with high probability). This is a weaker limitation than it might appear. Fat-tree guarantees are also effectively stochastic once you account for real-world failures, which are frequent at scale. RNG simply makes the stochastic nature explicit and designs for it from the start."

by UltraSane

6/9/2026 at 10:50:03 PM

Well I guess I'd like to see those guarantees, but more specifically, the variance of them.

I think Section 9, and Figures 13/14 in the Arxiv preprint sort of address this, but it doesn't mention anything about accounting for real-world failures in fat trees. I haven't had a chance to read it all, though...

by epistasis

6/9/2026 at 9:03:17 PM

Interesting reading this because this is essentially the principle behind https://socketcluster.io/ scalability; the sharding of channels across available brokers is pseudo-random. It uses a hash function for determinism but the distribution appears to be random and that was also the best way I could find to distribute load evenly between available nodes. It is key to its embarrassingly parallel design.

It's interesting to see it being done at the data centre level as well.
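
The hash-based channel sharding described above can be sketched roughly like this. This is not SocketCluster's actual code: the function name `broker_for`, the choice of SHA-256, and the channel names are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def broker_for(channel, num_brokers):
    """Map a channel name to a broker index (hypothetical helper).

    The hash makes the mapping deterministic, while the output looks
    random enough to spread channels evenly across brokers.
    """
    digest = hashlib.sha256(channel.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_brokers


if __name__ == "__main__":
    # Count how many of 10,000 made-up channels land on each of 4 brokers.
    load = Counter(broker_for(f"channel-{i}", 4) for i in range(10_000))
    for broker in sorted(load):
        print(f"broker {broker}: {load[broker]} channels")
```

One known trade-off of plain modulo sharding is that changing the broker count remaps most channels; consistent hashing is the usual alternative when that matters.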

by socketcluster

6/10/2026 at 1:28:03 AM

That's a different thing entirely. That assumes you already have a physical layer that allows any client to connect to any broker; this is about building that physical layer.

by trumpdong

6/10/2026 at 12:46:31 AM

How is this different from Jellyfish? I recall reading a paper around that back in the 2010s https://www.usenix.org/system/files/conference/nsdi12/nsdi12...

Edit: Answering my own question, Jellyfish proved theoretically that random networks can be better, and this is a working implementation based on that idea which solves the problems of creating and operating random networks.

by dhruvrrp

6/9/2026 at 8:53:00 PM

It's not that dissimilar to how the Internet works. You have some steering, like IX peering switches and social/economic factors, but on the whole it is fairly random.

by kev009

6/10/2026 at 1:29:05 AM

On the internet each ISP is looking for congested links and expanding those links, like a lazy-loaded fat tree.

by trumpdong

6/9/2026 at 11:32:56 PM

I get the feeling I am missing some info (like what is meant here by Data Center Networks) that's preventing me from understanding what's happening here. I am guessing that this falls outside of the traditional rack/colo paradigm and has more to do with hyperscalers.

by protocolture

6/10/2026 at 12:19:58 AM

It's much larger but fundamentally it's not that different. In each rack you have one or two switches. How do you connect those racks to each other? The standard answer (simplified) is centralized spine switches but AWS discovered that a random network where the rack switches connect directly to each other is cheaper.
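
To put rough numbers on the box count, here is a back-of-the-envelope sketch. The fat-tree formulas are the standard ones for a three-tier k-ary fat tree built from k-port switches (k^3/4 hosts, 5k^2/4 switches). The flat side is only an assumption for illustration: every switch is a rack switch, and the 16/16 split between host ports and inter-switch ports is made up, not taken from the paper. It counts switches only and says nothing about whether throughput or oversubscription is comparable.

```python
def fat_tree_size(k):
    """Return (hosts, switches) for a three-tier k-ary fat tree."""
    if k % 2:
        raise ValueError("k must be even")
    hosts = k ** 3 // 4          # k pods * (k/2 edge switches) * (k/2 hosts)
    switches = 5 * k ** 2 // 4   # k^2 edge+aggregation plus k^2/4 core
    return hosts, switches


def flat_switch_count(hosts, host_ports):
    """Rack switches needed if each one serves `host_ports` hosts.

    Hypothetical model: the remaining ports on each switch link
    directly to other rack switches, so there is no spine tier.
    """
    return -(-hosts // host_ports)  # ceiling division


if __name__ == "__main__":
    k = 32
    hosts, tree_switches = fat_tree_size(k)
    flat_switches = flat_switch_count(hosts, host_ports=16)
    print(f"hosts={hosts} fat_tree={tree_switches} flat={flat_switches}")
```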

by wmf

6/9/2026 at 11:47:03 PM

AWS hasn't published all that much about their network, but we know they use a high-radix folded Clos fabric because they mention it in their paper about Scalable Reliable Datagram. If you want an overview of the folded Clos fabric, try reading Google's "Jupiter Rising" paper. https://dl.acm.org/doi/pdf/10.1145/2829988.2787508

by jeffbee

6/10/2026 at 12:34:46 AM

Then, if you want to know about using optical switches to connect Clos segments without a fixed spine, check out Google's "Mission Apollo" paper: https://arxiv.org/pdf/2208.10041

by jsolson

6/9/2026 at 10:08:14 PM

Good 6 minute video explainer: https://youtube.com/watch?v=yDoRYRRPOA0

by mino

6/9/2026 at 11:29:22 PM

I am moderately insulted that they want me to believe that graph theoreticians at Amazon sit at a desk with a bunch of optical T&M equipment. Sort of silly!

by jeffbee

6/9/2026 at 10:56:33 PM

The win in operating costs is impressive, bordering on the unbelievable (27x). Does anyone have a clue about where the win comes from?

by wofo

6/9/2026 at 11:26:50 PM

It's almost certainly meant to be 27%, not a factor of 27.

by mattclarkdotnet

6/9/2026 at 11:22:45 PM

I am fairly confident that is a typo. The paper says "Our analysis reveals that RNG topologies are 9–45% cheaper than fat trees with equivalent oversubscription ratio." The blogger probably left off the % sign.

by jeffbee

6/9/2026 at 10:11:15 PM

I always like these randomized/semi-randomized network papers. Here's a little-known one you might enjoy if you liked this one: https://repositorio.unican.es/xmlui/handle/10902/23594

by fdr

6/9/2026 at 10:34:53 PM

One interesting consequence of this is that it's now (for the first time!) possible to get unlimited AWS egress.

It's not cheap, and it's limited to `us-east-1`, but it's at least _possible_ now via AWS Interconnect: https://aws.amazon.com/interconnect/lastmile/pricing/

by cyberax

6/10/2026 at 1:30:18 AM

It's not a consequence of this.

And that's their... third... fourth? completely separate system for connecting an AWS virtual network to a physical DC in the outside world?

by trumpdong

6/10/2026 at 1:57:11 AM

Yes, there are several other options. This is the first one that is not metered.

by cyberax

6/9/2026 at 11:20:58 PM

I don't think these things have anything to do with each other.

by wmf

6/10/2026 at 5:50:22 AM

I heard from several engineers several years ago that egress was metered because their network did not have enough switching capacity to make it truly unmetered at AWS scale. Looks like this might have been a part of the solution.

by cyberax

6/9/2026 at 3:43:44 AM

[flagged]

by tanelpoder