alt.hn

4/7/2026 at 7:14:32 PM

RAM Has a Design Flaw from 1966. I Bypassed It [video]

https://www.youtube.com/watch?v=KKbgulTp3FE

by surprisetalk

4/10/2026 at 2:56:26 AM

This is very much worth watching. It is a tour de force.

Laurie does an amazing job of reimagining Google's strange job optimisation technique (for jobs running on hard disk storage) that uses 2 CPUs to do the same job. The technique simply takes the result of the machine that finishes it first, discarding the slower job's results... It seems expensive in resources, but it works and allows high priority tasks to run optimally.

Laurie re-imagines this process, but for RAM!! In doing this she has to deal with cores, RAM channels and other poorly documented CPU memory-management features.

She was even able to work out various undocumented CPU/RAM settings by using her tool to find the timing differences that exposed them.

She's turned "Tailslayer" into a library now, available on GitHub: https://github.com/LaurieWired/tailslayer

You can see her having so much fun, doing cool victory dances as she works out ways of getting around each of the issues that she finds.

The experimentation, explanation and graphing of results is fantastic. Amazing stuff. Perhaps someone will use this somewhere?

As mentioned in the YT comments, the work done here is probably a master's degree's worth of work, experimentation and documentation.

Go Laurie!

by kreelman

4/10/2026 at 7:12:50 AM

This is a 54-minute video. I watched about 3 minutes and it seemed like some potentially interesting info wrapped in useless visuals. I thought about downloading and reading the transcript (that's faster than watching videos), but it seems to me that it's another video that would be much better as a blog post. Could someone summarize in a sentence or two? Yes, we know about the refresh interval. What is the bypass?

Update: found the bypass via the youtube blurb: https://github.com/LaurieWired/tailslayer

"Tailslayer is a C++ library that reduces tail latency in RAM reads caused by DRAM refresh stalls.

"It replicates data across multiple, independent DRAM channels with uncorrelated refresh schedules, using (undocumented!) channel scrambling offsets that works on AMD, Intel, and Graviton. Once the request comes in, Tailslayer issues hedged reads across all replicas, allowing the work to be performed on whichever result responds first."

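To make the "whichever result responds first" idea concrete, here is a toy Python model, not Tailslayer's API (the real library is C++ and races independent loads from separate cores on real DRAM). Each replica's channel is assumed to stall for a fixed window once per refresh period, with "uncorrelated" schedules modeled as different phase offsets; a hedged read takes the minimum latency across replicas. All timing constants are made-up illustrative values.

```python
# Toy model of hedged reads across replicas with staggered refresh windows.
# All timing numbers are illustrative assumptions, not measurements.
BASE_NS = 70         # hypothetical latency of an unstalled read
STALL_NS = 260       # hypothetical extra latency when a read hits a refresh
PERIOD_NS = 15_000   # hypothetical refresh period per channel
WINDOW_NS = 400      # hypothetical length of each refresh window


def read_latency(t_ns: int, phase_ns: int) -> int:
    """Latency of one read issued at t_ns on a channel whose refresh
    windows start at phase_ns, phase_ns + PERIOD_NS, ..."""
    in_refresh = (t_ns - phase_ns) % PERIOD_NS < WINDOW_NS
    return BASE_NS + (STALL_NS if in_refresh else 0)


def hedged_latency(t_ns: int, phases: list[int]) -> int:
    """Issue the same read to every replica; the first response wins."""
    return min(read_latency(t_ns, p) for p in phases)


def worst_case(phases: list[int]) -> int:
    """Worst hedged latency over one refresh period, sampled every 10 ns."""
    return max(hedged_latency(t, phases) for t in range(0, PERIOD_NS, 10))


if __name__ == "__main__":
    print(worst_case([0]))         # 330: a single copy eventually hits a refresh
    print(worst_case([0, 7_500]))  # 70: offset refresh windows never overlap
    print(worst_case([0, 0]))      # 330: correlated schedules give no benefit
```

The last case is the point of the "uncorrelated refresh schedules" wording: two copies that refresh in lockstep are no better than one.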
by throwaway81523

4/10/2026 at 9:05:28 AM

The video could be a bit shorter, and some of the goofiness might not please the most time-pressed people, but that is also what makes it fresh and helps it stand out.

by kelsolaar

4/10/2026 at 8:20:58 AM

> using (undocumented!) channel scrambling offsets that works on AMD, Intel, and Graviton

Seems odd to me that all three architectures implement this yet all three leave it undocumented. Is it intended as some sort of debug functionality or what?

by fc417fc802

4/10/2026 at 9:01:45 AM

it's explained in the video, and there's no way I'll be explaining it better than her

by alex_duf

4/10/2026 at 10:13:33 AM

you could however link to the timestamp where that particular explanation starts. i am afraid i don't have time to watch a one hour video just to satisfy my curiosity.

by em-bee

4/10/2026 at 10:36:02 AM

This is approximately the section in the video titled "Memory controllers hate you" (https://www.youtube.com/watch?v=KKbgulTp3FE&t=1399s), combined with the following section.

The actual explanation starts a couple minutes later, around https://youtu.be/KKbgulTp3FE?t=1553. The short explanation is performance (essentially load balancing against multiple RAM banks for large sequential RAM accesses), combined with a security-via-obscurity layer of defense against rowhammer.

by vitus

4/10/2026 at 7:18:58 AM

Just use the Ask button on YouTube videos to summarize, that's what it's for.

by satvikpendem

4/10/2026 at 10:19:43 AM

>Just use the Ask button on YouTube videos to summarize,

For anyone confused because they don't see the "Ask" button between the Share and Bookmark buttons...

It looks like you have to be signed-in to Youtube to see it. I always browse Youtube in incognito mode so I never saw the Ask button.

Another source of confusion is that some channels may not have it, or it may be missing for some other unexplained reason: https://old.reddit.com/r/youtube/comments/1qaudqd/youtube_as...

by jasode

4/10/2026 at 9:49:46 AM

Not complaining about the particular presenter here: this is an interesting video with some decent content, I don't find the presentation style overly irritating, and it is documenting a lot of work that has obviously been done experimenting in order to get the end result (rather than just summarising someone else's work). Such a goofy, elongated style, which is infuriating if you are looking for quick hard information, is practically required in order to drive wider interest in the channel.

But the “ask the LLM” thing is a sign of how off-kilter information passing has become in the current world. A lot of stuff is packaged deliberately inefficiently because that is the way to monetise it, or sometimes just to game the searching & recommendation systems so it gets out to potentially interested people at all, then we are encouraged to use a computationally expensive process to summarise that to distil the information back out.

MS's documentation for large chunks of Azure is that way, but with even less excuse (they aren't a content creator needing to drive interest by being a quirky presenter as well as a potential information source). Instead of telling me to ask copilot to guess what I need to know, why not write some good documentation that you can reference directly (or that I can search through)? Heck, use copilot to draft that documentation if you want to (but please have humans review the result for hallucinations, missed parts, and other inaccuracies, before publishing).

by dspillett

4/10/2026 at 8:24:48 AM

Unnecessarily negative imo.

I like the video because I can't read a blog post in the background while doing other stuff, and I like Gadget Hackwrench narrating semi-obscure CS topics lol

by svrtknst

4/10/2026 at 8:32:18 AM

> I cant read a blog post in the background

You can consume technical content in the background?

by fc417fc802

4/10/2026 at 9:40:00 AM

this is a thing people do: convince themselves they can consume technical content subconsciously. it's not how the brain works, though. it will just give you the idea you are following something.

by saidnooneever

4/10/2026 at 10:23:03 AM

not all technical content is the same, or has the same level of importance. this video does not introduce anything that i need to be able to replicate in my work, so i don't need to catch every detail of it, just grasp the basic concepts and reasons for doing something.

by em-bee

4/10/2026 at 10:15:15 AM

if your foreground work doesn't occupy your brain, why not?

by em-bee

4/10/2026 at 10:51:21 AM

Because I prefer not to think about the hair I'm removing from my shower drain?

by vintermann

4/10/2026 at 10:53:27 AM

FWIW, I like her videos but I usually prefer essays or blog posts in general as they're easier to scan and process at my own rate. It's not about this particular video, it's about videos in general.

by derbOac

4/10/2026 at 9:22:21 AM

I hope this approach gets some visibility in the CPU field. It could obviously be improved with a special CPU instruction which simply races two reads and returns the first one which succeeds. She’s doing an insane amount of work, making multiple threads and so on (and burning lots of performance), all to work around the lack of dedicated support for this in silicon.

by josephg

4/10/2026 at 4:37:11 AM

>> It replicates data across multiple, independent DRAM channels with uncorrelated refresh schedules

This is the sort of thing which was done before in a world where there was NUMA, but that is easy. Just taskset and mbind your way around it to keep your copies in both places.

The crazy part of what she's done is how to determine that the two copies don't get hit by refresh cycles at the same time.

Particularly by experimenting on something proprietary like Graviton.

by gopalv

4/10/2026 at 10:12:49 AM

"This is the sort of thing which was done before in a world where there was NUMA"

You sound like NUMA was dead. Is this a bit of hyperbole, or would you really say there is no NUMA anymore? Honest question, because I am out of touch.

by weinzierl

4/10/2026 at 4:59:27 AM

She determines that by having three copies. Or four. Or eight.

'Tis just probabilities and the unlikelihood of hitting a refresh cycle across that many memory channels all at once.
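The probability argument can be written down under the simplest possible model: each channel is independently mid-refresh a fraction p of the time (p = 0.02 below is a made-up value), so the chance that all k copies are stalled at once is p^k. The independence assumption is the hard part in practice; it only holds if the copies really sit on different channels with uncorrelated refresh schedules.

```python
def p_all_stalled(p: float, k: int) -> float:
    """Probability that k replicas are all mid-refresh at the same moment,
    assuming each is stalled a fraction p of the time, independently."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if k < 1:
        raise ValueError("need at least one replica")
    return p ** k


if __name__ == "__main__":
    # p = 0.02 is a hypothetical stall fraction, chosen only for illustration.
    print(f"{p_all_stalled(0.02, 1):.1e}")  # 2.0e-02
    print(f"{p_all_stalled(0.02, 2):.1e}")  # 4.0e-04
    print(f"{p_all_stalled(0.02, 3):.1e}")  # 8.0e-06
```

Each extra copy multiplies the residual risk by p, which is why three, four or eight copies drive it down so quickly, at a proportional cost in memory and bandwidth.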

by rockskon

4/10/2026 at 7:16:12 AM

Right, but the impressive part is finding addresses that are actually on different memory channels.

by GeneralMayhem

4/10/2026 at 10:00:03 AM

Surprising to me that two memory channels are separated by as little as 256 bytes. The short distance makes it easier to find, surely?

by kzrdude

4/10/2026 at 7:21:24 AM

> Google's strange job optimisation technique (for jobs running on hard disk storage)

Can you give more context on this? Opus couldn't figure out a reference for it

by 100ms

4/10/2026 at 7:40:42 AM

This is a quite old technique. The idea, as I understood it, was that lots of data at Google was stored in triplicate for reliability purposes. Instead of fetching one, you fetched all three and then took the one that arrived first. Then you sent UDP packets cancelling the other two. For something like search where you're issuing hundreds of requests that have to resolve in a few hundred milliseconds, this substantially cut down on tail latency.
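A minimal asyncio sketch of the pattern described here (send to every replica, keep the first answer, cancel the rest), usually called request hedging. `fetch_from_replica` is a hypothetical stand-in that just sleeps; a real system would issue RPCs and send cancellation messages over the network instead of cancelling local tasks.

```python
import asyncio


async def fetch_from_replica(name: str, delay_s: float) -> str:
    """Hypothetical replica read: sleeps to simulate disk/network latency."""
    await asyncio.sleep(delay_s)
    return f"data from {name}"


async def hedged_fetch(replicas: dict[str, float]) -> str:
    """Send the request to every replica, return the first result,
    and cancel the slower requests that are still in flight."""
    tasks = [asyncio.create_task(fetch_from_replica(n, d)) for n, d in replicas.items()]
    try:
        done, _pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        for task in tasks:
            task.cancel()  # no-op for the finished winner; cancels the losers
        # Wait for cancellations to settle; CancelledError is collected, not raised.
        await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    replicas = {"replica-a": 0.30, "replica-b": 0.05, "replica-c": 0.20}
    print(asyncio.run(hedged_fetch(replicas)))  # data from replica-b
```

Sending to all three at once is the most aggressive variant; a cheaper one sends to a single replica first and only fans out if no answer arrives within a short delay.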

by why_only_15

4/10/2026 at 7:47:26 AM

Tournament parallelism is the technical term IIRC.

by yvdriess

4/10/2026 at 8:36:11 AM

Aha, that makes more sense; I thought it was specifically to do with job scheduling from the description. You can do something similar at home as a poor man's CDN by racing requests to regionally replicated S3 buckets. Also, magic eyeballs (the IPv4/IPv6 race done in browsers, and I think also for QUIC/HTTP selection) works pretty much the same way.

by 100ms

4/10/2026 at 10:44:56 AM

> magic eyeballs

https://en.wikipedia.org/wiki/Happy_Eyeballs is the usual name. It's not quite identical, since you often want to give your preferred transport a nominal headstart so it usually succeeds. But yes, there are some similarities -- you race during connection setup so that you don't have to wait for a connection timeout (on the order of seconds) if the preferred mechanism doesn't work for some reason.
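The headstart difference can be sketched with asyncio: start the preferred attempt, and launch the fallback only if the preferred one is still running after a short delay; whichever then completes first wins. `attempt` is a hypothetical coroutine that just sleeps. The 0.25 s default matches the 250 ms Connection Attempt Delay recommended in RFC 8305, but this sketch deliberately skips the case where the preferred attempt fails outright, which a real implementation must handle by starting the fallback immediately.

```python
import asyncio


async def attempt(label: str, delay_s: float) -> str:
    """Hypothetical connection attempt that succeeds after delay_s seconds."""
    await asyncio.sleep(delay_s)
    return label


async def race_with_headstart(preferred, fallback, headstart_s: float = 0.25) -> str:
    """Give the `preferred` coroutine a head start; launch `fallback` only if
    `preferred` is still running after `headstart_s`. First result wins.
    Simplification: a failing attempt is not handled specially."""
    first = asyncio.create_task(preferred)
    done, _ = await asyncio.wait({first}, timeout=headstart_s)
    if done:
        fallback.close()  # fallback never ran; close it to avoid a "never awaited" warning
        return first.result()
    second = asyncio.create_task(fallback)
    done, pending = await asyncio.wait({first, second}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # drop the loser
    return done.pop().result()


if __name__ == "__main__":
    # Preferred answers within its head start, so it wins even though
    # the fallback would have been faster.
    print(asyncio.run(race_with_headstart(attempt("IPv6", 0.1), attempt("IPv4", 0.01))))  # IPv6
    # Preferred is stuck, so the fallback is launched after 0.25 s and wins.
    print(asyncio.run(race_with_headstart(attempt("IPv6", 2.0), attempt("IPv4", 0.01))))  # IPv4
```

This is the main contrast with pure hedging: the preferred path gets a chance to win on its own before any duplicate work is started.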

The main term I've seen for this particular approach is "request hedging" (https://grpc.io/docs/guides/request-hedging/, which links to the paper by Dean and Barroso).

by vitus

4/10/2026 at 11:06:54 AM

Happy eyeballs, that makes a lot more sense thanks. Someone's "magic eyeballs" here apparently isn't reading his own writing :)

by 100ms

4/10/2026 at 3:43:23 AM

I like the video, but this is hardly groundbreaking. You send out two or more messengers hoping at least one of them will get there on time.

by ufocia

4/10/2026 at 4:58:34 AM

Yeah. These are literally just mainframe techniques from yesteryear.

by rcbdev

4/10/2026 at 8:09:26 AM

Almost everything "new" was invented by IBM it seems like. And it goes by a completely different name there. It's still nice to rediscover what they knew.

by actionfromafar

4/10/2026 at 4:24:33 AM

and dropbox was just rsync

by npunt

4/10/2026 at 4:32:11 AM

The clever part is figuring out what RAM is controlled by which controllers.

by UltraSane

4/10/2026 at 8:23:15 AM

everyone says this but no one says why it was clever. i find her videos have cool results but i usually don't have the patience for them because it's recycled old stuff (can be cool but it's not groundbreaking).

there is a ton of info you can pull from smbios, acpi, msrs, cpuid etc. etc. about cpu/ram topology and connectivity, latencies etc etc.

isn't the info on which controller/ram relationships exist somewhere in there, provided by firmware or platform?

i can hardly imagine it is not just plainly in there with the plethora of info in there...

there's srat/slit/hmat etc. in acpi, then there are MSRs with info (amd exposes more than intel ofc, as always) and then there are registers on the memory controller itself, as well as on socket-to-socket interconnects like upi links..

it's just a lot of reading and finding bits here n there. LLMs are actually really good at pulling all sorts of stuff from various 6-10k page documents if u are too lazy to dig yourself -_-

by saidnooneever

4/10/2026 at 11:00:09 AM

The exact mapping between RAM addresses and memory controllers is intentionally abstracted by the memory subsystem, with many abstraction layers between you and the physical RAM locations. Because documentation is sometimes incomplete or proprietary, security researchers often have to write software that probes memory and times the access speeds to reverse-engineer the exact interleaving functions of a specific CPU. In the video she says that ARM CPUs have the least data about this and she had to rely entirely on statistical methods.
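A common way to model such an interleaving function is as the parity (XOR) of a chosen set of physical-address bits. The sketch below assumes a two-channel system and an invented `CHANNEL_MASK`; it is not the mask of any real CPU, and finding the real one is exactly the timing-based reverse engineering described here. Because bit 8 is in the hypothetical mask, addresses only 256 bytes apart can land on different channels. Note that the function works on physical addresses, which an unprivileged process does not normally see.

```python
# Hypothetical channel-selection mask: the physical-address bits that are
# XORed together. Real masks differ per CPU and memory population and must be
# discovered (e.g. by timing); this value is made up for illustration.
CHANNEL_MASK = (1 << 8) | (1 << 12) | (1 << 17) | (1 << 21)


def channel_of(phys_addr: int, mask: int = CHANNEL_MASK) -> int:
    """Two-channel model: channel = parity of the masked address bits."""
    return bin(phys_addr & mask).count("1") & 1


def partner_on_other_channel(phys_addr: int, mask: int = CHANNEL_MASK) -> int:
    """Return the next 64-byte-aligned address above phys_addr that the
    model maps to the other channel (a candidate spot for a replica)."""
    want = channel_of(phys_addr, mask) ^ 1
    candidate = (phys_addr | 63) + 1  # start of the next cache line
    while channel_of(candidate, mask) != want:
        candidate += 64
    return candidate


if __name__ == "__main__":
    print(channel_of(0), channel_of(256))             # 0 1
    print(hex(partner_on_other_channel(0x1000)))      # 0x1100
```

With a real (discovered) mask, the same search is how one would pick replica addresses that are guaranteed to sit on different channels.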

by UltraSane

4/10/2026 at 10:01:14 AM

I have to say that using drawbridges and differently colored rail pieces to explain it was very clever.

by kzrdude

4/10/2026 at 3:28:16 AM

Love the format, and super cool to see a benchmark that so clearly shows DRAM refresh stalls, especially avoiding them via reverse engineering the channel layout! Ran it on my 9950X3D machine with dual-channel DDR5 and saw clear spikes from 70ns to 330ns every 15us or so.

The hedging technique is a cool demo too, but I’m not sure it’s practical.

At a high level it’s a bit contradictory; trying to reduce the tail latency of cold reads by doubling the cache footprint makes every other read even colder.

I understand the premise is “data larger than cache” given the clflush, but even then you’re spending 2x the memory bandwidth and cache pressure to shave ~250ns off spikes that only happen once every 15us. There’s just not a realistic scenario where that helps.

HFT especially is significantly more complex than a huge lookup table in DRAM. In the time you spend doing a handful of 70ns DRAM reads, your competitor has done hundreds of reads from cache and a bunch of math. It’s just far better to work with what you can fit in cache, and to shrink what doesn’t fit as much as possible.

by foltik

4/10/2026 at 11:16:54 AM

> clear spikes from 70ns to 330ns

Isn't that rather trivial, though, as a source of tail latency? There are much worse spikes coming from other sources, e.g. power management states within the CPU and possibly other hardware. At the end of the day, this is why simple microcontrollers are still preferred for hard RT workloads. This work doesn't change that in any way.

by zozbot234

4/10/2026 at 7:23:47 AM

Another point about HFT - They're mostly using FPGAs (some use custom silicon) which means that they have much tighter control over how DRAM is accessed and how the memory controller is configured. They could implement this in hardware if they really need to, but it wouldn't be at the OS level.

by Lramseyer

4/10/2026 at 9:25:29 AM

It could be massively improved with a special CPU instruction for racing dram reads. That might make it actually useful for real applications. As it is, the threading model she used here would make it incredibly difficult to use this in a real program.

by josephg

4/10/2026 at 7:09:56 AM

On most RAM tREF can be increased a lot from the default, at least if kept somewhat cool.

by formerly_proven

4/10/2026 at 9:09:19 AM

A more accurate but less inspiring title would be:

RAM Has a Design Tradeoff from 1966. I made another one on top.

The first tradeoff, of 6x fewer transistors for some extra latency, is immensely beneficial. The second, of reducing some of that extra latency for extra copies of static data, is beneficial only to some extremely niche application. Still a very educational video about modern memory architecture.

[EDIT: accidental extra copy of this comment deleted]

by tromp

4/10/2026 at 10:04:23 AM

It could be a display bug on my side, but you posted this exact comment twice.

by kitku

4/10/2026 at 10:29:12 AM

He tried to reduce latency

by cryptonym

4/10/2026 at 11:09:21 AM

Probably will get a lot of views from guys who have no idea what she is talking about.

by t1234s

4/10/2026 at 11:14:15 AM

Being a woman in tech seems to have some benefits at least on YouTube

by jqbd

4/10/2026 at 4:32:16 AM

Halfway through this great video and I have two questions:

1) Can we take this library and turn it into a generic driver or something that applies the technique to all software (kernel and userspace) running on the system? i.e. If I want to halve my effective memory in order to completely eliminate the tail latency problem, without having to rewrite legacy software to implement this invention.

2) What model miniature smoke machine is that? I instruct volunteer firefighters and occasionally do scale model demos to teach ventilation concepts. Some research years back led me to the "Tiny FX" fogger which works great, but it's expensive and this thing looks even more convenient.

by rkagerer

4/10/2026 at 6:13:13 AM

1. not that I can think of, due to the core split. It really has to be independent cores racing independent loads. anything clever you could do with kernel modules, page-table-land, or dynamically reacting via PMU counters would likely cost microseconds...far larger than the 10s-100s of nanoseconds you gain.

what I wished I had during this project is a hypothetical hedged_load ISA instruction. Issue two requests to two memory controllers and drop the loser. That would let the strategy work on a single thread! Or, even better, integrating the behavior into the memory controller itself, which would be transparent to all software without recompilation. But, you’d have to convince Intel/AMD/someone else :)

2. It’s called a “smokeninja”. Fairly popular in product photography circles, it’s quite fun!

by lauriewired

4/10/2026 at 6:52:26 AM

> Or, even better, integrating the behavior into the memory controller itself, which would be transparent to all software without recompilation.

Yeah it would be neat to just flip a BIOS switch and put your memory into "hedge" mode. Maybe one day we'll have an open source hardware stack where tinkerers can directly fiddle with ideas like this. In the meantime, thanks for your extensive work proving out the concept and sharing it with the world!

by rkagerer

4/10/2026 at 7:04:43 AM

Is there a reason you can think of why AMD, Intel etc. would not want to do this?

Really enjoyed the video and feel that I (not being in the IT industry) better understand CPUs and RAM now.

by solstice

4/10/2026 at 6:04:44 AM

> halve my effective memory in order to completely eliminate the tail latency problem,

Wouldn't you have a tail latency problem on the write side though if you just blindly apply it every where? As in unless all the replicas are done writing you can't proceed.
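The read/write asymmetry fits in two functions, assuming the writer waits for every copy to be updated before proceeding: a hedged read finishes with the fastest replica (the minimum), while a replicated write finishes with the slowest (the maximum), so any replica's refresh stall lands on the write path. The per-replica latencies below are made-up values.

```python
def hedged_read_ns(replica_latencies_ns: list[int]) -> int:
    """A hedged read completes when the fastest replica answers."""
    return min(replica_latencies_ns)


def replicated_write_ns(replica_latencies_ns: list[int]) -> int:
    """A write is only complete once every replica has been updated."""
    return max(replica_latencies_ns)


if __name__ == "__main__":
    latencies = [70, 330]  # made-up values: one replica is mid-refresh
    print(hedged_read_ns(latencies))       # 70
    print(replicated_write_ns(latencies))  # 330
```

Adding replicas helps the minimum and hurts the maximum, so blindly mirroring writable data trades read tail latency for write tail latency.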

by hawk_

4/10/2026 at 5:25:29 AM

Brio 33884. It has a tiny ultrasonic humidifier in there.

by imp0cat

4/10/2026 at 8:00:51 AM

This is a cool idea, explained very well so that everyone can understand such an esoteric concept.

However, I wonder if the core idea itself is useful in practice. With modern memory there are two main aspects it makes worse. First is cost: it needs double the memory for the same compute, and with memory costs already soaring this is not good. Then there's the other main issue, throughput. I haven't put enough thought into that yet, but it feels like it requires more orchestration and increases costs there too.

by yalogin

4/10/2026 at 8:19:14 AM

Doesn't doing this halve the computing power? I don't know this world at all, is that acceptable?

by sbiru93

4/10/2026 at 8:45:41 AM

It halves (or thirds or quarters or etc) available CPU cores, cache space, memory bandwidth, all the critical resources. So I expect that it's only applicable for small reads that you are reasonably certain won't be in cache and that it can only be used extremely sparingly, otherwise it will be nothing but a massive drain.

by fc417fc802

4/10/2026 at 3:47:23 AM

Should say DRAM, SRAM does not have this.

by boznz

4/10/2026 at 8:10:25 AM

Indeed. And only for certain DRAM refresh strategies. I mean, it's at least conceivable that a memory management system responsible for the refresh notices that a given memory location is requested by the cache and then fills the cache during the refresh (which afaiu reads the memory) or -- simpler to implement perhaps -- delays the refresh by a μs allowing the cache-fill to race ahead.

(seems that in the earlier submission, https://news.ycombinator.com/item?id=47680023, jeffbee hinted that IBM zEnterprise is doing something to that effect)

That said, I'm not convinced that this is a big issue in practice. If you really care about performance, you've got to avoid cache misses.

by guenthert

4/10/2026 at 11:01:54 AM

No DDR2-or-later memory has anywhere near enough bandwidth to meet the refresh frequency for every bit just by you reading it in a loop.

The refresh that we do is run in parallel on the memory arrays inside the RAM chips completely bypassing any of the related IO machinery.

by namibj

4/10/2026 at 6:50:08 AM

I haven't had time to see the whole thing yet, but I'm quite surprised this yielded good results. If this works I would have expected CPU implementations to do some optimization around this by default given the memory latency bottleneck of the last 1.5 decades. What am I missing here?

by josalhor

4/10/2026 at 7:11:33 AM

Turning on mirroring does this for the low, low price of doubling your RAM cost.

by formerly_proven

4/10/2026 at 7:02:35 AM

She could probably have been stinking rich on this work alone, but instead she just put it up on Github. Kudos to Laurie.

by bronlund

4/10/2026 at 7:30:50 AM

She probably is already stinking rich, or at least rich enough. Beyond a certain point, though, research and knowledge seem more interesting than riches, particularly if you see yourself as a researcher. Otherwise, perhaps, she'd be doing the same in business and be an Ellona or something. Thank God she does not; on the contrary, she is an inspiration to so many people, young and adult. Kudos!

by larodi

4/10/2026 at 8:33:32 AM

Companies are standing in line to double their RAM usage right now, right.

by ahoka

4/10/2026 at 10:27:31 AM

For an HFT firm, RAM cost is a non-issue. Even the tiniest improvement in latency can result in millions of dollars of extra profit. They can octuple their RAM usage and still make a killing.

I bet Citadel already has reached out to Laurie :)

by bronlund

4/10/2026 at 11:10:58 AM

I bet they already have much better solutions if they actually have these problems and don't need to watch an iCarly fan fiction on youtube to learn about DRAM timings.

by ahoka

4/10/2026 at 11:14:22 AM

This is not like a problem which needed to be fixed, it's an improvement in efficiency - though a costly one. It may very well be that they weren't even aware that this was possible.

We are talking about a company which makes its own custom microchips, so maybe you're right.

by bronlund

4/10/2026 at 10:00:02 AM

Depends how much total RAM your application needs and how much money RAM access tail latency costs your business.

by gkbrk

4/10/2026 at 10:03:00 AM

Just annoyed by the slop from this Twitter shilposter; just read her tweets to see how much garbage she spews every waking minute.

by villgax

4/10/2026 at 4:59:15 AM

Am I the only one who feels the comments here don't sound organic at all?

by rcbdev

4/10/2026 at 5:51:59 AM

No, I felt the same way; they're exactly like the usual LLM bot comment where an LLM recaps OP and ends with a platitude or witty encouragement.

But all the accounts are old/legit so I think that you and me have just become paranoid...

by tredre3

4/10/2026 at 9:23:21 AM

I have become oversensitive to this, and my brain is probably generating a lot of false positives. I don't think it's necessarily the case here, but I've wondered if people who use LLMs a lot take over some of its idiosyncrasies and in a way start sounding like one a bit. A strange side effect is that I've come to appreciate text with grammatical errors, videos where people don't enunciate well etc because it's a sign that it's human created content.

by wkjagt

4/10/2026 at 9:22:25 AM

I think it's more people being fascinated by this curious architectural detail. I imagine it's fascinating to people who are not exposed to the intricate details of computer architecture, which I assume is the vast majority here. It's a glimpse into a very odd world (which is your day-to-day work in the HFT field, but they rarely talk about this, and much less in such big words).

TBH, I didn't watch the video because the title is too click-baity for me and it's too long. Instead, I looked at the benchmark results on the Github page and sure, it's fascinating how you can significantly(!) thin the latency distribution, just by using 10× more CPU cores/RAM/etc. Classic case of a bad trade-off.

And nobody talked about what we use RAM for, usually: Not to only store static data, but also to update it when the need arises. This scheme is completely impractical for those cases. Additionally, if you really need low latency, as others pointed out, you can go for other means of computation, such as FPGAs.

So I love this idea, I'm sure it's a fun topic to talk about at a hacker conference! But I'm really put off by the click-baity title of the video and the hype around it.

by v1ne

4/10/2026 at 5:46:46 AM

You're absolutely right

by isoprophlex

4/10/2026 at 6:09:29 AM

You're absolutely right to call this out. No humans, no emotion, no real comments - just LLM slop.

In all seriousness, agreed. The top comment at time of this writing seems like a poor summarizing LLM treating everything as the best thing since sliced bread. The end result is interesting, but neither this nor Google invented the technique of trying multiple things at once as the comment implies.

by silisili

4/10/2026 at 8:22:42 AM

No, something is funny here. In the previous submission (https://news.ycombinator.com/item?id=47680023) the only (competently) criticizing comment (by jeffbee) was downvoted into oblivion/flagged.

by guenthert

4/10/2026 at 8:49:23 AM

Well he veered off of the technical and into the personal so I'm not surprised it's dead. But yeah something feels weird about this comment section as a whole but I can't quite put my finger on it.

I think rather than AI it reminds me of when (long before AI) a few colleagues would converge on an article to post supportive comments in what felt like an attempt to manipulate the narrative and even at concentrations that I find surprisingly low it would often skew my impression of the tone of the entire comment section in a strange way. I guess you could more generally describe the phenomenon as fan club comments.

by fc417fc802

4/10/2026 at 10:10:15 AM

It is one of the few instances where the reddit discussion seems more normal/in-depth. See the longer comments here:

https://www.reddit.com/r/programming/comments/1sgtkdf/tailsl...

There are a few glazing comments there too though.

> Well he veered off of the technical and into the personal so I'm not surprised it's dead.

I don't know what he posted, but isn't it easy to see how a small fan group around Laurie can form?

She is an attractive girl not afraid to be cute (which is done so seldom by women in tech that I found a reddit thread trying to triangulate if she is trans. I am not posting that to raise the question, but she piques people's interest), plus the impressively high effort put into niche topics, PLUS the impressively high production value to present all that.

by ralfd

4/10/2026 at 5:34:42 AM

I don’t see anything unusual

by Alifatisk

4/10/2026 at 2:58:47 AM

[dead]

by rationalist

4/7/2026 at 11:55:51 PM

[flagged]

by dragonsenseiguy

4/10/2026 at 8:44:21 AM

[flagged]

by dombiscoff

4/10/2026 at 5:04:25 AM

This is an unreasonably good video. Hopefully, it inspires others to see we can still think hard and critically about technical things.

by dinkumthinkum

4/10/2026 at 6:56:47 AM

Yeah, wow, the comments weren't kidding. This'll probably be the best video I watch all month, at least, if not more. I would have said what she was trying to do was "impossible" (had I not seen the title and figured … well … she posted the video) and right about when I was thinking that she got me with:

> Hold on a second. That's a really bad excuse. And technology never got anywhere by saying I accept this and it is what it is.

by deathanatos