2/22/2026 at 2:35:22 AM
Yeah, GPUDirect should let you DMA straight to a storage device. I wonder... what if the M.2 storage was actually DRAM? You probably don't need persistence for spilling a model off the GPU. How would it fare vs. just adding more host memory? The M.2 RAM would be less flexible, but would keep the system RAM free for the CPU.
by 01100011
2/22/2026 at 3:14:05 AM
Yeah, a ramdisk would probably work wonders. It's a shame Intel Optane didn't become a standard; it would be amazing for those types of workflows.
by javchz
2/22/2026 at 3:07:31 PM
Ya know, here on the local market there are a bunch of Optanes hanging around. I'll try to get hold of one to check if there's any improvement.
by xaskasdf
2/22/2026 at 6:08:47 PM
Optanes will be good for latency, but not so much for bandwidth, which seems to be your major bottleneck, if I'm not mistaken?
by jonassm
2/22/2026 at 8:06:24 PM
Yeah, the mobo upgrade is something I gotta do anyway, so I'll cover that more or less. The Optane is something I hadn't thought about.
by xaskasdf
2/22/2026 at 3:54:30 AM
Ahhh damn it. Intel! Come back!
by TechSquidTV
2/22/2026 at 7:39:49 AM
This is exactly what I was wondering.
I gave a talk a few years ago at Dask Summit (conf?) on making the stars align with dask-cudf here. We were helping a customer accelerate log analytics by proving out our stack for nodes that look roughly like: parallel SSD storage arrays (30 x 3 GB/s?) -> GPUDirect Storage -> 4 x 30 GB/s PCIe (?) -> 8 x A100 GPUs, something like that. It'd be cool to see the same thing now in the LLM world, such as a multi-GPU MoE, or even a single-GPU one for that matter!
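A rough back-of-envelope for that kind of node, as a sketch only: sustained throughput of a staged pipeline is capped by its slowest stage. The figures are the approximate, question-marked ones above, not measurements, and the helper names (`stage_bandwidth`, `bottleneck`) are made up for illustration.

```python
# Back-of-envelope: a staged data path can only sustain the bandwidth
# of its slowest stage. All numbers are the rough guesses from the
# comment above (assumptions, not benchmarks).

def stage_bandwidth(count: int, per_unit_gb_s: float) -> float:
    """Aggregate bandwidth of `count` parallel units, in GB/s."""
    return count * per_unit_gb_s

def bottleneck(stages: dict[str, float]) -> tuple[str, float]:
    """Return the name and bandwidth of the slowest stage."""
    name = min(stages, key=stages.get)
    return name, stages[name]

stages = {
    "ssd_array": stage_bandwidth(30, 3.0),    # 30 SSDs at ~3 GB/s each
    "pcie_links": stage_bandwidth(4, 30.0),   # 4 PCIe links at ~30 GB/s each
}

name, gb_s = bottleneck(stages)
per_gpu = gb_s / 8  # spread evenly over 8 GPUs (an idealized assumption)

print(f"bottleneck: {name} at {gb_s:.1f} GB/s, ~{per_gpu:.2f} GB/s per GPU")
```

With those guesses the SSD array, not PCIe, would be the limit, so each GPU sees only a fraction of what its own link could carry.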
by lmeyerov
2/22/2026 at 5:12:12 AM
Doesn't M.2 storage that's actually DRAM (hopefully meaning NVMe/PCIe speed, not SATA) already exist as Compute Express Link (CXL), just not in this specific M.2 form factor? If only RAM weren't silly expensive right now, one could use 31 GB/s of additional bandwidth per NVMe connector.
by ElectricalUnion
2/22/2026 at 5:33:39 PM
There's the Marvell CXL 2.0 DDR4 card Serve the Home used for KV cache speedups. And I am personally looking forward to CXL 3 and memory coherence across my system builds.
https://www.servethehome.com/hyper-scalers-are-using-cxl-to-...
by bhewes