alt.hn

3/29/2025 at 5:51:12 PM

Vramfs: Vram Based Filesystem for Linux

https://github.com/Overv/vramfs

by signa11

3/30/2025 at 1:31:43 AM

That's cool, but I think the proper solution is to write a Linux kernel module that can reserve GPU RAM via DRM to create ramdisks, not a userspace filesystem using OpenCL.

That would give proper caching, direct mmap support if desired, a reliable, correct and concurrent filesystem (as opposed to this author's "all of the FUSE callbacks share a mutex to ensure that only one thread is mutating the file system at a time"), etc.

by devit

3/29/2025 at 7:05:06 PM

On the topic of coercing bits into functioning as data storage: harder drive ( http://tom7.org/harder/ )

by dcanelhas

3/29/2025 at 8:04:12 PM

  > harder drive
Here's the direct YouTube link[0]

I'd *HIGHLY* recommend this video to anyone here. It is exactly that fun silly computer science stuff where you also learn a shit ton. His channel is full of this stuff.

  Don't ask why, ask why not
is essentially the motto of his channel, and it is the best. It leads to lots of innovation, and I think we all should encourage more of this kind of stuff.

  [0] https://www.youtube.com/watch?v=JcJSW7Rprio

by godelski

3/29/2025 at 8:12:21 PM

2 GB/s is pretty crappy, that's about the burst speed of many NVMe SSDs.

A virtual disk should be more than 6 GB/s at least with DDR5.
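
Back-of-the-envelope: dual-channel DDR5-4800 is about 2 x 4800 MT/s x 8 bytes ≈ 76.8 GB/s of theoretical bandwidth, so a RAM-backed disk has plenty of headroom above 6 GB/s, which suggests the bottleneck for vramfs is the FUSE and PCIe path rather than the memory itself.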

by shadowpho

3/29/2025 at 8:35:27 PM

Yes, but bear in mind that those benchmarks were taken on an ancient system, with an ancient OS/kernel and FUSE:

   - OS: Ubuntu 14.04.01 LTS (64 bit)   
   - CPU: Intel Core i5-2500K @ 4.0 Ghz   
   - RAM: 8GB DDR3-1600   
   - GPU: AMD R9 290 4GB (Sapphire Tri-X)
So that's a 2nd-gen (Sandy Bridge) CPU with DDR3 RAM and a PCIe 3.0 GPU running at PCIe 2.0 speeds (Sandy Bridge tops out at PCIe 2.0).

On a modern system, with a recent kernel+FUSE, I expect the results would be much better.

But we also now have the phram kernel module, with which you can create a block device completely bypassing FUSE, so using phram should result in even greater performance than vramfs.
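
For anyone curious, the rough shape of the phram route (it's essentially what the ArchWiki page linked downthread documents) is: find the GPU's prefetchable VRAM BAR with lspci, map that region as an MTD device with phram, and expose it as a block device via mtdblock. A minimal sketch follows; the bus address, size and device names are purely illustrative, the parameter format is per the phram module docs, and mapping a region the GPU driver is actively using will end badly:

   # find the GPU's large prefetchable memory region (address/size are examples)
   lspci -v -s 01:00.0
   #   Memory at e0000000 (64-bit, prefetchable) [size=4G]

   # map that region as an MTD device, then expose it as a block device
   modprobe phram phram=vram,0xe0000000,4G
   modprobe mtdblock
   mkfs.ext4 /dev/mtdblock0        # assuming the new region shows up as mtd0
   mount /dev/mtdblock0 /mnt/vram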

by d3Xt3r

3/29/2025 at 8:44:33 PM

Also, all reads and writes have to go across PCIe and through the CPU, which should be fast, but you are not going to get VRAM-to-GPU access speeds.

by somat

3/29/2025 at 9:31:49 PM

Using precious VRAM to store files is a special kind of humor, especially since someone actually implemented it. Kudos.

by finnjohnsen2

3/31/2025 at 8:11:48 AM

It is not precious if you don't run LLMs or play games. For many people like myself, the video card is idle most of the time. Using its RAM to speed up compilation or similar is not a bad idea.

by theragra

3/31/2025 at 5:04:57 PM

It would be interesting to have something that used VRAM when there was no other demand and regular RAM otherwise.

Even gamers and (most) LLM users are not using the GPU all the time.

by LeFantome

3/30/2025 at 2:42:05 PM

Somewhat related, there is NVIDIA GPUDirect Storage[0], which provides an API for efficient “file transfer” between the GPU and the local filesystem. Always wanted to give it a try but haven't yet.

[0]: https://docs.nvidia.com/gpudirect-storage/index.html

by fp64

3/29/2025 at 6:38:05 PM

What is the overhead on a FUSE filesystem compared to being implemented in the kernel? Could something like eBPF be used to make a faster FUSE-like filesystem driver?

by 12destroyer21

3/29/2025 at 8:53:57 PM

> What is the overhead on a FUSE filesystem compared to being implemented in the kernel?

The overhead is quite high, because of the additional context switching and copying of data between user and kernel space.

> Could something like eBPF be used to make a faster FUSE-like filesystem driver?

eBPF can't really address any of the problems I noted above. To improve performance, one would need to change how the interface between the kernel and the userspace part of a FUSE filesystem works, to make it more efficient.

That said, FUSE support for io_uring, which was merged recently in Linux 6.14, has potential there; see:

https://www.phoronix.com/news/Linux-6.14-FUSE

by marbu

3/29/2025 at 8:32:37 PM

There is considerable overhead from the userspace <> kernel <> userspace switches; you can see something similar with WireGuard if you compare the performance of its Go client vs. the kernel driver.

Some FUSE drivers can avoid the overhead by letting the kernel know that the backing resource of a FUSE filesystem can be handled by the kernel directly (e.g. a FUSE-based overlay FS where the backing storage is XFS or something), but that probably isn't applicable here.

If you're in kernel space, though, I don't think you'd have access to OpenCL so easily; you'd need to reimplement it based on kernel primitives.

by ChocolateGod

3/30/2025 at 9:07:39 AM

> What is the overhead on a FUSE filesystem compared to being implemented in the kernel?

It depends on your use case.

If you serve most of your requests from kernel caches, then FUSE doesn't add any overhead. That was the case for me, when I had a FUSE service running to serve all commits from all branches (from all of history) at the same time, as directories, directly from the data in a .git folder.

by eru

3/29/2025 at 7:44:21 PM

If you want a vramfs, why would you use GPU VRAM? CPU<->GPU copy speeds are not great.
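
For reference, PCIe 4.0 x16 tops out around 32 GB/s per direction in theory, and real-world copies land somewhat below that, well short of what either system RAM or the VRAM itself can do locally.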

I have 192GB of CPU VRAM in my desktop and that was cheap to obtain. Absolute best build decision ever.

by dheera

3/29/2025 at 9:36:40 PM

> I have 192GB of CPU VRAM in my desktop and that was cheap to obtain.

How? Or what's "cheap" here? (Because I wouldn't call 192G of just regular RAM plugged into the motherboard cheap; I think everything else is more expensive, and if there's some hack here that I haven't caught, I very much would like to know about it.)

by yjftsjthsd-h

3/30/2025 at 2:16:16 AM

A 4x48GB set of Corsair DDR5 sticks is about $500.*

Which is pretty cheap compared to the cost of my whole build and whatever else I've spent on. Cheap is relative, but I'm just saying that if you're going to spend $3000+ on a build, and you love to work with massive datasets, VMs, and the like, then $500 for a metric fuckton of RAM, so that your system is never, ever swapping, is a very worthwhile thing to spend on.

192GB worth of GPU will cost you about $40000, for reference, and will be less performant if your goal is just a vramfs for CPU tasks.

* Beware that using 4 DDR5 slots will cut your memory bandwidth in half on consumer motherboards and CPUs. But I willingly made that tradeoff. Maybe at some point I'll upgrade to a server motherboard and CPU.

by dheera

3/30/2025 at 2:23:38 AM

Ah, okay. Yes, if that's your reference point then just buying more RAM to plug into the motherboard is an excellent deal.

by yjftsjthsd-h

3/30/2025 at 5:48:49 AM

Regarding *, do you know why? Shouldn't dual/quad channel be in effect?

by brutal_chaos_

3/29/2025 at 7:52:20 PM

What other VRAM is there?

by LtdJorge

3/29/2025 at 11:40:01 PM

A couple of reasons:

   1. You can use VRAM when you don't have massive amounts of RAM for a ramdisk (or /dev/shm).
   2. Depending on the implementation, you might have faster random seek/write than normal RAM.
   3. You could presumably run certain GPU kernels on the vramfs.

by winwang

3/30/2025 at 10:45:17 AM

Cool, I love that there are ways to utilize RAM and VRAM as filesystems. Sometimes you just don't need all that pure RAM/VRAM.

by harha_

3/29/2025 at 6:46:05 PM

These days, is it better to use an old video card or a few NVMe drives on a PCIe multiplexer for those same lanes?

by hinkley

3/29/2025 at 10:15:21 PM

Hands down the latter. Good M.2 drives can generally get pretty close to the capacity of the bus, and you can fit literally a thousand times more stuff on four NVMe drives than you can on any old GPU.
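
For instance, a PCIe 4.0 x4 link is roughly 8 GB/s, and current high-end consumer drives advertise around 7 GB/s sequential, so a handful of them can saturate whatever lanes you give them.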

by Tuna-Fish

3/30/2025 at 8:39:41 AM

The nvme, by far.

Hard to imagine a reasonable real-world use-case for vramfs. Still cool.

by 3np

3/30/2025 at 5:40:21 PM

It has been tried in each generation of motherboard design, but in an era where GPUs had a custom motherboard slot that normal cards could not occupy, it made a sort of sense. And I know there have been times where the northbridge could not saturate as many PCIe devices as one might have motherboard slots. So even leaving the slot intended for GPUs empty, or populating it with a daughter card, might be leaving performance on the table. But I suspect a riser card would fit handily into a 16x slot without blocking more than one or two 2x slots.

by hinkley

3/29/2025 at 8:13:38 PM

Using something like this would keep the GPU powered on and unable to shut itself off.

by Dwedit

3/29/2025 at 8:31:01 PM

Why? VRAM has to be powered as long as you're scanning out of it; any competent design is going to support powering down most of the GPU while keeping RAM alive, otherwise an idle desktop is going to suck way more power than necessary.

by mjg59

3/30/2025 at 4:22:31 AM

Some GPUs don't have scanout, such as laptop GPUs that pipe pixels through the iGPU. Those fully power off when they're not in use.

by Dwedit

3/29/2025 at 9:01:55 PM

I wonder if any GPU is powering down chips or banks like you can on PC.

They all have MMUs, right? So you could defrag all in-use memory into fewer refresh domains too.

by bobmcnamara

3/29/2025 at 9:31:41 PM

GPUs will drop memory clocks dynamically, with at least one supported clock speed that's intended to be just fast enough to support scanning out the framebuffer. I haven't seen any indication that anybody is dynamically offlining VRAM capacity.

by wtallis

3/29/2025 at 9:28:55 PM

You can validate this yourself: if you have access to an A/H100, allocate a 30GB tensor and do nothing - you'll see nvidia-smi's reported wattage go up by a watt or so.
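
A quick way to check (a rough sketch; assumes PyTorch is installed, the card has headroom for the allocation, and the exact idle-power delta will vary by card):

   # hold a ~30 GiB allocation on the GPU and do nothing with it
   python3 -c "import torch; x = torch.empty(30 * 1024**3, dtype=torch.uint8, device='cuda'); input('allocated, press enter to free ')"

   # in another terminal, watch memory use and power draw before/after
   nvidia-smi --query-gpu=memory.used,power.draw --format=csv -l 1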

by JackYoustra

3/29/2025 at 11:30:26 PM

That doesn't prove anything. Allocating a second 30GB chunk and seeing the power go up another Watt would be more convincing.

by wtallis

3/29/2025 at 9:47:20 PM

Doesn't the graphics processor of the Pi double as the bootstrap loader?

by ggm

3/29/2025 at 6:24:03 PM

could be a good place to sequester a swap file, similar to zram

by knome

3/29/2025 at 6:38:34 PM

You can, but https://wiki.archlinux.org/title/Swap_on_video_RAM suggests not doing it this way:

> Warning: Multiple users have reported this to cause system freezes, even with the fix in #Complete system freeze under high memory pressure. Other GPU management processes or libraries may be swapped out, leading to nonrecoverable page faults.

and in general you have to be really careful swapping to anything that uses a driver that could itself be swapped (which FUSE is especially prone to, but IIRC even ZFS and NFS did(?) have caveats with swap).

OTOH that same page documents a way to swap to VRAM without going through userspace, so don't take this as opposition to the general idea :)
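
(Roughly the same phram/mtdblock setup sketched upthread, just with mkswap/swapon instead of a filesystem; the device name and priority here are only an example.)

   mkswap /dev/mtdblock0
   swapon --priority 100 /dev/mtdblock0   # prefer it over disk-backed swap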

by yjftsjthsd-h

3/30/2025 at 8:41:34 AM

> IIRC even ZFS and NFS did(?) have caveats with swap

ZFS still does. If you run VMs off zvols you want to avoid putting swap there. Learned this the hard way.

by 3np