4/19/2026 at 3:41:17 AM
I'm somewhat confused as to why this is on the front page. It doesn't go into any real detail, and the advice it gives is... not good. You should definitely not be quantizing your own GGUFs using an old method like that hf script. There are lots of ways to run LLMs via podman (some even officially recommended by the project!). The chip has been out for almost a year now, and its most notable (and relevant-to-AI) feature is not mentioned in this article (it's the only x86_64 chip below workstation/server grade that has quad-channel RAM-- and inference is generally RAM constrained). I'm also quite puzzled about this bit about running pytorch via uv.
Anyway. I wouldn't recommend following the steps posted in there. Poke around google, or ask your friendly neighborhood LLM for some advice on how to set up your Strix Halo laptop/desktop for the tasks described. A good resource to start with would probably be the unsloth page for whichever model you are trying to run. (There are a few quantization groups that are competing for top place with GGUFs, and unsloth is regularly at the top-- with incredible documentation on inference, training, etc.)
Anyway, sorry to be harsh. I understand that this is just a blog for jotting down stuff you're doing, which is a great thing to do. I'm mostly just commenting on the fact that this is on the front page of hn for some reason.
by spoaceman7777
4/19/2026 at 4:50:39 AM
Thanks for writing this comment. I think seeing someone’s “first impressions” and then seeing someone else’s response to those thoughts is more interesting and feels more connected socially than just reading a “correct” guide or similar, especially when it’s something I’m curious about but wouldn’t necessarily be motivated enough to actually try out myself.
by pierrekin
4/19/2026 at 1:54:36 PM
Agreed. Been running a Strix Halo box since mid-2025. Lemonade builds of llama.cpp with Unsloth or Bartowski quants have proven to be excellent.
by rpdillon
4/19/2026 at 5:50:30 AM
Quad-channel RAM is common on consumer desktops. Strix Halo has *8* channels, and also very fast RAM (soldered RAM can be faster than DIMMs because the traces are shorter).
by fwipsy
4/19/2026 at 6:16:12 AM
Quad-channel memory is not common on consumer desktops; it's strictly an HEDT-and-above feature. The vast majority of consumer desktops have 2 channels or fewer.
by fluoridation
4/19/2026 at 9:52:42 AM
One should no longer use the word "channel" because the width of a channel differs between various kinds of memories, even among those that can be used with the same CPU (e.g. between DDR and LPDDR or between DDR4 and DDR5).
For instance, now the majority of desktops with DDR5 have 4 channels, not 2 channels, but the channels are narrower, so the width of the memory interface is the same as before.
To avoid ambiguities, one should always write the width of the memory interface.
Most desktop computers and laptop computers have 128-bit memory interfaces.
The cheapest desktop computers and laptop computers, e.g. those with Intel Alder Lake N/Twin Lake CPUs, and also many smartphones and Arm-based SBCs, have 64-bit memory interfaces.
Cheaper smartphones and Arm-based SBCs have 32-bit memory interfaces.
Strix Halo and many older workstations and many cheaper servers have 256-bit memory interfaces.
High-end servers and workstations have 768-bit or 512-bit memory interfaces.
It is expected that future high-end servers will have 1024-bit memory interfaces per socket.
GPUs with private memory usually have memory interfaces between 192-bit and 1024-bit, but newer consumer GPUs usually have narrower memory interfaces than older consumer GPUs, to reduce cost. The narrower memory interface is compensated for by faster memories, so the available bandwidth in consumer GPUs has increased much more slowly than the increase in GDDR memory speed would have allowed.
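A rough sketch of the channel-versus-width point above, in Python. The function name is made up for illustration; the per-channel widths assumed here are 64 bits for a DDR4 channel and 32 bits for each of the two subchannels on a DDR5 DIMM.

```python
def interface_width_bits(channels: int, bits_per_channel: int) -> int:
    """Total memory interface width = number of channels x width of each channel."""
    return channels * bits_per_channel

# Assumption: a typical desktop with two DIMM channels populated.
# DDR4: 2 channels, 64 bits each.
ddr4_desktop = interface_width_bits(channels=2, bits_per_channel=64)

# DDR5: each DIMM exposes two independent 32-bit subchannels,
# so the same two DIMM channels count as 4 narrower channels.
ddr5_desktop = interface_width_bits(channels=4, bits_per_channel=32)

print(ddr4_desktop, ddr5_desktop)
```

Both configurations come out to the same total width, which is why quoting only a channel count can mislead.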
by adrian_b
4/19/2026 at 7:33:21 PM
>now the majority of desktops with DDR5 have 4 channels, not 2 channels
Source? I just looked up two random X870E boards from Gigabyte and both are dual channel.
>To avoid ambiguities, one should always write the width of the memory interface.
They're incomparable quantities. More channels support more parallel operations, while a wider bus at a constant frequency supports higher throughput.
The bus width is not even that useful of a metric. It's more useful to talk about bits per second, which is the product of bus width and frequency.
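As a back-of-the-envelope illustration of that product, here is a minimal Python sketch. The helper name is hypothetical, "frequency" is taken to mean the transfer rate in MT/s, and the two configurations (128-bit DDR5-5600 and 256-bit LPDDR5X-8000) are assumed example values, not measurements. The result is a theoretical peak, not sustained throughput.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, megatransfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s = (bus width in bytes) x (transfers per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * megatransfers_per_s * 1e6 / 1e9

# Assumed example: a typical desktop, 128-bit interface with DDR5-5600.
desktop = peak_bandwidth_gb_s(bus_width_bits=128, megatransfers_per_s=5600)

# Assumed example: a 256-bit interface with LPDDR5X-8000 (the Strix Halo class).
wide = peak_bandwidth_gb_s(bus_width_bits=256, megatransfers_per_s=8000)

print(f"{desktop:.1f} GB/s vs {wide:.1f} GB/s")
```

Doubling the width and raising the transfer rate both feed into the same number, so neither figure alone tells you the bandwidth.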
by fluoridation
4/19/2026 at 6:59:33 AM
4 DIMMs =/= 4 channels
by phonon
4/19/2026 at 2:51:13 PM
I knew that, but I still thought most desktops with 4 DIMM slots supported quad-channel memory. I guess I was wrong.
by fwipsy