4/1/2026 at 7:07:01 PM
Although I'm interested in both topics (KV compression and attempts to stream MoE models from storage), this is at least the 10th vibecoded project on this topic I've seen today alone across HN, Twitter, and some subreddits I visit.
At least this one gave credit to the upstream projects it used as references.
The llama.cpp project is also getting a wave of vibecoded PRs that are very clearly being produced by pointing Claude at the repo and the original paper and having it produce something.
Almost none of these attempts contain information that really matters, like actual benchmark tests with different KV quantization levels (not just perplexity or KLD).
by Aurornis
4/1/2026 at 8:53:56 PM
The performance gain in the recent Flash-MoE implementations seems to come mostly from coalescing the data for each individual MoE expert in each layer into a single sequential extent that can be read efficiently from SSD. If so, this will require some changes to the underlying GGUF format, though the GGUF standard explicitly provides for specifying different data layouts, so the additions are arguably minor.
As far as the TurboQuant thing goes, it seems that attn-rot has recently been merged in, which is a lightweight variety of it and written by the original llama.cpp author, so not an outside pull req.
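To illustrate why the per-expert coalescing described above could matter for SSD streaming, here is a minimal Python sketch that counts how many separate contiguous reads are needed to load the experts the router picked, under two hypothetical layouts. The tensor names (`gate`, `up`, `down`), their sizes, the expert count, and the chosen experts are all illustrative assumptions, not the actual GGUF or Flash-MoE layout, and the sketch only models read counts, not real I/O.

```python
# Hypothetical per-expert tensor sizes in bytes (illustrative assumption only).
TENSORS = {"gate": 4096, "up": 4096, "down": 4096}


def tensor_major_offsets(n_experts):
    """Hypothetical scattered layout: every expert's 'gate' block, then every 'up', then every 'down'.
    Returns {(expert, tensor_name): (offset, size)}."""
    offsets, pos = {}, 0
    for name, size in TENSORS.items():
        for e in range(n_experts):
            offsets[(e, name)] = (pos, size)
            pos += size
    return offsets


def expert_major_offsets(n_experts):
    """Hypothetical coalesced layout: each expert's gate/up/down stored back to back in one extent."""
    offsets, pos = {}, 0
    for e in range(n_experts):
        for name, size in TENSORS.items():
            offsets[(e, name)] = (pos, size)
            pos += size
    return offsets


def read_ranges(offsets, experts):
    """Sort the byte ranges needed for the selected experts and merge adjacent ones into single reads."""
    ranges = sorted(offsets[(e, name)] for e in experts for name in TENSORS)
    merged = []
    for off, size in ranges:
        if merged and merged[-1][0] + merged[-1][1] == off:
            # This range starts exactly where the previous one ends: extend that read.
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((off, size))
    return merged


active = [3, 17]  # hypothetical experts chosen by the router for one token
n_experts = 64    # hypothetical expert count
print(len(read_ranges(tensor_major_offsets(n_experts), active)))  # 6
print(len(read_ranges(expert_major_offsets(n_experts), active)))  # 2
```

Under these assumptions, storing each expert's weights as one extent turns three scattered reads per expert into a single sequential read, which is the access pattern SSD streaming favors.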
by zozbot234
4/1/2026 at 9:52:12 PM
> As far as the TurboQuant thing goes, it seems that attn-rot has recently been merged in, which is a lightweight variety of it and written by the original llama.cpp author, so not an outside pull req.
Yes, read the first sentence of the PR for it. The project is a constant target for vibecoded PRs and the maintainers are trying to stay ahead of it:
> In anticipation of the incoming flood of vibe generated PRs implementing TurboQuant, I'm raising the baseline a bit
by Aurornis
4/1/2026 at 7:15:18 PM
"vibe coded" is NOT the bad thing you think it is.
Going from paper to implementation from scratch in half an hour or so is great.
by _zoltan_
4/1/2026 at 7:21:19 PM
> "vibe coded" is NOT the bad thing you think it is.
It's not inherently bad in the same way that a first draft of a novel is not inherently bad.
But if someone asked me to read their novel and it was a first draft that they themselves had clearly not bothered reading or editing, I'd tell them to fuck off.
by mjr00
4/1/2026 at 8:08:14 PM
At least in the novel example the author had the decency to write what they're asking you to read.
These are more like sending an unsolicited LMGTFY link to someone who never asked you a question and expecting them to read all the results. Just a complete lack of awareness and respect for the maintainers.
by sumeno
4/1/2026 at 7:58:16 PM
Sure, but the problem is when you take that half hour of work and share it with other people without making clear how much effort has gone into it.
Software is valuable if it has been tested and exercised properly by other people. I don't care if you vibe coded it, provided you then put the real work in to verify that it actually works correctly, and then include the proof that you've done that when you start widely sharing it with the world.
Right now it's impossible to tell which of these projects implementing the paper are worth spending time with.
by simonw
4/1/2026 at 8:53:03 PM
> without making clear how much effort has gone into it
I'm increasingly convinced this is the critical context for sharing LLM outputs with other people. The robots can inflate any old thought into dozens of pages of docs, thousands of lines of MR. That might be great! But it completely severs the connection between the form of a work and the author's assessment/investment/attachment/belief in it. That's something one's audience might like to know!
by kristjansson
4/1/2026 at 8:55:26 PM
Isn't the point of an MVP to be an MVP?
The OP put together a POC and shared it, showing novel concepts used together. They are not some large R&D lab.
The purist tests being asked for are at odds with the Show HN guidelines.
by dalemhurley
4/1/2026 at 10:17:10 PM
Thanks. We are not a large R&D lab, and we have limited resources. We were working on a product that is a local-VLM-first, BYOD video security application, and our users asked for an MLX backend benchmark comparison. We tried hard to avoid shipping Python in the application bundle, so we searched for a pure binary MLX implementation, and the results showed we needed to build one. It took us two weeks to get it working, and we have been testing it with multiple models. For reference, you can see the results here: https://www.sharpai.org/benchmark/
Then we saw Google's announcement about TurboQuant. It's very cool, so we started integrating it (along with SSD/Flash streaming). It's a non-trivial process, and thanks for your support and understanding. When we saw the mobile application running with the Qwen 3 1.7B model, we thought it was worth it.
If something similar and well maintained comes along, we will definitely adopt it, since our target is production delivery. If this one gets good support from the community, we will continue to support it.
I think all the posts here gave us a reason to continue.
by aegis_camera
4/1/2026 at 9:37:28 PM
> The OP put together a POC and shared it, showing novel concepts used together.
That's the contention: there are countless POCs for these concepts already, and some of them were used as the basis for this project.
It's not really a novel POC; it's the result of putting the previous work into Claude Code, telling it to rewrite it in Swift, then putting your name on it. To be fair, the person did start adding the reference projects to the very end of the README.
But if you didn't know what to look for, you'd assume this was a very novel project attributable to their own work.
by Aurornis
4/1/2026 at 9:16:48 PM
This post wasn't marked as a Show HN.
by simonw
4/1/2026 at 10:10:11 PM
Tried, but it was the wrong time to post; it got zero attention. :)
by aegis_camera
4/1/2026 at 9:01:30 PM
[dead]
by th0ma5
4/1/2026 at 8:15:01 PM
> Going from paper to implementation from scratch in half an hour or so is great.
This repo isn’t showing that at all. Scroll to the bottom of the README and you’ll see the other project it was based on. It’s a translation of other people’s work.
There have been dozens or perhaps hundreds of vibecoded TurboQuant examples posted around the usual forums in the past few days. This one doesn’t even include anything helpful like benchmarks or tests. It’s just some proof of concept code that doesn’t even work if you try to run it.
My problem with this specific type of vibe coded project is that it’s initially presented as something more novel or polished in order to get more upvotes, karma, likes, or pad a resume. Then you read it and discover they just pointed Claude at some other projects and told it to produce something similar, then posted it as their own work.
by Aurornis
4/1/2026 at 7:22:56 PM
That’s a starting spot, but how about some testing and benchmarks?
Where’s the value added if the person just tells Claude to do it and then submits a PR?
The maintainers may as well vibe code it themselves if that’s all the work the would-be contributor is going to put into it.
by brokencode
4/1/2026 at 7:30:12 PM
if it works it works
we live in a wholly unoptimized world because the available resources have been so high, while the benefits of optimizing have been so low. that has flipped now and there are tons of low hanging fruit to optimize.
I agree that benchmarks would be great, but that's only relevant to this one topic, not the overall agentic-coded pull request concept itself
by yieldcrv
4/1/2026 at 7:35:16 PM
It's relevant in that it's an example that people are doing the easy part - the coding - and skipping the hard part - the benchmarking and proving it works and provides value.
A PR without evidence that it works, and without stating the benefits the new feature is expected to bring, is kind of worthless.
by jmalicki
4/1/2026 at 8:08:51 PM
It might work, but what's the point in sharing it if anyone can do the same in those 30 minutes with minimal effort?
by pqtyw
4/1/2026 at 8:04:34 PM
> if it works it works
If it works in one case, that doesn't mean it works consistently or well in the general case.
I've made lots of things with Claude Code that just work... until I do things in a slightly different order and the whole thing explodes.
by sumeno
4/1/2026 at 7:38:01 PM
The authors of the project have Claude Code as well, so doing this is just eating their time.
by sroussey
4/1/2026 at 8:07:31 PM
What if it contributes nothing of value, though? I.e., if it's not a novel paper, then the only value is whatever you personally learn from it.
by pqtyw