3/31/2026 at 11:39:52 PM
1 bit with a FP16 scale factor every 128 bits. Fascinating that this works so well.I tried a few things with it. Got it driving Cursor, which in itself was impressive - it handled some tool usage. Via cursor I had it generate a few web page tests.
On a monte carlo simulation of pi, it got the logic correct but failed to build an interface to start the test. Requesting changes mostly worked, but left over some symbols which caused things to fail. Required a bit of manual editing.
Tried a Simon Wilson pelican as well - very abstract, not recognizable at all as a bird or a bicycle.
Pictures of the results here: https://x.com/pwnies/status/2039122871604441213
There doesn't seem to be a demo link on their webpage, so here's a llama.cpp running on my local desktop if people want to try it out. I'll keep this running for a couple hours past this post: https://unfarmable-overaffirmatively-euclid.ngrok-free.dev
by jjcm
4/1/2026 at 12:11:49 AM
Thanks for sharing the link to your instance. Was blazing fast in responding. Tried throwing a few things at it with the following results: 1. Generating an R script to take a city and country name and finding it's lat/long and mapping it using ggmaps. Generated a pretty decent script (could be more optimal but impressive for the model size) with warnings about using geojson if possible 2. Generate a latex script to display the gaussian integral equation - generated a (I think) non-standard version using probability distribution functions instead of the general version but still give it points for that. Gave explanations of the formula, parameters as well as instructions on how to compile the script using BASH etc 3. Generate a latex script to display the euler identity equation - this one it nailed.Strongly agree that the knowledge density is impressive for the being a 1-bit model with such a small size and blazing fast response
by najarvg
4/1/2026 at 12:15:24 AM
> Was blazing fast in responding.I should note this is running on an RTX 6000 pro, so it's probably at the max speed you'll get for "consumer" hardware.
by jjcm
4/1/2026 at 2:25:03 AM
consumer hardware?That... pft. Nevermind, I'm just jealous
by ineedasername
4/1/2026 at 2:47:24 AM
Look it was my present to myself after the Figma IPO (worked there 5 years). If you want to feel less jealous, look at the stock price since then.by jjcm
4/1/2026 at 7:15:01 PM
Well in this context it's a 5090 with extra unused memory.by Dylan16807
4/1/2026 at 2:57:43 AM
Holy hell ... that's a monster of a cardby abrookewood
4/1/2026 at 12:17:31 AM
I must add that I also tried out the standard "should I walk or drive to the carwash 100 meters away for washing the car" and it made usual error or suggesting a walk given the distance and health reasons etc. But then this does not claim to be a reasoning model and I did not expect, in the remotest case, for this to be answered correctly. Ever previous generation larger reasoning models struggle with thisby najarvg
4/1/2026 at 12:21:31 AM
I ran it through a rudimentary thinking harness, and it still failed, fwiw: The question is about the best mode of transportation to a car wash located 100 meters away. Since the user is asking for a recommendation, it's important to consider practical factors like distance, time, and convenience.
Walking is the most convenient and eco-friendly option, especially if the car wash is within a short distance. It avoids the need for any transportation and is ideal for quick errands.
Driving is also an option, but it involves the time and effort of starting and stopping the car, parking, and navigating to the location.
Given the proximity of the car wash (100 meters), walking is the most practical and efficient choice. If the user has a preference or if the distance is longer, they can adjust accordingly.
by jjcm
4/1/2026 at 1:42:41 PM
And to be fair, you asked about traveling to a location. It just so happens that location is a car wash. You didn't say anything about wanting to wash the car; that's an inference on your part. A reasonable inference based on human experience, sure, but still an inference. You could just as easily want to go to the car wash because that's where you work, or you are meeting somebody there.by nlaslett
4/1/2026 at 1:31:42 PM
Honestly, the fact that we have models that can coherently reason about this problem at all is a technological miracle. And to have it runnable in a 1.15GB memory footprint? Is insanity.by monarchwadia
4/1/2026 at 5:14:50 PM
Exactly. It's not that the pig dances poorly, or that the dog's stock tips never seem to pan out. It's the fact that it's happening at all.by CamperBob2
4/1/2026 at 6:04:22 PM
But the fact that we have convinced a pig to dance, and trained a dog to provide stock tips? That can be improved upon over time. We've gotten here, haven't we? It really is a miracle, and I'll stick to that opinion.by monarchwadia
3/31/2026 at 11:50:22 PM
here's the google colab link, https://colab.research.google.com/drive/1EzyAaQ2nwDv_1X0jaC5... since the ngrok like likely got ddosed by the number of individuals coming alongby adityashankar
4/1/2026 at 5:26:34 AM
Thanks, that works. I only tested the 1.7B. It has that original GPT3 feel to it. Hallucinates like crazy when it doesn't know something. For something that will fit on a GTX1080, though, it's solid.We're only a couple of years into optimization tech for LLMs. How many other optimizations are we yet to find? Just how small can you make a working LLM that doesn't emit nonsense? With the right math could we have been running LLMs in the 1990s?
by qingcharles
3/31/2026 at 11:55:26 PM
Good call. Right now though traffic is low (1 req per min). With the speed of completion I should be able to handle ~100x that, but if the ngrok link doesn't work defo use the google colab link.by jjcm
3/31/2026 at 11:58:20 PM
The link didn't work for me personally, but that may be a bandwidth issue with me fighting for a connection in the EUby adityashankar
4/1/2026 at 8:01:54 AM
As someone whose brain was addled by exposure to art history, I strongly support the suggested pelican on bicycle.by AnthonBerg
4/1/2026 at 1:52:42 AM
Thanks. Did you need to use Prism's llama.cpp fork to run this?by andai
4/1/2026 at 2:47:32 AM
Yep.by jjcm
4/1/2026 at 3:35:53 AM
Could you elaborate on what you did to get it working? I built it from source, but couldn't get it (the 4B model) to produce coherent English.Sample output below (the model's response to "hi" in the forked llama-cli):
X ( Altern as the from (.. Each. ( the or,./, and, can the Altern for few the as ( (. . ( the You theb,’s, Switch, You entire as other, You can the similar is the, can the You other on, and. Altern. . That, on, and similar, and, similar,, and, or in
by andai
4/1/2026 at 4:28:55 AM
I have older M1 air with 8GB, but still getting ober 23 t/s on 4B model.. and the quality of outputs is on par with top models of similar size.1. Clone their forked repo: `git clone https://github.com/PrismML-Eng/llama.cpp.git`
2. Then (assuming you already have xcode build tools installed):
cd llama.cpp
cmake -B build -DGGML_METAL=ON
cmake --build build --config Release -j$(sysctl -n hw.logicalcpu)
3. Finally, run it with (you can adjust arguments): ./build/bin/llama-server -m ~/Downloads/Bonsai-8B.gguf --port 80 --host 0.0.0.0 --ctx-size 0 --parallel 4 --flash-attn on --no-perf --log-colors on --api-key some_api_key_string
Model was first downloaded from: https://huggingface.co/prism-ml/Bonsai-8B-gguf/tree/main
by freakynit
4/1/2026 at 4:57:30 AM
To the author: why is this taking 4.56GB ? I was expecting this to be under 1GB for 4B model. https://ibb.co/CprTGZ1cAnd this is when Im serving zero prompts.. just loaded the model (using llama-server).
by freakynit
4/1/2026 at 3:46:23 AM
I did this: https://image.non.io/2093de83-97f6-43e1-a95e-3667b6d89b3f.we...Literally just downloaded the model into a folder, opened cursor in that folder, and told it to get it running.
Prompt: The gguf for bonsai 8b are in this local project. Get it up and running so I can chat with it. I don't care through what interface. Just get things going quickly. Run it locally - I have plenty of vram. https://huggingface.co/prism-ml/Bonsai-8B-gguf/tree/main
I had to ask it to increase the context window size to 64k, but other than that it got it running just fine. After that I just told ngrok the port I was serving it on and voila.
by jjcm
4/1/2026 at 1:16:47 AM
I reminds me of very early ChatGPT with mostly correct answers but some nonsense. Given its speed, it might be interesting to run it through a 'thinking' phase where it double checks its answers and/or use search grounding which would make it significantly more useful.by rjh29
4/1/2026 at 12:10:09 AM
The speed is impressive, I wish it could be setup for similar to speculative decodingby uf00lme
4/1/2026 at 2:56:40 AM
man, that is really really quick. What is your desktop setup??? GPU?by abrookewood
4/1/2026 at 3:16:18 AM
It is fast, but I do have good hardware. A few people have asked for my local inference build, so I have an existing guide that mirrors my setup: https://non.io/Local-inference-buildby jjcm
4/1/2026 at 2:52:50 AM
thanks, i tested it, failed in strawberry test. qwen 3.5 0.8B with similar size passes it and is far more usable.by pdyc
4/1/2026 at 7:59:34 PM
I hope you are kidding, how is that a test of any capabilities? it's a miracle that any model can learn strawberry because it cannot see the actual characters and ALSO, it's likely misspelled a lot in the corpus. I've been playing with this model and I'm pleasantly surprised, it certainly knows a lot, quite a lot for 1.1Gby cztomsik
4/1/2026 at 9:09:19 AM
Does asking it to think step by step, or character by character, improves the answer? It might be a tokenization+unawareness of its own tokenization shortcomingsby algoth1
4/1/2026 at 9:56:23 AM
no it did not with character by character it concluded 2 :-)by pdyc
4/1/2026 at 4:12:15 AM
Interesting. Qwen 3.5 0.8B failed the test for me.by selcuka
4/1/2026 at 12:12:07 AM
wow that was cooler than I expected, curious to embed this for some lightweight semantic workflows nowby hmokiguess
4/1/2026 at 12:39:47 AM
[dead]by tristanMatthias