1-Bit Bonsai Image 4B Image Generation for Local Devices

5/31/2026 at 5:45:43 PM

Genuine question: is this solving a real problem?

IME, the bottleneck when using diffusion models isn't storage space or memory, it's generation time. Lots of models will run on 8-12 GB 1080-generation GPUs onwards, or on Macs with similar memory, which are probably the bottom end from a GPU power perspective anyway. I also note that these models are marginally slower than the small FLUX.2 model they're based on.

Okay, maybe this allows running a local model on something that has a reasonably powerful GPU and limited memory, like an iPhone, but is that really a common requirement?

by mft_

5/31/2026 at 6:09:31 PM

It's useful progress. Decent-fidelity local-scale inference means that you can create a product that generates throwaway images frequently without worrying about cost. Thus far every product I've seen that generates images is metered, which severely limits the value. I don't know if this is actually at the "decent fidelity" point yet.

by soerxpso

5/31/2026 at 6:23:11 PM

ideally if ternary models work, the math is extremely easy for computers (addition/subtraction vs 16 bit multiplication)
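To make the addition/subtraction point concrete, here is a minimal sketch of a matrix-vector product with ternary weights. It assumes every weight is exactly -1, 0, or +1; the function names are made up for illustration, and this is not how Bonsai or any particular runtime implements it. Real ternary schemes typically also apply a scale factor afterwards, which is left out here.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix (entries in {-1, 0, +1}) by a vector
    using only additions and subtractions."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the activation
            elif w == -1:
                acc -= xi      # -1 weight: subtract the activation
            # 0 weight: contributes nothing, so it is skipped
        out.append(acc)
    return out


def dense_matvec(weights, x):
    """Reference version with ordinary multiplications, for comparison."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


W = [[1, 0, -1],
     [-1, 1, 1]]
x = [2.0, 3.0, 5.0]

result = ternary_matvec(W, x)
reference = dense_matvec(W, x)
print(result, reference)
```

Both functions give the same answer; the ternary one never multiplies, which is where the hoped-for hardware savings would come from.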

by c0rruptbytes

5/31/2026 at 6:10:55 PM

For free users, I guess local generation is going to be faster than waiting in a queue.

by wmf

5/31/2026 at 6:12:22 PM

Genuine question: doesn't it blow your mind that there exists a 1 Gigabyte file/program that can generate any image you can think of just from a rough description of it?

by moralestapia

5/31/2026 at 6:20:14 PM

Yeah, it's pretty incredible. And I guess that's mostly what's behind the question: whether this is more of an impressive research/technique demonstrator, or a real product advancement solving a need.

by mft_

5/31/2026 at 6:24:21 PM

[dead]

by ArchieScrivener

5/31/2026 at 6:16:02 PM

> doesn't it blow your mind that there exists a 1 Gigabyte file/program that can generate any image you can think of just from a rough description of it?

I can make this into a 5-line Python program. I’m not saying the images will match the description, but that isn’t part of your spec ;)

by hk__2

5/31/2026 at 4:27:23 PM

I actually can’t wait for the future where I upgrade hardware in order to upgrade my AI as an alternative to an expensive subscription.

There are many problems I want to work on which require billions of tokens. These are completely inaccessible without corporate project sponsorship at the moment. An ASIC generation machine that can pump out a few tens of thousands of tokens per second at Opus 4.6 quality would be more than sufficient.

by lumost

5/31/2026 at 6:01:50 PM

A company called Taalas is working on something like that. Not Opus4.6 quality, but I'm sure they're targeting larger models. Currently they're using a Llama 8B model. It runs at ~17k tokens per second, and you can test it at https://chatjimmy.ai/.

by barnas2

5/31/2026 at 5:19:42 PM

I'm curious how hardware and power cost would stack up to subscription cost

by neals

5/31/2026 at 6:31:23 PM

I did an estimate of that if you're interested: https://x.com/pwnies/status/2028831699736637912

The TL;DR though is that a 10-15b param model baked into an ASIC with the latest fab tech would take around 62W of power draw when active. At ~10k+ t/s though it likely would only be active for short bursts of time. It'd fit perfectly fine within the thermal envelope of a laptop.

The approach makes a lot of sense. Once you get to those speeds, latency of the network becomes one of the bigger bottlenecks, so local has a real advantage over a subscription.
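To put the 62 W and ~10k t/s figures in per-token terms, here is a rough back-of-the-envelope sketch. It assumes the chip draws the full 62 W for the whole time it is generating and counts electricity only, ignoring the rest of the system and the hardware cost; the function names are just for illustration.

```python
def energy_per_token_joules(power_watts, tokens_per_second):
    """Energy spent on one token while the chip is active."""
    return power_watts / tokens_per_second


def energy_for_tokens_kwh(power_watts, tokens_per_second, total_tokens):
    """Total electrical energy, in kWh, to generate total_tokens."""
    seconds = total_tokens / tokens_per_second
    joules = power_watts * seconds
    return joules / 3.6e6  # 1 kWh = 3.6 million joules


POWER_W = 62.0        # active power draw from the estimate above
SPEED_TPS = 10_000.0  # ~10k tokens per second from the estimate above

per_token = energy_per_token_joules(POWER_W, SPEED_TPS)
per_billion = energy_for_tokens_kwh(POWER_W, SPEED_TPS, 1e9)

print(f"{per_token * 1000:.1f} mJ per token")
print(f"{per_billion:.2f} kWh per billion tokens")
```

Under those assumptions a billion tokens takes about 28 hours of continuous generation and under 2 kWh, so multiply by your local electricity price to compare against a subscription.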

by jjcm

5/31/2026 at 4:56:11 PM

Can you give an example of such a problem?

by bigmadshoe

5/31/2026 at 5:37:53 PM

> To our knowledge, Bonsai Image 4B is the first image model in its parameter class to run directly on an iPhone.

Isn't SD XL 3.5B? And the refiner model is even larger. Those can run on an iPhone 13 Pro.

by smallerize

5/31/2026 at 3:38:03 PM

They call it a diffusion model, but it's based on Flux.2 which is a rectified flow model.

by sorenjan

5/31/2026 at 6:01:12 PM

Just a side note: this website is classified by Apple as an adult website. I have Limit Adult Websites switched on in Content & Privacy Restrictions.

Led me to wonder what happens if a domain gets a new owner, and they want to petition Apple to remove the block.

by junto

5/31/2026 at 5:31:41 PM

Couldn't try it because the demo app is iOS only and the web version just crashes my browser. The small model is impressive but if you front load a 1.8GB text encoder model, the savings aren't quite as useful.

I do wonder how these compare to existing image generation models. I've tried https://github.com/alichherawalla/off-grid-mobile-ai for a while but I find the image generation models rather lacking.

by jeroenhd

5/31/2026 at 4:38:19 PM

Could anyone pick out the minimum hardware requirements for this? Both RAM and storage?

by a1o

5/31/2026 at 5:13:06 PM

The white paper says "mean-active memory pressure down to 1.95 GB for 1-bit Bonsai Image 4B and 2.38 GB for Ternary Bonsai Image 4B". Storage is on the linked page, and is about half that.

by mkl

5/31/2026 at 5:30:51 PM

That is very low; it looks like it should run on a base Mac mini M4 with 16GB RAM. I understand it is not released yet? What sort of harness is necessary for this type of model? (I have only used coding agents through GH Copilot in VS Code, the JetBrains AI tool and Pi; this last one was sort of a pain to set up…)

by a1o

5/31/2026 at 5:44:38 PM

For ternary MLX, size on disk is 3.8GB. Peak memory use at 512x512 is ~3.7GB.

by tcarambat1010

5/31/2026 at 4:40:10 PM

Is there a benchmark of local image generation models? Local = can run on a 16 GB MacBook or 8 GB+ NVIDIA card.

by wiradikusuma

5/31/2026 at 5:35:00 PM

What trade-off would one need to clear to justify the hardware and the work to get this running locally as part of a broader system? It’s a lot of work setting up and maintaining a production harness/system on a local device. I don’t personally generate images at a scale where using a lab’s app somehow burns all my tokens. I like the idea of local AI, but I don’t see widespread adoption of it happening in commercial or customer situations anytime soon, no matter how small or good enough the models get. Even Uber got token-burn whiplash, but I doubt their answer will be “run some of it local”. An IT nightmare, I’d imagine.

by captainregex

5/31/2026 at 4:58:44 PM

I wonder why they didn't use a Bonsai model as the text encoder

by potatoman22

5/31/2026 at 3:38:53 PM

Lately I've noticed posts with barely 10 points getting to HN frontpage. Was it always like this?

by MitPitt

5/31/2026 at 4:11:51 PM

I believe it's the way the HN algorithm works. In order to give new and obscure posts a shot, it will add them to people's front pages and see how they measure up. Otherwise new posts wouldn't get seen and the flywheel would never get started.

So everyone acts as a sort of beta tester for obscure posts.

by robbomacrae

5/31/2026 at 3:45:11 PM

On weekends, yes. During the week, that’s also true if they arrive within a short time frame, e.g., three minutes. Almost no one looks at “New”. That is the real issue.

by s-macke

5/31/2026 at 5:03:10 PM

Maybe the algorithm has some kind of "momentum" to it, taking into consideration the velocity of upvotes.
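For what it's worth, here is a sketch of the commonly cited approximation of the ranking formula, based on the Arc source that was published years ago. Treat the constants (0.8, 1.8, the 2-hour offset) as assumptions: the live algorithm isn't public and reportedly includes penalties and other factors not modeled here. It has no explicit momentum term, but the time decay has a similar effect.

```python
def rank_score(points, age_hours, gravity=1.8):
    """Commonly cited approximation of an HN-style ranking score.

    points:    upvotes on the post (the submitter's own vote is discounted)
    age_hours: hours since submission
    gravity:   how quickly older posts sink (assumed value)
    """
    return max(points - 1, 0) ** 0.8 / (age_hours + 2) ** gravity


# A fresh post with few points vs. an older post with many points.
fresh = rank_score(points=10, age_hours=0.5)
older = rank_score(points=100, age_hours=12)

print(f"10 points, 30 minutes old: {fresh:.3f}")
print(f"100 points, 12 hours old:  {older:.3f}")
print("fresh post ranks higher:", fresh > older)
```

Under this approximation a 10-point post that is half an hour old outranks a 100-point post from 12 hours ago, so low point counts on the front page are expected whenever votes arrive quickly.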

by blurbleblurble

5/31/2026 at 3:44:33 PM

Not as much competition on the weekend?

by DannyPage

5/31/2026 at 4:56:10 PM

If you are looking to see the "true" HN frontpage (i.e. most upvoted posts), I'd recommend using https://hckrnews.com

by nickvec

5/31/2026 at 3:43:10 PM

I just assume bots

by Aboutplants

5/31/2026 at 3:48:46 PM

Bots doing what? How would the poster being a bot influence why the post itself makes it to the front page with just 10 points?

by iamjackg

5/31/2026 at 4:50:37 PM

It’s about how quickly they get those points. It doesn’t have to be bots. Sending a post to friends with reputable human profiles and asking for a vote kinda works on most social networks. Some social networks claim they have protection against this, but I wouldn’t bet they catch everything.

by speedgoose

5/31/2026 at 5:05:48 PM

Very interested to see where this kind of work goes for on-device video generation!

by sudb

5/31/2026 at 5:38:10 PM

The text encoder is still 4-bit quantized.

by woadwarrior01

5/31/2026 at 6:11:23 PM

This is why I don't think the big AI companies and nvidia will dominate the market. AIs will just run locally, on whatever hardware you have. Perhaps that's why they worked on this yet-to-be-defined partnership with ARM.

by moralestapia

5/31/2026 at 4:56:04 PM

I was expecting to see images of Bonsai trees when I clicked this

by janniks

5/31/2026 at 4:57:32 PM

I expected a small tree in black and white pixel art.

by tobr

5/31/2026 at 5:30:16 PM

Does anyone ever get their stuff to actually work? Like actually load?

by iJohnDoe

5/31/2026 at 5:34:32 PM

The online demos require WebGPU, so Firefox on mobile and privacy-enhanced browsers will break. WebGPU support on Linux and other open source systems is also trash; you can force it to work in Chrome but it won't be happy.

by jeroenhd

5/31/2026 at 4:38:40 PM

Question,

Is it compatible with Ollama or ComfyUI, or are those providers unneeded? And is it compatible with low-end hardware?

Also, where does "./setup.sh" drop the components on Linux?

Thank you, Sol

by SilentM68

5/31/2026 at 3:52:08 PM

impressive, combines a couple techniques that I always wanted the frontier models to have

having trouble loading the webgl browser demo on my phone but no biggy

by yieldcrv