alt.hn

3/6/2026 at 4:35:47 PM

Show HN: LoRA gradients on Apple's Neural Engine at 2.8W

https://github.com/jmanhype/ane-lora-training

by jmanhype

3/6/2026 at 4:43:07 PM

I'm not an ML engineer. I used Claude Code (Opus 4.6) to get LoRA fine-tuning gradients running on Apple's Neural Engine, the dedicated ML accelerator in every Apple Silicon Mac, which has no public training API.

192 gradient dispatches, zero GPU fallbacks, converging loss, all at ~2.8W.

Three discoveries, found through iteration on real hardware: (1) the ANE's matmul op compiles but never executes, so everything must be rewritten as 1x1 convolutions, (2) spatial dimensions must be multiples of 16, and (3) the ANE compiler leaks handles and silently fails after ~119 compiles.
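To make (1) and (2) concrete: a matmul is mathematically identical to a 1x1 convolution if you lay the tokens out along a spatial axis and treat features as channels, padding that axis up to a multiple of 16. This is a minimal numpy sketch of the equivalence, not the repo's actual MIL kernel code; the function name and layout choices are my own illustration.

```python
import numpy as np

def matmul_as_1x1_conv(x, w, align=16):
    """Compute x @ w (x: [T, Cin], w: [Cin, Cout]) as a 1x1 convolution.

    Tokens go along the width axis of an NCHW tensor, padded up to a
    multiple of `align` to mimic the ANE's spatial-alignment constraint.
    """
    t, cin = x.shape
    cout = w.shape[1]
    t_pad = -(-t // align) * align            # round T up to a multiple of 16
    feat = np.zeros((1, cin, 1, t_pad), dtype=x.dtype)   # NCHW layout
    feat[0, :, 0, :t] = x.T
    kernel = w.T.reshape(cout, cin, 1, 1)     # conv weight: [Cout, Cin, 1, 1]
    # A 1x1 conv is per-position channel mixing: contract over Cin only.
    out = np.einsum("nchw,ocij->nohw", feat, kernel)
    return out[0, :, 0, :t].T                 # strip padding, back to [T, Cout]

x = np.random.randn(10, 8).astype(np.float32)
w = np.random.randn(8, 4).astype(np.float32)
assert np.allclose(matmul_as_1x1_conv(x, w), x @ w, atol=1e-5)
```

The padded positions stay zero and are sliced off at the end, so alignment costs a bit of wasted compute but never changes the result.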

Built on maderix's ANE reverse-engineering work. The repo includes the full MIL kernel generator, subprocess isolation to work around the compile limit, and MLX integration for hybrid GPU+ANE training.
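The subprocess-isolation pattern for the compile limit can be sketched like this: run each compile in a fresh child process so any handles the compiler leaks are reclaimed by the OS when the child exits. This is a hypothetical sketch of the pattern only; the child body just echoes its input where the real repo would invoke the ANE compiler on the kernel source.

```python
import subprocess
import sys

def compile_isolated(kernel_src: str, timeout: int = 60) -> str:
    """Run one compile per child process, sidestepping the ~119-compile
    silent-failure limit: leaked compiler handles die with the child.

    The child body is a stand-in; a real version would call the actual
    ANE/Core ML compiler on `kernel_src` instead of echoing it back.
    """
    child_code = (
        "import sys\n"
        # Placeholder for the real compile call on the kernel source:
        "sys.stdout.write('compiled:' + sys.argv[1])\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code, kernel_src],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout
```

Because every dispatch gets a fresh process, the per-process handle budget resets each time; the cost is process-startup overhead on every compile, which is acceptable since compiles happen once per kernel, not once per training step.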

by jmanhype