2/10/2026 at 1:07:01 PM
I use the open source Handy [1] app with Parakeet V3 for STT when talking to coding agents and I’ve yet to see anything that beats this setup in terms of speed/accuracy. I get near instant transcription, and the slight accuracy drop is immaterial when talking to AIs that can “read between the lines”.
I tried incorporating this Voxtral C implementation into Handy but got very slow transcriptions on my M1 Max MacBook 64GB.
[1] https://github.com/cjpais/Handy
I’ll have to try the other implementations mentioned here.
by d4rkp4ttern
2/12/2026 at 12:33:08 PM
Have you tried Hex? https://github.com/kitlangton/Hex
Faster than Handy and uses way less memory.
by t0md4n
2/12/2026 at 2:38:43 PM
Indeed it's extremely fast, now my go-to for STT on macOS. I made a PR to allow a single-tap toggle hotkey instead of double-tap. Unlike Handy, which aims to be multi-platform, Hex is macOS-native and leverages Core ML and the Apple Neural Engine for far speedier transcription.
by d4rkp4ttern
2/12/2026 at 12:39:40 PM
Nice, will try, thanks!
by d4rkp4ttern
2/10/2026 at 7:41:27 PM
Handy is great but I wish the STT was realtime instead of batch.
by thethimble
2/11/2026 at 12:47:24 AM
There’s a tradeoff here. If you want streaming output, you lose the opportunity to clean it up in post-processing, such as removing filler words or stutters, or any other AI-based cleanup.
The macOS built-in dictation streams in real time and also does some cleanup, but it does awkward things, like showing the streaming text at the bottom of the screen. Also I don’t think it’s as accurate as Parakeet V3, and there’s a startup lag of 1-2 secs after hitting the dictation shortcut, which kills it for me.
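To make the batch-side cleanup concrete, here is a rough sketch of the kind of filler-word and stutter removal I mean. It is not what Handy or any other app actually does; the filler list and the `clean_transcript` name are just assumptions for illustration, and a real pipeline would more likely use a language model than regexes.

```python
import re

# Hypothetical filler list; real dictation tools may use a model instead.
FILLERS = re.compile(r"\b(?:um+|uh+|er+m?)\b[,.]?\s*", re.IGNORECASE)
# A word immediately repeated one or more times, e.g. "the the".
STUTTER = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def clean_transcript(text: str) -> str:
    """Remove simple fillers and collapse immediately repeated words."""
    text = FILLERS.sub("", text)
    text = STUTTER.sub(r"\1", text)
    # Tidy any double spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", text).strip()


print(clean_transcript("Um, I I want to uh refactor the the parser"))
```

This only works once the whole utterance is available, which is the point: a legitimate repeat like "that that" would also get collapsed, so even this toy version needs context that a word-by-word stream does not have.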
by d4rkp4ttern
2/11/2026 at 6:12:40 PM
I feel like this is a solvable problem. If you emit an errant word that should be replaced, why not correspondingly emit backspaces to just rewrite the word?
I feel like this is the best of both worlds.
Perhaps a little janky with backspaces, but still technically feasible.
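A minimal sketch of the backspace idea, assuming the recognizer hands back a revised full hypothesis each time and the app only needs to patch what is already typed. The function names are made up for illustration, and actually injecting keystrokes into the focused window is left out.

```python
BACKSPACE = "\b"


def correction_keystrokes(shown: str, revised: str) -> str:
    """Keystrokes that turn the already-typed text `shown` into `revised`."""
    # Count the characters of the shared prefix, which can stay on screen.
    keep = 0
    for a, b in zip(shown, revised):
        if a != b:
            break
        keep += 1
    # Erase everything after the shared prefix, then type the new tail.
    return BACKSPACE * (len(shown) - keep) + revised[keep:]


def apply_keystrokes(buffer: str, keys: str) -> str:
    """Simulate a text field receiving the keystrokes (for checking only)."""
    out = list(buffer)
    for ch in keys:
        if ch == BACKSPACE:
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)


keys = correction_keystrokes("I want to right", "I want to write code")
print(repr(keys))
print(apply_keystrokes("I want to right", keys))
```

The jank shows up when the correction is far back in the sentence: everything after the changed word gets erased and retyped, and it breaks if the user moves the cursor in the meantime.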
by thethimble