alt.hn

3/16/2026 at 1:09:58 PM

My Journey to a reliable and enjoyable locally hosted voice assistant

https://community.home-assistant.io/t/my-journey-to-a-reliable-and-enjoyable-locally-hosted-voice-assistant/944860

by Vaslo

3/16/2026 at 6:39:41 PM

actually the hardest part of a locally hosted voice assistant isn't the llm. it's making the tts tolerable to actually talk to every day.

the core issue is prosody: kokoro and piper are trained on read speech, but conversational responses have shorter breath groups and different stress patterns on function words. that's why numbers, addresses, and hedged phrases sound off even when everything else works.

the fix is training data composition. conversational and read speech have different prosody distributions and models don't generalize across them. for self-hosted, coqui xtts-v2 [1] is worth trying if you want more natural english output than kokoro.

btw i'm lily, cofounder of rime [2]. we're solving this for business voice agents at scale, not really the personal home assistant use case, but the underlying problem is the same.

[1] https://github.com/coqui-ai/TTS
[2] https://rime.ai

by ljclifford

3/16/2026 at 7:17:48 PM

80% of my home voice assistant requests really need no response other than an affirmative sound effect.

by cdcarter

3/16/2026 at 7:32:54 PM

100% agree. I don't want a "Yes," "Got it," "Will do," or even worse, "I have turned on the Bedroom Light." I want a soft success ding or a low failure boop.

by nickthegreek

3/16/2026 at 7:31:54 PM

> actually the hardest part of a locally hosted voice assistant isn't the llm. it's making the tts tolerable to actually talk to every day.

I would argue that the hardest part is correctly recognizing that it's being addressed. 98% of my frustration with voice assistants is them not responding when spoken to. The other 2% is realizing I want them to stop talking.

by cptskippy

3/16/2026 at 2:12:47 PM

If you're less concerned about privacy, I use Gemini 2.5 Flash for this and it's exceptionally good and fast as a HA assistant while being much cheaper than the electricity that would be needed to keep a 3090 awake.

The thing that kills this for me (and they even mentioned it) is wake word detection. I have both the HA voice preview and FPH Satellite1 devices, plus have experimented with a few other options like a Raspberry Pi with a conference mic.

Somehow nothing is even 50% as good as my Echo devices at picking up the wake word. The assistant itself is far better, but that doesn't matter if it takes 2-3 tries to get it to listen to you. If someone solves this problem with open hardware I'll be immediately buying several.

by hamdingers

3/16/2026 at 3:05:05 PM

How about a button?

I'd prefer to physically press a button on an intercom box than having something churning away constantly processing sound.

by _spduchamp

3/16/2026 at 3:24:15 PM

If I have to go to a thing and push a button, I'd rather the button do the thing I wanted in the first place. Voice assistants are for when my hands are full or I don't want to get up. (I wrote more about my home automation philosophy in another comment[1]).

Also I have all my voice assistant devices mounted to the ceiling

[1] https://news.ycombinator.com/item?id=47399909

by hamdingers

3/16/2026 at 5:16:01 PM

The pebble index seems like the optimal form for this.

https://repebble.com/index

Could be pressed even if your hands were busy.

by tomComb

3/16/2026 at 6:08:44 PM

If you want to relax some constraints, I made something similar for $10: https://www.stavros.io/posts/i-made-a-voice-note-taker/

by stavros

3/16/2026 at 6:15:46 PM

Did you have any luck with the power issues on the new board?

by CurleighBraces

3/16/2026 at 6:17:52 PM

The new board hasn't come yet, but a friend gave me a great idea, to power the mic from a GPIO, which powers it off completely when the ESP is off.

Hopefully the new boards will be here soon, but another issue is that I don't really have anything that can measure microamp consumption, so any testing takes days of waiting for the battery to run down :(

I do think these clones are the issue, though. They had an LED I couldn't turn off, so they'd literally shine forever. They don't seem engineered for low quiescent current, so fingers crossed with the new ones.

by stavros

3/16/2026 at 7:15:05 PM

Makes a lot of sense :) thanks for the update.

I'll try to remember to creepy stalk you for updates as the device sounds great!

by CurleighBraces

3/16/2026 at 7:39:48 PM

You can sign up to my mailing list to get emails if you want! It's at the end of each post.

by stavros

3/16/2026 at 3:21:16 PM

I'm in if I can embed it into my forearm

by pwillia7

3/16/2026 at 3:35:15 PM

In the mid 2000s I had a setup where some children's walkie talkie "spy watches" could be used to issue commands to a completely DIY, relay based smart home system.

I'm looking forward to whenever my Pebble ships so I can recreate that experience with this: https://github.com/skylord123/pebble-home-assistant-ws

by hamdingers

3/16/2026 at 5:18:17 PM

Apple Watch gets you close.

by nickthegreek

3/16/2026 at 5:50:45 PM

Time for a real life Star Trek comm badge

by croes

3/16/2026 at 3:23:42 PM

Rules out a bunch of cases where your hands are busy handling ingredients in the kitchen, etc

by kortilla

3/16/2026 at 4:47:33 PM

What's been surprising in my experience regarding the wake word is that it recognizes me (adult male) saying the wake word ~95% of the time. However, it only registers the rest of my family (women and children) ~30% of the time.
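To put those rates in perspective, here is a rough sketch. It assumes every attempt succeeds independently with the same probability, which is a simplification (real misses are probably correlated with voice pitch, distance, and room noise). Under that assumption the number of attempts follows a geometric distribution, so the expected count is 1/p. The function names are made up for illustration.

```python
def expected_attempts(p_detect: float) -> float:
    """Mean number of tries until the wake word registers (geometric distribution)."""
    if not 0.0 < p_detect <= 1.0:
        raise ValueError("p_detect must be in (0, 1]")
    return 1.0 / p_detect


def prob_heard_within(p_detect: float, tries: int) -> float:
    """Probability that at least one of `tries` independent attempts registers."""
    return 1.0 - (1.0 - p_detect) ** tries


# Rates quoted in the comment above; the labels are just for printing.
for label, p in (("adult male", 0.95), ("rest of family", 0.30)):
    print(
        f"{label}: ~{expected_attempts(p):.1f} tries on average, "
        f"{prob_heard_within(p, 3):.0%} chance of being heard within 3 tries"
    )
```

At a 30% hit rate that works out to a bit over three tries on average, which is in the same range as the "2-3 tries" complaint upthread.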

by ethagnawl

3/16/2026 at 5:34:37 PM

I have no firsthand knowledge, but I’d strongly bet that the home-assistant effort to donate training data mostly gets adult males, and nearly zero children.

by vineyardmike

3/16/2026 at 6:48:40 PM

This was 2021 (so pre-LLM), but I used to work for a company that gathered data for training voice commands (Alexa, Toyota, and Sonos were some clients). Basically, we paid people to read digital assistant scripts at scale.

Your assumptions about training data do not match the demographics of the data I collected. Most of our work revolved around getting diversity into the training data. We specifically recruited kids, older folks, women, people with accented/dialected English and just about every variety of speech that we could get our hands on. The companies we worked with were insanely methodical about ensuring that different people were included.

by dghlsakjg

3/16/2026 at 7:30:34 PM

You are reporting on a deliberately curated effort vs. what I understand is effectively voluntary data donation without incentives. It's not surprising to me that the latter dataset ends up biased due to the differences in sourcing.

by gmueckl

3/16/2026 at 5:48:47 PM

Oh, I'm sure you're right. I've had people in my personal life (non-technical; "AI enthusiasts") laugh at me over concerns about training bias but this is likely a real world example of it.

by ethagnawl

3/16/2026 at 6:10:12 PM

I think you can train your own wake word with microWakeWord but I've never done it.

by stavros

3/16/2026 at 2:21:41 PM

I have a feeling beamforming microphone arrays might help here, something like this could improve the audio being processed substantially - https://www.minidsp.com/products/usb-audio-interface/uma-8-m....

by jcims

3/16/2026 at 4:50:10 PM

That's a good call. I have a PS3(?) mic/camera that I was using when I was running the original Mycroft project on a Pi. I wonder if that would help with the inbuilt HA mic not waking for most of my family, most of the time. I will have to look at my VA Preview device and its specs later because I'm not sure if you can connect an external mic to it out-of-the-box.

by ethagnawl

3/16/2026 at 3:39:07 PM

Alexa devices have these (or used to at least), but Google Homes never did. So it shouldn't be necessary.

by IshKebab

3/16/2026 at 4:05:23 PM

Yeah a small (ideally personalized) wakeword model would probably outperform just about any audio wizardry.

by jcims

3/16/2026 at 6:29:44 PM

What about your wifi APs sensing which room you are in, with your choice of hilarious dance moves as the trigger?

Funky chicken for Gemini

Penguin dance for OpenAI

Claude?

by robotswantdata

3/16/2026 at 3:22:21 PM

Why not use an easier to detect wake “word”, like two claps in quick succession? Or a couple of notes of a melody?
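For what it's worth, a naive version of the double-clap idea can be sketched in a few lines. This is only an illustration, not how any particular wake-word engine works: it treats a clap as a loud sample after a stretch of quiet and looks for two such onsets close together. The threshold and timing values are guesses, the helper names are hypothetical, and a real detector would need filtering to avoid firing on door slams or dropped pans.

```python
from typing import List, Sequence


def find_onsets(
    samples: Sequence[float],
    sample_rate: int,
    threshold: float = 0.5,
    quiet_s: float = 0.1,
) -> List[float]:
    """Return times (seconds) where the signal gets loud after at least quiet_s of quiet."""
    onsets: List[float] = []
    last_loud = None  # time of the most recent loud sample
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            t = i / sample_rate
            if last_loud is None or t - last_loud >= quiet_s:
                onsets.append(t)
            last_loud = t
    return onsets


def is_double_clap(
    onsets: Sequence[float], min_gap_s: float = 0.15, max_gap_s: float = 0.6
) -> bool:
    """True if any two consecutive onsets are spaced like two quick claps."""
    return any(min_gap_s <= b - a <= max_gap_s for a, b in zip(onsets, onsets[1:]))


def make_signal(clap_starts: Sequence[int], length: int = 1000) -> List[float]:
    """Synthetic test signal: silence with a 10-sample loud burst at each start index."""
    sig = [0.0] * length
    for start in clap_starts:
        for i in range(start, min(start + 10, length)):
            sig[i] = 0.9
    return sig


if __name__ == "__main__":
    rate = 1000  # samples per second, chosen only to keep the demo small
    print(is_double_clap(find_onsets(make_signal([200, 500]), rate)))
    print(is_double_clap(find_onsets(make_signal([200]), rate)))
```

The appeal is that this is far cheaper than running a neural wake-word model, but the same simplicity is what makes false triggers likely.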

by senkora

3/16/2026 at 3:26:44 PM

Can't clap if your hands are full and I would not subject my family to my attempts at delivering a melody.

I haven't tried training my own wake word though, I'm tempted to see if it improves things.

by hamdingers

3/16/2026 at 7:08:48 PM

What about whistling?

by otikik

3/16/2026 at 4:50:22 PM

Personally I'd pick "Cthulhu"

by airstrike

3/16/2026 at 4:29:29 PM

One that I have been experimenting with is using analog phones (including rotary ones!) to act as the satellites. I live in an older home and have phone jacks in most of the rooms already so I only had to use a single analog telephone adapter. [0] The downside is I don't have wake word support, but it makes it more private and I don't find myself missing my smart speakers that much. At some point I would like to also support other types of calls on the phones, but for now I need to get an LLM hooked up to it.

[0] https://www.home-assistant.io/voice_control/worlds-most-priv...

by tkems

3/16/2026 at 1:47:56 PM

I'm still waiting for the promise of voice AI that was shown during the OpenAI demo in 2024 to become real. It's not clear to me why there has been zero progress since then.

by yanis_t

3/16/2026 at 3:46:56 PM

There's a gap between what the tech can do and applying it: it often has to be configured and packaged before it's usable in that way.

by j45

3/16/2026 at 3:48:25 PM

It also needs to work at least 99% of the time if not more. Not easy to do this with nondeterministic models.

by phito

3/16/2026 at 4:34:37 PM

If my lights and heat were 99% reliable, I'd be getting new lights and heat.

by recursive

3/16/2026 at 6:15:02 PM

In those cases yeah, 99% isn't reliable enough. I'm not going to tolerate having power down for 3 days out of the year. But in fairness, home automation is less critical than that so 99% reliability is still acceptable to me. I don't think LLMs are anywhere near that, though, nor is there any sign of them getting there any time soon. So it does concern me to use an LLM as the backbone of home automation.
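The "3 days out of the year" figure is just 1% of 365 days, rounded down. A quick sketch of that arithmetic, with a hypothetical helper name, also shows how fast the number shrinks with each extra nine:

```python
def downtime_days_per_year(availability: float, days_per_year: float = 365.0) -> float:
    """Days per year a service is down at a given availability (0.99 means 99%)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * days_per_year


for availability in (0.99, 0.999, 0.9999):
    days = downtime_days_per_year(availability)
    print(f"{availability:.2%} available: {days:.2f} days ({days * 24:.1f} hours) down per year")
```

This treats reliability as uptime. If it instead means the fraction of commands that work on the first try, the same 1% is one failed command in a hundred.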

by bigstrat2003

3/16/2026 at 6:24:54 PM

I took 99% reliable as meaning not having to repeat the command. Given that Siri is something like 50% reliable by that metric, 99% sounds like heaven.

by bombcar