4/28/2026 at 1:00:09 PM
This is not a new model. Also, it hallucinates a lot. Also, it's very heavy and slow in inference. It's also bad in multilingual.
Edit: I'm talking purely about speech to text (STT). Not sure about the other things this can do.
by steinvakt2
4/28/2026 at 5:20:07 PM
It has some perks, is a bit more expressive in some cases, but overall is trained on really noisy data, uses more memory, and isn't that fast - I'm talking about the (7b?) version that they released then removed quickly (vibevoice-community on github) - I still use chatterbox turbo and sometimes qwen TTS.
by terbo
4/28/2026 at 1:14:18 PM
Yeah, I don't get why it is suddenly getting so much attention today, it is all over twitter too
by lblock
4/28/2026 at 3:05:11 PM
there is so much more subversive marketing out there than any of us can really fathom. i try not to be too paranoid but it's getting a lot harder every day.
i know someone who worked in what we might call the 'astroturfing' space within the entertainment industry. after having a few discussions with him and with things like this[0] becoming more known, it's really difficult to afford any assumption of organic intent when money is on the line - especially at the scale that microsoft works at compared to something as comparatively quaint as the music industry.
[0] https://www.wired.com/story/geese-chaotic-good-marketing-ind...
by GuinansEyebrows
4/28/2026 at 2:46:34 PM
Simonw (who has a bit of a Midas touch for posts here) just posted about it https://simonwillison.net/2026/Apr/27/vibevoice/
by xnx
4/28/2026 at 3:36:27 PM
To be fair, his Midas touch is a result of consistency and a lot of hard work.
It's like the gardener at one of the Oxford colleges said - it's really easy to create these perfect lawns, just turn up every day and trim and water it - for a couple hundred years.
by realty_geek
4/28/2026 at 6:36:21 PM
I thought they rolled it as well?
by soperj
4/28/2026 at 7:48:53 PM
As always with people: listen to what they say, not to what they do...
After all, they rarely do what they say themselves, so it's surely not entirely made up nonsense!
by ffsm8
4/28/2026 at 1:24:47 PM
well duh, they updated the news section
https://github.com/microsoft/VibeVoice/commit/e73d1e17c3754f...
which is microsoft for "we removed two dead links". AI innovation knows no limits!
by ramon156
4/28/2026 at 2:30:10 PM
Interestingly that seems to be in response to [1], which might indeed be the trigger for this.
[1] https://doublepulsar.com/microsoft-vibing-capturing-screensh...
by Vinnl
4/28/2026 at 7:25:01 PM
Yes, the SOTA is currently much more advanced.
by narrationbox
4/29/2026 at 6:22:09 AM
What do you consider to be SOTA?
by steinvakt2
4/28/2026 at 3:05:36 PM
It is not good for text to speech (TTS) either. I have been trying it for a few days. First of all, there is no documentation for the 1.5B model. The 0.5B realtime model is a shit model. I was converting text line by line, and it was randomly adding music and couldn't handle special characters like "…".
I'm really disappointed with this model, to say the least.
by gagan2020
4/30/2026 at 2:50:01 PM
> ...it was randomly adding music...
I've been noticing this with the Mistral Voxtral TTS models too. I have my AI record a morning briefing podcast for myself, and occasionally there are sounds like music at the start (the British voice had a musical tone underneath that sounded a little like the end of the BBC News theme). I don't think I've ever encountered that with the OpenAI TTS models, so they're now my default go-to again.
by SyneRyder
4/28/2026 at 7:19:46 PM
yep, it seems this was trained on a large amount of podcasts with ad jingles or phone call queues with elevator music. I was also pretty disappointed when I ran the TTS last week.
by tjungblut
4/28/2026 at 6:40:21 PM
The 7B parameter Vibevoice TTS model is still the most impressive local TTS model i've tried. It was pulled by Microsoft a few days after its release due to "abuse potential" but it can be found in various community maintained huggingface repos.
by Stagnant
4/28/2026 at 4:36:55 PM
you saved us a lot of time here.... i unstarred the repo
moving on....
by zuzululu
4/28/2026 at 5:00:42 PM
I don't really pay attention to stars. Do people use them as bookmarks? Why would you star a repo if you knew so little about it?
by Capricorn2481
4/28/2026 at 5:46:47 PM
Stars for me are basically "this might be interesting but I don't have time to look at it now, hopefully I'll think about it later and give it a second look".
by drusepth
4/28/2026 at 5:44:22 PM
I exclusively use stars as bookmarks which is why I always found it strange when people talked about lots of stars meaning high quality or trustworthy…
I’ve learned since then that I’m probably in the minority (both in using stars as bookmarks and not caring about how many stars a repo has).
by einsteinx2
4/28/2026 at 6:06:09 PM
Judging by how many people apparently are paying bots to give their lazily vibe-coded repos thousands of stars, it seems like people simultaneously take stars seriously while not taking them seriously at all. It breaks my brain.
by tombert
4/28/2026 at 3:11:44 PM
You just saved me an afternoon.
by scotty79
4/28/2026 at 5:32:57 PM
Saved a lot of my time thanks!
by Tamatarr
4/28/2026 at 5:00:32 PM
I'm shocked, shocked to find that Microsoft takes credit for a slow, unoriginal product that doesn't actually do what it advertises.
by tombert
4/28/2026 at 5:26:26 PM
Imagine the balls it took to willingly attach the Microsoft label to the front of the product that is Teams.
by logicchains
4/28/2026 at 6:30:51 PM
I mean the same can be said about most versions of Windows as well. People act like Windows 11 is where it all went sour, but I've personally kind of hated it since Windows XP.
I feel like a recurring pattern with Microsoft is to create something quickly, market it aggressively and push for everyone to use it immediately, and only once it is installed everywhere do people suddenly realize how terrible it is, but it's too late to change.
by tombert
4/28/2026 at 7:05:59 PM
I'm surprised you picked XP as the falling point. I didn't enjoy the days of reinstalling 95/98/ME every 6 months to avoid driver weirdness and seemingly random failures. XP was built on the foundation of 2000, which tended to make it more robust vs. its predecessors.
Vista on the other hand...
by NBJack
4/28/2026 at 7:28:38 PM
I mean, part of it is that I really hated the Fisher Price look to it, but it was also the first time I ever felt like I had to "hack" things to make stuff work. I had to muck with registry keys. Oh, and it was the first time that I noticed that Windows repair tools do not work.
I suspect I might have hated 9x more but I was pretty young when they came out and I didn't really "get into" computers until XP, and I disliked it enough to dual-boot Linux as a twelve year old.
by tombert
4/28/2026 at 1:21:51 PM
[flagged]
by SecretDreams
4/28/2026 at 2:08:39 PM
The nuance is lost on LLM agentic dominant partakers.
by NobleLie