4/1/2025 at 11:04:01 AM
I just tried the demo on the homepage and I don’t know what kind of sorcery this is, but it’s blowing my mind. I input a bunch of completely made-up words (Quastral Syncing, Zarnix Meshing, HIBAX, Bilxer) and used them in a sentence, and the model zero-shotted perfect speech recognition!
It’s so counterintuitive for me that this would work. I would have bet that you have to provide at least one audio sample in order for the model to recognize a word it was never trained on.
Providing a word to the model in the text modality and having it recognize that word in the audio modality must be an emergent property.
by gronky_