2/16/2026 at 6:36:18 AM
My understanding is that this is purely a strategic choice by the bigger labs. When OpenAI released Whisper, it was by far best-in-class, and they haven't released any major upgrades since then. It's been 3.5 years... Whisper is older than ChatGPT.
Gemini 3 Pro Preview has superlative audio listening comprehension. If I send it a recording of myself in a car, with me talking, and another passenger talking to the driver, and the radio playing, me in English, the radio in Portuguese, and the driver+passenger in Spanish, Gemini can parse all 4 audio streams as well as other background noises and give a translation for each one, including figuring out which voice belongs to which person, and what everyone's names are (if it's possible to figure that out from the conversation).
I'm sure it would have superlative audio generation capabilities too, if such a feature were enabled.
by Taek
2/16/2026 at 5:53:39 PM
Nvidia released Parakeet, which claimed superiority over Whisper. That doesn't negate your point, but I did want to add it.
by nickpsecurity