6/28/2026 at 10:55:24 AM
I tried a few SOTA realtime avatar systems from Chinese labs and the actual quality was far worse than the amazing (cherrypicked) videos on their demo pages. I ran an analysis on hundreds of generated videos featuring various races/ethnicities and found that Chinese models are overfitted on East Asian faces (predictable though) and have trouble properly animating many European and most African faces (bad lipsync).
They all also accumulated artifacts over time (the video stops being stable after N seconds; for example, the image gets more and more washed out).
So I don't have high hopes here: everyone on the demo page is predictably East Asian, and the output quality doesn't look better than prior art. I guess the innovation here is that it's end-to-end, but we need to see if it's any good. WAN-derived image-audio-to-video systems used to be notoriously slow; here they boast 25 FPS for 192p, but that's actually pretty slow: I managed to reach similar FPS at 720p with prior art.
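To put rough numbers on that FPS comparison, here's a quick sketch. It assumes both resolutions share the same aspect ratio (not confirmed for the demo), and `pixel_ratio` is just a helper name made up for illustration:

```python
def pixel_ratio(height_a: int, height_b: int) -> float:
    """Pixels-per-frame ratio of two resolutions.

    Assumes both share the same aspect ratio, so the width scales
    with the height and the pixel count scales with height squared.
    """
    return (height_a / height_b) ** 2


# At equal FPS, the pixel throughput ratio equals the per-frame pixel ratio.
ratio = pixel_ratio(720, 192)
print(f"720p carries about {ratio:.1f}x the pixels of 192p per frame")
```

So at the same frame rate, 720p output means pushing roughly 14 times as many pixels per second as 192p, under that aspect-ratio assumption.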
by kgeist