12/26/2025 at 10:09:28 AM
Having the ability to do real-time video generation on a single workstation GPU is mind-blowing. I'm currently hosting a video generation website, also on a single GPU (with a queue), which is something else I didn't think possible a few years ago (my Show HN from earlier today, coincidentally: https://news.ycombinator.com/item?id=46388819). Interesting times.
by mishu2
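A minimal sketch of the single-GPU-plus-queue setup the comment describes, assuming an asyncio worker; the generate_video and submit names are placeholders for illustration, not the actual site's code:

    import asyncio
    import uuid

    jobs: dict[str, dict] = {}       # job_id -> {"status": ..., "result": ...}
    queue: asyncio.Queue = asyncio.Queue()

    def generate_video(prompt: str) -> bytes:
        # Stand-in for the real model call (a diffusion pipeline, say).
        return f"video for: {prompt}".encode()

    async def gpu_worker():
        # Single consumer: serializes every request onto the one GPU.
        while True:
            job_id, prompt = await queue.get()
            jobs[job_id]["status"] = "running"
            jobs[job_id]["result"] = await asyncio.to_thread(generate_video, prompt)
            jobs[job_id]["status"] = "done"
            queue.task_done()

    async def submit(prompt: str) -> str:
        job_id = uuid.uuid4().hex
        jobs[job_id] = {"status": "queued", "result": None}
        await queue.put((job_id, prompt))
        return job_id

    async def main():
        asyncio.ensure_future(gpu_worker())
        job_id = await submit("a cat surfing at sunset")
        await queue.join()               # wait for the worker to drain the queue
        print(jobs[job_id]["status"])    # "done"

    asyncio.run(main())

The single worker is the whole trick: clients only ever enqueue, so the GPU never sees concurrent jobs and overload degrades into longer wait times instead of out-of-memory errors.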
12/26/2025 at 11:15:22 AM
Computer games have been doing it for decades already.
by iberator
12/26/2025 at 4:24:34 PM
I think video-based world models like Genie 2 will happen, and that they'll be shrunk down for consumer hardware (the only place they're practical). They'll have player input controls, obviously, but they'll also be fed ControlNet-style conditioning for things like level layout, enemy placement, and game-loop events. This will make them highly controllable and persistent.
When that happens, and when it gets good, it'll take over as the dominant type of game "engine".
by echelon
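A hedged sketch of what that conditioning could look like, assuming a PyTorch-style next-frame predictor; the module, channel layout, and input names are invented for illustration, not Genie's actual architecture:

    import torch
    import torch.nn as nn

    class WorldModelStep(nn.Module):
        def __init__(self, frame_ch=3, layout_ch=1, event_dim=16, action_dim=8, hidden=64):
            super().__init__()
            # Spatial hints (level layout, enemy placement) enter as extra
            # image channels; per-step signals (player actions, game-loop
            # events) are embedded and broadcast over the frame.
            self.encode = nn.Conv2d(frame_ch + layout_ch, hidden, 3, padding=1)
            self.cond = nn.Linear(event_dim + action_dim, hidden)
            self.decode = nn.Conv2d(hidden, frame_ch, 3, padding=1)

        def forward(self, frame, layout, events, actions):
            x = self.encode(torch.cat([frame, layout], dim=1))
            c = self.cond(torch.cat([events, actions], dim=-1))
            x = torch.relu(x + c[:, :, None, None])   # broadcast over H, W
            return self.decode(x)                     # predicted next frame

    step = WorldModelStep()
    next_frame = step(torch.randn(1, 3, 64, 64),   # previous frame
                      torch.randn(1, 1, 64, 64),   # layout map
                      torch.randn(1, 16),          # game-loop event vector
                      torch.randn(1, 8))           # player action vector

The design point is the same one ControlNet makes for images: keep the generative core, but give every frame a side channel of structured hints so the output stays controllable and persistent.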
12/26/2025 at 6:36:07 PM
I don't know how much they can be shrunk down for consumer hardware right now (though I'm hopeful), but in the near term it'll probably all be done in the cloud and streamed, as it is now. People are playing streamed video games and eating the lag, so they'll probably do it for this too, for now.
by qingcharles
12/26/2025 at 7:53:44 PM
This is also the VR killer app.
by ragequittah
12/26/2025 at 9:41:35 PM
Are you sure it's not just polish on the porn that is already the "VR killer app"?
by cess11
12/26/2025 at 12:16:24 PM
A very, very different mechanism that "just" displays the scene as the author explicitly and manually drew it, and yet has to pull an ungodly number of hacks to make that viable and fast enough, resulting in a far-from-realistic rendition... This, on the other hand, happily pretends to match any kind of realism requested, like a skilled painter would, with the tradeoff mainly being control and artistic errors.
by arghwhat
12/26/2025 at 4:26:36 PM
> with the tradeoff mainly being control and artistic errors.

For now. We're not even a decade into this tech, and look how far we've come in the last year alone with Veo 3, Sora 2, Kling 4x, and Kling O1. Not to mention the editing models like Qwen Edit and Nano Banana!
This is going to be serious tech soon.
I think vision is easier than "intelligence". In essence, we solved it in closed form sixty years ago.
We have many formulations of algorithms and pipelines. Not just for the real physics, but also tons of different hacks to account for hardware limitations.
We understand optics in a way we don't understand intelligence.
Furthermore, evolution keeps evolving vision over and over. It's fast and highly detailed. It must be correspondingly simple.
We're going to optimize the shit out of this. In a decade we'll probably have perfectly consistent Holodecks.
by echelon
12/27/2025 at 3:05:55 PM
Hmmm, future videos might just "compress" down to a common AI model and a bunch of prompts + metadata about scene order. ;)
by justinclift
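Tongue-in-cheek, but the "file format" is easy to picture: the shared model is the codec, and the video is just prompts plus ordering metadata. The field names below are invented for the joke:

    import json

    video_file = {
        "model": "shared-video-model-v1",  # hypothetical common decoder
        "seed": 42,                        # deterministic "decompression"
        "scenes": [
            {"order": 1, "prompt": "sunrise over a harbor, slow pan", "seconds": 8},
            {"order": 2, "prompt": "fishing boats heading out, aerial shot", "seconds": 12},
        ],
    }

    # The whole "video" is a few hundred bytes; playback = running the model.
    print(len(json.dumps(video_file)), "bytes")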
12/27/2025 at 11:00:04 AM
I feel like this misses the point. Also, vision and image generation are entirely different things, even for humans: some people can't create images in their head despite having perfectly good vision.

Understanding optics rather than intelligence speaks to the traditional render workflow, a pure simulation of input data with no "creative process": either the massive hack that is traditional game render pipelines, or proper light simulation. We'll probably eventually get to the point where we can have full-scene, real-time ray tracing.
The AI image generation approach is the "intelligence" approach, where you throw all optics, physics, and render knowledge up in the air and let the model "paint" the scene as it imagines it, like handing a pencil to a cartoon/anime artist. Zero simulation, zero physics, zero rules - just the imagination of a black box.
No light, physics, or existing render pipeline tricks are relevant. If that's what you want, you're looking for entirely new tricks: tricks to ensure object permanence, attention to detail (no variable finger counts), and inference performance. Even if we get it running in real time, giving up your control and your definition of consistency is part of the deal when you hand the role of artist to the box.
If you want AI in the simulation approach, you'll be taking an entirely different path: skipping any involvement in rendering/image creation and instead just letting the model puppeteer the scene within some physics constraints. Makes for cool games, but completely unrelated to the technology being discussed.
by arghwhat
12/26/2025 at 12:03:34 PM
Bob Ross did it, too.
by nkmnz
12/26/2025 at 2:04:35 PM
1 frame of Bob Ross = 1,800s
by pwython
12/26/2025 at 7:31:54 PM
So with 108,000 (60 × 1,800) Bob Ross PPUs (parallel painting units) we should be able to achieve a stable 60 FPS!
by ash_091
12/27/2025 at 12:19:10 AM
Once you set up a pipeline, sure. They'd need a lot of bandwidth to ensure the combined output makes any kind of sense, not unlike a GPU, I guess. Otherwise it's similar to the way nine women can make a baby in a month. :)
by mishu2
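Working the thread's numbers through makes the pipelining caveat concrete: parallel painters buy throughput, never latency, so the first frame still costs a full episode no matter how many PPUs you hire:

    seconds_per_frame = 1_800            # one ~30-minute Bob Ross episode per frame
    target_fps = 60

    painters = target_fps * seconds_per_frame
    print(painters)                      # 108000 PPUs for a sustained 60 FPS

    # The "nine women, one month" point: latency is unchanged by parallelism.
    first_frame_latency_min = seconds_per_frame / 60
    print(first_frame_latency_min)       # 30.0 minutes before frame one appears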
12/27/2025 at 3:07:32 PM
The food/housing/etc. bill for 108k Bob Ross, er... PPUs, seems like it would be fairly substantial too.
by justinclift