alt.hn

5/20/2026 at 3:45:32 PM

Show HN: Lance – image/video generation and understanding in one model

https://github.com/bytedance/Lance

by cleardusk

5/20/2026 at 8:57:08 PM

Video understanding is kind of new, especially when done well, and it would be great if it also works well for UI and UX. Current agents already struggle a bit with 2D space in normal screenshots of unconventional UIs. I wonder if this model would do better with actual recordings of navigating and using applications; it feels like it could help a lot with understanding UX, at least. Will be fun to play around with :)

by embedding-shape

5/21/2026 at 12:11:13 AM

What’s SOTA for video understanding? AFAIK most video search is powered by transcription and not the actual video. This seems impressive.

by wxw

5/20/2026 at 7:17:50 PM

Great quality, forked and going to try

by nkvdev

5/20/2026 at 8:03:01 PM

Any plans to port to sglang or vLLM?

by bguberfain

5/21/2026 at 4:17:58 PM

vllm-omni support is on the way : )

by cleardusk

5/20/2026 at 5:06:29 PM

Nice work. Wish they had picked another name given how popular lance/lancedb is.

by Tsarp

5/20/2026 at 6:34:54 PM

Seems like the video output is crippled. Resolution is low (720p or so), as is the frame rate. The samples are shown up-scaled and frame-interpolated.

Why do that? Seems strange to be building sub-hd resolution video models in 2026.

by popalchemist

5/20/2026 at 7:26:28 PM

Sure, but again, it's a micro 3B model. Perhaps it can't be used for general video work, but it might be able to do basic edits like remove an object from a table in a shot.

by jadbox

5/20/2026 at 8:06:31 PM

It’s not a micro model at all, it requires 40 GB of VRAM. The 3B is just the active parameters.

by MattRix

5/20/2026 at 8:37:04 PM

[flagged]

by vaporaviatorlab

5/20/2026 at 6:10:59 PM

[flagged]

by CrzyLngPwd

5/20/2026 at 6:48:55 PM

Not that surprising if the reason you have virtually unlimited compute and programming resources is that you work at the leading short-form video app company. They could also have chosen not to open source it.

by menno-sh

5/20/2026 at 7:51:56 PM

Do you find the video understanding work there also to be 'silly little slop', or did you only look at the gifs on the page and not read about the understanding work in a 3B model?

This is not ground-breaking by any means, but achieving this in a 3B model and sharing the approach + weights advances engineering and is certainly more of a contribution than 'silly little slop videos', imo.

by neosat

5/20/2026 at 8:08:05 PM

It’s not a 3B model, it has 3B active parameters. The full model is much larger.

by MattRix

5/20/2026 at 8:38:14 PM

That's true, I should have mentioned active. Actual params are closer to 12B-14B likely, given the 40GB VRAM usage.

by neosat
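
For anyone who wants to sanity-check the 12B-14B guess above, here is a rough back-of-the-envelope sketch. It is not taken from the Lance repo: the function name is made up, and both the 2 bytes per parameter (bf16/fp16 weights) and the share of VRAM going to activations, caches and other overhead are assumptions, since the thread states neither.

```python
def estimate_params_billions(vram_gb: float,
                             bytes_per_param: float = 2.0,
                             overhead_fraction: float = 0.35) -> float:
    """Rough total parameter count (in billions) implied by a VRAM figure.

    Assumptions (not stated in the thread):
      - weights are stored at `bytes_per_param` bytes each (2.0 = bf16/fp16)
      - `overhead_fraction` of VRAM goes to activations, caches, buffers, etc.
    """
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    if bytes_per_param <= 0:
        raise ValueError("bytes_per_param must be positive")
    # Bytes left over for the weights themselves (using 1 GB = 1e9 bytes).
    weight_bytes = vram_gb * 1e9 * (1.0 - overhead_fraction)
    return weight_bytes / bytes_per_param / 1e9


if __name__ == "__main__":
    # Sweep a few hypothetical overhead shares for the 40 GB figure.
    for overhead in (0.30, 0.35, 0.40):
        est = estimate_params_billions(40, overhead_fraction=overhead)
        print(f"overhead {overhead:.0%}: ~{est:.1f}B params")
```

With 2-byte weights and roughly 30-40% of the 40 GB going to non-weight memory, the estimate falls in the 12B-14B range; with zero overhead the upper bound would be 20B. A different precision (fp32 or 8-bit weights, say) would shift the number a lot, so treat it as a ballpark only.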

5/20/2026 at 5:15:07 PM

last dance for lance vance!

by asadm

5/20/2026 at 5:52:43 PM

:D

by cleardusk