Indexing a year of video locally on a 2021 MacBook with Gemma4-31B (50GB swap)

5/21/2026 at 3:20:34 PM

> The skill is open at ~/.claude/skills/video-index/. If you're working on something similar (indexing personal archives, getting a local model to do real archival work, building agents that drive editing tools), I'd be glad to compare notes.

When your Claude wrote this post they might not have selected the right URL to share, unless your home folder is exposed. Care to share the skill files?

by desro

5/21/2026 at 4:15:12 PM

We just got a modern example of the classic message from a friend who just picked up programming, containing: "I just created my own web app, wanna check it out? It's here: http://localhost:8080"

by embedding-shape

5/21/2026 at 3:21:46 PM

Oops! My bad. Fixing it now. And yeah, I can share the Skill file. Give me 5 mins.

by asenna

5/21/2026 at 3:52:35 PM

Ok I scrambled to finalize a name for it and create a new repo for it - https://github.com/Simbastack-hq/framedex

PS - I just put this together in the last few mins, removed my personal files and references. So it's not tested properly, please let me know if any issues.

It's still an early hack, but I have thousands of still images as well from my camera which I've not processed and I need to do the same analysis for those.

So I'll continue working on it, but happy to receive any PRs if anyone finds any use for it.

I'm tired of having a backlog of thousands of images and videos, leaving it for later.

by asenna

5/21/2026 at 4:50:56 PM

Hey friend, try something in this ballpark, your post has a bunch of painful AI tropes:

https://github.com/blader/humanizer

You get a pass here because you're doing really cool stuff but it's kinda tough to read past the AI nonsense, and it's relatively easy to screen out "it's not x it's y" kind of things and the bolded bullet points.

by jaggederest

5/21/2026 at 5:34:51 PM

I don't dislike those tropes because they are frequent or because they are not pleasing to read intrinsically. I dislike them because it tells me it was made by AI and AI output varies strongly in quality and most of it is low on insight but rings the right bells to make it seem insightful. It indicates a lack of human care.

Hiding these clues by another AI pass doesn't solve the core problem. Now you just end up with content that is camouflaged better but is still equally low in nutritional value.

by bonoboTP

5/21/2026 at 8:24:28 PM

I feel like human copywriters have been using those same tricks for clickbait articles for years…

by cortesoft

5/21/2026 at 7:36:06 PM

As someone that naturally used a rule of 3 and em dashes I hate AI for taking that away from me.

by yellow_postit

5/21/2026 at 7:42:43 PM

Agreed, I find myself avoiding constructs I would use naturally because they read as AI - "not just because other people would judge them, but because I also notice and dislike them".

by jaggederest

5/21/2026 at 4:56:28 PM

Thanks for this! This is exactly what I was looking for.

Tbh, I have a lot of thoughts and ideas and things to share and I do spend time and effort trying to de-AI it but this should help a lot.

I'll try it out.

In fact, I was expecting to get shit on by HN readers for this but was pleasantly surprised that readers moved past it.

by asenna

5/21/2026 at 5:22:20 PM

Yeah I think you'll find these days that there's a lot of respect for substance like what you're doing, even past the noise of the AI. I also use a lot of AI but you really have to demand quality from it, whether it's writing, media, or code. It's clear you've got the taste from your media work, and we're all still learning as we go, so I'm very glad that I could point you in that direction.

by jaggederest

5/21/2026 at 5:24:17 PM

I'm curious: how, exactly, did it go from this is painful to read due to AI, to no one cares about AI use and you demanded quality when you used it and delivered?

by refulgentis

5/21/2026 at 5:25:27 PM

It didn't, it went from "this reeks of AI after edits, here's a tool that can help" to "people can read past it but there are better ways, you must demand quality". I don't think those two things are inconsistent.

by jaggederest

5/21/2026 at 5:26:44 PM

Ah, I see, after he uses the tool it'll be great because he has taste.

by refulgentis

5/21/2026 at 6:41:29 PM

I think you missed an important distinction being made:

> I also use a lot of AI but you really have to demand quality from it, whether it's writing, media, or code. It's clear you've got the taste from your media work, and we're all still learning as we go...

Their use of AI for "media work" has shown a taste but their writing usage still needs to equal that.

by AlecSchueler

5/21/2026 at 5:28:24 PM

I don't think "if you iterate on this, try using some tools, and ultimately demand that the output meet or exceed your demonstrated taste in other domains" is a hot take, honestly.

by jaggederest

5/21/2026 at 5:31:12 PM

It's not a hot take, you're right, I gravely misunderstood the timing in your post, i.e. you were clearly framing it as after and being polite and encouraging.

I'm more hot about it because it's frustrating having so many HN posts be a place for people to work out first drafts, especially when the first piece of feedback is "hey, uh, you clearly used AI and it's horrible to read as a result." So easy to avoid...good on you for being kinder.

(part of my frustration is I was excited because I write a local LLM client and thought I missed Gemma 4 has streaming video input support, but after reading through the slop it turns out it's just the ol' "extract frames" workflow. tbf that would have happened AI or not, but put me in a mood)
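For anyone unfamiliar with the "extract frames" workflow: a minimal sketch, assuming evenly spaced sampling and the ffmpeg CLI. The function names and the frame count are illustrative and not taken from the article or from framedex.

```python
from pathlib import Path


def sample_timestamps(duration_s: float, n_frames: int) -> list[float]:
    """Return n_frames timestamps, one at the midpoint of each equal slice of the clip."""
    if duration_s <= 0 or n_frames <= 0:
        return []
    step = duration_s / n_frames
    return [round(step * (i + 0.5), 3) for i in range(n_frames)]


def ffmpeg_frame_cmd(video: Path, t: float, out: Path) -> list[str]:
    """Build an ffmpeg command that seeks to t seconds and writes a single frame."""
    return ["ffmpeg", "-y", "-ss", str(t), "-i", str(video), "-frames:v", "1", str(out)]


# Example: four stills from a 60-second clip (hypothetical file names).
commands = [
    ffmpeg_frame_cmd(Path("clip.mp4"), t, Path(f"frame_{i}.jpg"))
    for i, t in enumerate(sample_timestamps(60.0, 4))
]
```

The resulting stills would then be passed to the vision model as ordinary images, which is why no streaming video input is needed.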

by refulgentis

5/21/2026 at 6:00:24 PM

No worries, text is hard whether there's AI involved or not - I, in turn, mistook your clarification as a snarky "ah well of course if they try harder it'll be fine", my apologies for that. I share your frustration, but the best way I think is to educate not remonstrate unless they're someone who should clearly know better[1]

[1] https://news.ycombinator.com/item?id=48172536

by jaggederest

5/21/2026 at 6:01:13 PM

if you care for some feedback about the writing, dropping the link and saying "PR's are open!" would land probably equal or better, and would reduce noise on the message. as sibling said, substance and noise

by repparw

5/21/2026 at 7:43:31 PM

That's actually a really good point, blog posts as open source

by jaggederest

5/21/2026 at 5:23:25 PM

They haven't: this is the top thread, and the entire thread is saying it's unreadable and explaining step by step how to do the basics you should have done before you posted. I'm not sure why you're pleasantly surprised; I would have been embarrassed and taken down the HN post to get at least the basics down before sharing it under my name (if possible, dunno how HN submissions work)

by refulgentis

5/21/2026 at 5:35:43 PM

Unfortunately will have to disappoint you, can't get embarrassed easily. In fact when all of this worked well locally, felt pretty proud ngl.

by asenna

5/21/2026 at 5:26:25 PM

Btw I like your article, it does feel a bit AI generated but I think the problem and setting are interesting enough that it was a pleasant read.

by Zababa

5/21/2026 at 8:26:18 PM

I'm not quite sure why all that swapping is necessary. It really does age your SSD quite fast. Gemma 4 31B at 4-bit quantization should only be around 19 GiB [1], not 28.4 GiB. I'm not feeding it images regularly, so I'm not sure how much memory it needs to get those into context, but I can't imagine it is more than 10 GiB.

The activity monitor does show all kinds of Electron apps active, on top of a presumably model-loaded Handy and a virtual machine for Claude Code, so I guess that's the real root cause for all the swapping. If your laptop starts thrashing I can't imagine you have any use for those apps, which will grind to a halt.

[1] https://huggingface.co/mlx-community/gemma-4-31b-it-4bit
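A rough back-of-the-envelope check of the weight size, as a sketch. Assumptions: exactly 31e9 parameters, and about 4.5 effective bits per weight once per-group scales and biases are included (16-bit scale and bias per group of 64 weights). Anything stored at higher precision would push the total toward the larger figure on the model page; this sketch does not model that, nor the KV cache or image tokens.

```python
GIB = 1024 ** 3


def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Size of the weights alone, in GiB, for a given average bit width."""
    return n_params * bits_per_weight / 8 / GIB


# Pure 4-bit values, no quantization metadata.
raw_4bit = weight_footprint_gib(31e9, 4.0)

# 4-bit values plus an assumed 0.5 bits/weight of scales and biases.
with_scales = weight_footprint_gib(31e9, 4.5)

print(f"4.0 bits/weight: {raw_4bit:.1f} GiB")
print(f"4.5 bits/weight: {with_scales:.1f} GiB")
```

Either way the weights alone land well under the 28.4 GiB shown in the screenshot, which supports the point that the swap pressure probably comes from elsewhere.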

by Confiks

5/21/2026 at 3:55:23 PM

UPDATE: Quickly created a repo for this - https://github.com/Simbastack-hq/framedex (MIT License)

It's not tested properly after I genericized it. Will try to go through it properly and add more updates.

Two big things on my TODO: 1) Make use of this indexing and using Claude's help, make video editing faster with Davinci Resolve (now that I have a good index of all the content)

2) I currently did this for videos, but I want to add more things to this for my thousands of still images of my camera - need to make sense of them. So I'll be working on this as well.

by asenna

5/21/2026 at 7:44:17 PM

[flagged]

by oceanus

5/21/2026 at 8:01:50 PM

Could you please not post generated comments to HN? It's not allowed here. See https://news.ycombinator.com/newsguidelines.html#generated and https://news.ycombinator.com/item?id=47340079.

We ban accounts that do this and I don't want to ban you, so please write everything that you post to HN by hand.

Of course, it's impossible to know for sure what was LLM processed or not, but we're getting complaints about some of your posts and, upon inspection, the complaints seem justified.

by dang

5/21/2026 at 8:11:26 PM

This sounds like a great capability to be added to immich

by clueless

5/21/2026 at 8:28:31 PM

Or Stash lol

by asixicle

5/21/2026 at 3:10:38 PM

I ran Gemma on a 2015 thinkpad to do something similar. Fortunately, I could upgrade the memory otherwise it would have been a painful exercise.

Not gonna lie, llama.cpp had the fans spinning at max speed. But it worked and I got the job done.

by throwa356262

5/21/2026 at 5:13:58 PM

> the fans spinning at max speed

This always confuses me - don't people want their computations to run as fast as possible and thus inevitably produce more heat that needs to be vented?

I suppose sometimes it is just an analogy for "it's utilizing 100% of my resources" (which I'm guessing it is here), but I've definitely had people say it as an actual complaint in different contexts

by iMerNibor

5/21/2026 at 6:42:11 PM

What people complain about is when they visit a blog with two images and the fans are spinning at max speed because the blog has 100 trackers.

by dist-epoch

5/21/2026 at 6:09:15 PM

Fans shouldn't be running at max speed if the model fits in RAM with room to spare for context. Usually fans max out when the model doesn't fit and the CPU is chugging to make up the difference (or the user didn't tune LLM settings)

by 0xbadcafebee

5/21/2026 at 3:53:30 PM

Two questions:

1. What is the search index?

2. The "description.md" example has things like "faces -> cluster_id". Is this from Davinci Resolve's face index? Things like faces+names and locations are really important with photo collections, but general LLMs don't handle them so well.

by herf

5/21/2026 at 4:08:56 PM

1) It's just simple plain-text `.description.md` sidecar files, one per clip, sitting next to each video.

Something which I can query later - Like when brainstorming with Claude "I wanna make some videos of the Luxury rooms in the lodge" and it knows what all videos could help here (going through the files).

There's also a file at the folder root level that aggregates the text descriptions to make things easier to find.

I've just attached an image in the blog showing an example - https://blog.simbastack.com/_media/gvcycx2n.png
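To make the sidecar idea concrete, here is a minimal sketch of querying such files with a plain keyword match. It assumes only what is stated above (files ending in `.description.md` next to each clip); the function names are hypothetical and this is not framedex's actual code.

```python
from pathlib import Path

SIDECAR_SUFFIX = ".description.md"


def find_sidecars(root: Path) -> list[Path]:
    """Recursively collect every sidecar file under root."""
    return sorted(p for p in root.rglob("*" + SIDECAR_SUFFIX) if p.is_file())


def search_sidecars(root: Path, keyword: str) -> list[Path]:
    """Return sidecars whose text contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        p for p in find_sidecars(root)
        if needle in p.read_text(encoding="utf-8", errors="replace").lower()
    ]
```

Because the index is plain text, an agent can do the same thing with grep or by reading the files directly, with no database involved.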

2) No - nothing from DaVinci Resolve. Framedex is a standalone pipeline. Resolve isn't involved.

Faces come from insightface (the open-source buffalo_l pack - RetinaFace for detection), running locally on CPU. For each clip it detects faces in the sampled frames, embeds them, and writes rows to ~/.framedex/faces.db.

Tbh, I know this part is building up in my local DB but I haven't tested how good it is. Will check it out properly soon.

But yeah, on your broader point that's why framedex deliberately does not ask the LLM to handle faces or locations.
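A sketch of how face embeddings stored in a SQLite file could be compared across clips. The table layout, the 0.5 threshold, and the helper names are all assumptions for illustration, not the real `faces.db` schema, and the tiny 3-d vectors stand in for real embeddings.

```python
import math
import sqlite3
import struct


def pack(vec: list[float]) -> bytes:
    """Serialize an embedding as float32 values."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob: bytes) -> list[float]:
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # Hypothetical schema: one row per detected face.
    db.execute("CREATE TABLE IF NOT EXISTS faces (clip TEXT, frame_s REAL, embedding BLOB)")
    return db


def add_face(db: sqlite3.Connection, clip: str, frame_s: float, vec: list[float]) -> None:
    db.execute("INSERT INTO faces VALUES (?, ?, ?)", (clip, frame_s, pack(vec)))


def similar_faces(db: sqlite3.Connection, query: list[float], threshold: float = 0.5):
    """Return (clip, frame_s, score) rows at or above threshold, best match first."""
    rows = db.execute("SELECT clip, frame_s, embedding FROM faces").fetchall()
    scored = [(clip, t, cosine(query, unpack(blob))) for clip, t, blob in rows]
    return sorted((r for r in scored if r[2] >= threshold), key=lambda r: -r[2])
```

A linear scan like this is fine for a few thousand faces; clustering or an approximate index would only matter at much larger scale.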

----

Faces → insightface / ArcFace embeddings. Deterministic, comparable across clips. The vision model only contributes a rough people_count; it never tries to identify anyone.

Locations → EXIF GPS via exiftool, reverse-geocoded through Nominatim/OpenStreetMap. Hard metadata, not a guess.

The LLM only does what it's good at: scene description, mood, shot type, keywords, keep/review/cull rating (this last part is also debatable though).
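A sketch of the location step, assuming exiftool's JSON output with numeric coordinates and Nominatim's public reverse endpoint. The helper names are hypothetical, and the GPS tag names can differ by container, so treat the parsing as illustrative.

```python
import json
from urllib.parse import urlencode

NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"


def exiftool_cmd(path: str) -> list[str]:
    """-n asks for numeric (signed decimal) values, -json for machine-readable output."""
    return ["exiftool", "-n", "-json", "-GPSLatitude", "-GPSLongitude", path]


def parse_gps(exiftool_json: str):
    """Extract (lat, lon) from exiftool JSON, or None if the clip has no GPS tags."""
    records = json.loads(exiftool_json)
    if not records:
        return None
    lat = records[0].get("GPSLatitude")
    lon = records[0].get("GPSLongitude")
    if lat is None or lon is None:
        return None
    return float(lat), float(lon)


def reverse_geocode_url(lat: float, lon: float) -> str:
    """Build the lookup URL. The public server expects an identifying User-Agent
    and at most one request per second, so cache results per coordinate."""
    return NOMINATIM_REVERSE + "?" + urlencode({"lat": lat, "lon": lon, "format": "jsonv2"})
```

Clips without GPS tags simply yield `None` here, which keeps the location field empty rather than guessed.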

by asenna

5/21/2026 at 3:15:33 PM

> generative AI video has no place on a real travel brand

I am pretty sure that the vast majority of Airbnb hosts would not agree with you.

> equals TripAdvisor crucifixion

I have no idea how the Airbnb hosts with fake listings survive, really.

by egorfine

5/21/2026 at 3:17:29 PM

Haha. It's honestly something that I've been struggling with myself. I'm running this safari lodge but I don't want to go down that route of slop videos!

But on the other hand, genuine videos do take time and slow down the process.

by asenna

5/21/2026 at 5:04:20 PM

My take is that B2C AI applications are kind of structurally limited by how hard it is to build personalized context.

The idea of capable local models could be a huge unlock here if they are able to do the bottom-up context collection research / tagging / etc. at scale.

by theodorewiles

5/21/2026 at 6:29:45 PM

I made a B2C AI app that's fully local (and free) to do AI based contextual file renaming.

So if you give it a bunch of screenshots it will try and intelligently name them based upon what is in the screenshot. Same for videos, PDFs, etc.

But to your point I haven't even tried charging money as it feels like something Apple is just going to bake in as a feature.

https://finalfinalreallyfinaluntitleddocumentv3.com/

by michaelbuckbee

5/21/2026 at 5:38:54 PM

Definitely agree with this. Here, me and Claude brainstorming together did that Research, and some trial-and-error to get to this.

But I can tell it's only a matter of time before agents become smart enough to let my non-tech friends be able to just say "Make sense of all these videos in my folder" and it just does it.

by asenna

5/21/2026 at 5:09:54 PM

Is it really local models that unlock this? Surely stateless model APIs would yield the same benefits? I get that local can be “cheaper” depending on usage, but we’ve been renting storage and compute from clouds at a premium for ages..

by enos_feedler

5/21/2026 at 6:00:08 PM

A huge thing here was the massive amount of data that was just processed - I went through about 1TB of files over 24 hours.

Using API to analyze even a subset of this would've been painful imo.
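Some rough arithmetic behind that, as a sketch. The 50 Mbps uplink is a made-up figure, and a hosted pipeline would more likely upload sampled frames than whole files, so the upload time is an upper bound.

```python
def local_throughput_mb_s(size_gb: float, hours: float) -> float:
    """Average read rate needed to get through size_gb in the given time."""
    return size_gb * 1000 / (hours * 3600)


def upload_hours(size_gb: float, uplink_mbps: float) -> float:
    """Time to push size_gb through an uplink of uplink_mbps (decimal units)."""
    bits = size_gb * 8e9
    return bits / (uplink_mbps * 1e6) / 3600


print(f"local: {local_throughput_mb_s(1000, 24):.1f} MB/s averaged over 24 hours")
print(f"upload: {upload_hours(1000, 50):.1f} hours at an assumed 50 Mbps")
```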

by asenna

5/21/2026 at 6:27:19 PM

I thought about that in this video case and it's true. I thought the parent comment was making a broader statement about local models in general. But even with video, if it was stored in private cloud storage near the LLM could this still have worked efficiently? What are the most painful elements of this whole setup / work environment if everything was cloud?

by enos_feedler

5/21/2026 at 8:44:35 PM

Oh yes, if everything is cloud, then this is a non-issue.

The few other points of consideration would be:

1) Cost - I was considering using Sonnet for this but there's always the concern of reaching limits OR the API cost if you're using the API.

The feeling of knowing you have a capable model in your hands without any limits is actually pretty awesome. Your mind starts running at what else can I throw at it to do grunt work.

2) Privacy issues - same as with moving to cloud.

3) Reliability issues - I know from experience Claude uptime has been pretty bad the past few months

4) Restrictions - Claude has been pretty heavy handed with their restrictions lately, anything which remotely triggers their flags gets an instant denial (or worse, an account ban). Often these are false positives.

I love the value I get from Claude but there's a different kind of freedom you get with local, capable models.

by asenna

5/21/2026 at 3:24:15 PM

Thanks for the article! I have a beefy M5 Pro and I'm eagerly looking around for ways to use local models (specifically Gemma4 & Qwen3.6).

This is an excellent thing to do, especially since LLMs excel at batching: you can index multiple photos and videos in parallel for no performance penalty.

by egorfine

5/21/2026 at 3:40:59 PM

Unsloth Studio [0] is what I recommend these days, open source alternative to the more widely known LM Studio, and also built by the people who make good quantizations of released models. With MTP support now merged in you should get 2x token generation speed with no accuracy difference. They also have MLX quants if you scroll down a bit, which is a format specifically for macOS' Metal GPU acceleration but that's not integrated into Unsloth Studio just yet.

[0] https://unsloth.ai/docs/models/qwen3.6#mtp-guide

by satvikpendem

5/21/2026 at 3:56:53 PM

I have researched for quite a bit and so far the fastest runtime is the oMLX one. But there's a caveat: ttft on MLX on M4 Pro is enormous. On M5 Pro it has been greatly sped up.

by egorfine

5/21/2026 at 5:03:32 PM

Curious if you tested llama.cpp and still found oMLX faster? I haven't tried the latter myself, might give it a go.

by regexorcist

5/21/2026 at 5:12:37 PM

Oh yeah I did test various solutions and different settings and quants

Llama is about 1/3 slower on Apple Silicon.

by egorfine

5/21/2026 at 4:34:29 PM

I tried Unsloth Studio recently and was disappointed - in particular the downloading functionality is half-baked and didn’t cope with resuming downloads. As it seemed to just be a simple wrapper over llama.cpp, I found that huggingface hub, llama.cpp, and a couple of simple scripts actually offered better functionality once it was set up.

by mft_

5/21/2026 at 3:30:54 PM

I have been contemplating a M5 Pro MBP, but for the life of me I wasn't able to find benchmarks for real-world models, do you happen to know how many tokens per second roughly you get with MoE models like Qwen 3.6 35B/A3B or Gemma 4 26B?

by busfahrer

5/21/2026 at 3:39:18 PM

I'm not normally one to share videos as answers, but this particular fellow does a LOT of work with local AIs and Macs and happens to have a nuanced answer. https://youtu.be/XGe7ldwFLSE

by ahknight

5/21/2026 at 4:32:54 PM

You need to ask macOS people for their prefill speed as well, there are two numbers you care about here, and current MacBooks have generally terrible numbers when it comes to prefill performance. Surely it'll get better with time, but if you already have a desktop, I'd go the "beefy GPU" route first.

by embedding-shape

5/21/2026 at 3:55:42 PM

Qwen 3.6 35B running on oMLX 0.3.9rc1: I get 86 t/s on Q4 and 74 t/s on Q6.

Bear in mind that ttft on MLX is much much faster on M5 Pro as compared to M4 Pro.

Also bear in mind that those figures are with NO optimizations whatsoever: no MTP, no DFlash. I am waiting for both to be released for the Qwen models.

by egorfine

5/21/2026 at 7:43:16 PM

Great, thanks! :-) and to mirror another poster: what kind of prompt parsing (prefill) speed do you get for that model? Also how is the speed for the 27B model?

by busfahrer

5/21/2026 at 8:39:41 PM

35B: 1300-1800 t/s on both Q4 and Q6.

27B: give me 20 minutes
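To see why both numbers matter, here is a sketch that estimates wall-clock time for one request from the prefill and generation rates. The 8000-token prompt and 500-token reply are hypothetical; the rates are the 35B Q4 figures quoted in this thread, taking the low end of the prefill range. It assumes constant rates and ignores model load time.

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float, decode_tps: float) -> float:
    """Prompt processing time plus token generation time."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


prefill_part = 8000 / 1300   # seconds spent reading the prompt
decode_part = 500 / 86       # seconds spent generating the reply
total = request_seconds(8000, 500, prefill_tps=1300, decode_tps=86)

print(f"prefill {prefill_part:.1f}s + decode {decode_part:.1f}s = {total:.1f}s")
```

With a long prompt and a short reply, prefill accounts for about half the wait, so a generation-only t/s figure tells only part of the story.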

by egorfine

5/21/2026 at 4:29:31 PM

I'm running unsloth/Qwen3.6-35B-A3B-UD-Q8_K_XL on an M3 Max, 64GB at ~57 t/s with llama-server

by juancn

5/21/2026 at 4:36:32 PM

Prefill speed and 27B number?

by brcmthrowaway

5/21/2026 at 6:38:30 PM

I’d like to do something like this for the collection of home videos I have piling up, but I’m still on 16GB M1. Any hope of getting decent results with smaller models? If not, does anyone have tips on GPU rental?

I have a Claude max sub and plenty of OpenRouter credit, but I don’t feel good about uploading my family’s private videos

by ngai_aku

5/21/2026 at 3:05:07 PM

Awesome. Say, this is very comprehensive.

I was vaguely aware of all these pieces existing (except for running a facial recognition database at home o_o), but it's really neat to put them all together like that.

by andai

5/21/2026 at 3:16:07 PM

Thanks! I was honestly casually trying it out on the side with Claude's help. And I was actually pleasantly surprised to see how good the result was.

Still blows my mind I can do all this from my 2021 MBP.

I'll try to do a post once I have the next steps working (helping with planning and editing videos with Davinci Resolve).

by asenna

5/21/2026 at 3:44:51 PM

I also have a 64GB M1 Max and am similarly impressed with what that workhorse can do. The M5 tempted me -- a lot -- but then I looked at what I was already getting done on that machine and just couldn't justify it ... yet. Someday, surely, but not yet. Gemma4 gave all my local projects new life, just like what you did here.

Great job. Long live the M1 Max!

by ahknight

5/21/2026 at 4:14:23 PM

100%

Although knowing how good these local models are getting, I am now eyeing the upcoming M5 Ultra Mac Studio (256gigs perhaps). But knowing how crazy the market is, it might be a year before I get the chance to get my hands on it. If it even launches by WWDC.

by asenna

5/21/2026 at 5:37:56 PM

the reason 50GB swap is even viable here is Apple Silicon's memory bandwidth. on x86 that much swap would make inference unusably slow

by cold_harbor

5/21/2026 at 6:12:45 PM

Memory bandwidth or storage bandwidth?

by throwawaytea

5/21/2026 at 5:08:41 PM

Reading this text feels strange, sentences seem to be detached

by gitowiec

5/21/2026 at 7:57:21 PM

I had exactly the same impression, and I recall seeing this style other times recently. First time I thought it was just bad writing skills, now I'm thinking it's AI generated.

by cataphract

5/21/2026 at 6:06:05 PM

Now I have another project for this weekend! I also have tons of video and not a lot of time to index them.

by yardie

5/21/2026 at 4:38:01 PM

So do they run the lodge or what?

by brcmthrowaway

5/21/2026 at 5:58:03 PM

Hi. I wrote this article - yes, I do run a safari lodge in Maasai Mara, Kenya. It's amazing. Ask me anything if you're interested in knowing more.

(Also email is in my profile).

by asenna

5/21/2026 at 5:35:23 PM

The subject matter is interesting but the amount of slop makes it difficult to read through. Yeah, it's great that you can throw your technical problems at Claude without caring much about the generated output but treating your own writing that you actually want to share with the world the same way is a terrible idea.

by zazibar

5/21/2026 at 5:57:07 PM

Tbh, I did spend a lot of time trying to ground it and de-slopify it - verified nothing was hallucinated and went through 10 iterations to get to this. It's almost like wrestling with Claude and I knew it would be tough on HN.

But because of the fear of non-perfection, I used to put away things like creating this article or even posting it anywhere. And I do think the article has real value that HN would appreciate (I am myself an HN-enthusiast).

I'll try more. Someone else shared this project which would be really helpful - https://github.com/blader/humanizer

Also a side note, the blog is posted on my self-created Slopit.io platform which is purely meant for your personal agents (working along with you) to post content - I recommend trying it out. https://blog.slopit.io/this-blog-post-is-slop/

I know, things are getting difficult with all the slop around, but my personal opinion is, as the agents get better at writing, the "annoying-ness" factor reduces and pieces of substance will still be appreciated, even if it was written by agents. This and the fact that agents aren't going away.

If I've automated a lot of my coding, I feel like engineers like me would naturally progress to also taking agents' help to write useful content.

PS - this comment was 100% hand-typed.

by asenna

5/21/2026 at 7:01:20 PM

For what it's worth, I really enjoyed this read and almost came here to comment "this is the most enjoyable llm-assisted article I've read in a while"

The tells were unmistakable but it still had a human touch, so I for one am glad you published anyway.

by teach

5/21/2026 at 4:00:55 PM

[flagged]

by maxothex