3/26/2025 at 6:29:03 PM
I really dig more content about how vector databases/tools handle problems like this!In sqlite-vec, there's only a flat brute-force index (though DiskANN/IVF will be coming soon). But we do have a concept of partition keys[0], which allows you to "internally shard" the vector index based on a user_id or any other value.
create virtual table vec_documents using vec0(
document_id integer primary key,
user_id integer partition key,
contents_embedding float[1024]
)
Then at query time, any WHERE constraints on a partition key are pushed-down, so only matching vectors are searched instead of the entire index. select
document_id,
user_id,
distance
from vec_documents
where contents_embedding match :query
and k = 20
and user_id = 123; -- only search documents from user 123
Definitely not as performant as a proper vector index, but a lot of real-world application have these natural groups anyway. Like "search only English documents" or "search entries in this workspace only", or even "search comments only from the past 30 days."Internally sqlite-vec stores vectors in "chunks", so when partition keys are definds, every chunk is associated with a unique combination of all partition keys. Kinda hard to describe, but if you create a vec0 virtual table and insert some values, you can inspect the internal "shadow tables" in the SQLite database to see how it's all stored.
[0] https://alexgarcia.xyz/sqlite-vec/features/vec0.html#partiti...
by alexgarcia-xyz
3/27/2025 at 2:51:16 AM
Well, I have a brute force strategy for pgvector working reasonably well. Individual, partial indexes. It works for all those queries with category_id=<> clauses. You only need an index for larger categories, for categories with rows below a threshold you dont need index a KNN/dot product would work.by ankitml