4/12/2026 at 9:59:41 AM
I feel like normalization would be a nightmare. Consider all the mistranscriptions, OCR errors, and name variants across libraries (case, parentheticals, etc.). If we assume there's no reliable way to define a book, maybe locality-sensitive hashing could help find probably-same books.
The idea is pretty cool though.
by pona-a
4/12/2026 at 1:54:10 PM
Good point. Normalization is deliberately scoped to 'what a human reads off the title page' rather than reconciling all possible metadata sources. LSH as a complementary fuzzy-matching layer for catalog reconciliation is exactly what the planned resolver at openusbn.org is designed to support: a deterministic identifier as the anchor, probabilistic matching as the discovery tool.
by novalis78
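To make the fuzzy-matching idea concrete, here is a minimal sketch of MinHash with LSH banding over character shingles, which buckets probably-same titles despite case, punctuation, and OCR noise. All names and parameters here are illustrative assumptions, not the openusbn.org resolver:

```python
# Hypothetical sketch of LSH-based title matching (not the actual resolver).
import hashlib
import re
from collections import defaultdict

def shingles(title, k=3):
    """Normalize to lowercase alphanumerics, then take character k-grams."""
    t = re.sub(r"[^a-z0-9 ]", "", title.lower())
    t = re.sub(r"\s+", " ", t).strip()
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}

def minhash(shingle_set, num_hashes=64):
    """MinHash signature: for each seed, the minimum hash over all shingles."""
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set)
        for seed in range(num_hashes)
    ]

def lsh_buckets(titles, bands=16, rows=4):
    """Group titles whose signatures collide in at least one band."""
    buckets = defaultdict(list)
    for title in titles:
        sig = minhash(shingles(title), num_hashes=bands * rows)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].append(title)
    return [group for group in buckets.values() if len(group) > 1]

# Usage: variants of the same title tend to share bands; unrelated
# titles almost never do.
titles = [
    "Moby-Dick; or, The Whale",
    "MOBY DICK, OR THE WHALE",   # case/punctuation variant
    "Moby Dlck or the Whale",    # simulated OCR error: l for i
    "Pride and Prejudice",
]
for group in lsh_buckets(titles):
    print(sorted(set(group)))
```

The band/row split (16×4 here) tunes the similarity threshold at which candidates collide; a production resolver would verify candidate pairs with an exact similarity check before merging records.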