5/17/2026 at 3:37:00 AM
I have always called this the “one true taxonomy” problem, because whenever you sit in a room with multiple stakeholders talking about a taxonomy, you can never reach agreement: there is no such thing as the “one true taxonomy”.
Any hierarchical taxonomy classifies on one dimension at each taxonomic level. Invariably someone wants to classify on one criterion when someone else wants to classify on another. Taxonomies that humans use aren’t multi-dimensional. So if there is a disagreement, someone wins and someone else (or several people) has to lose.
No one is wrong; they just have different priorities or preferences or goals.
So now as an architect I never argue (and seldom discuss) taxonomies. I make two points and then bow out:
1. Whatever your taxonomy is, you need a rubric for each level. You need a procedure or set of questions that unambiguously map any $THING you encounter into exactly one bucket. Validate that competent people with no specific domain knowledge can properly classify things with your rubric; it must be repeatable by amateurs, not just experts (software is dumb).
2. Existence trumps theory. If there exists a taxonomy and rubric for what you’re classifying, you need to provide a $DARN_GOOD_REASON why this wheel needs reinventing. Personal preference and your 1% edge case probably don’t justify all the work to reinvent everything.
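A minimal sketch of what point 1 could look like in code, assuming the rubric for one level is an ordered list of yes/no questions where the first "yes" decides the bucket. The questions, field names, and bucket names here are hypothetical, not from any real taxonomy:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Question:
    """One yes/no rubric question and the bucket it assigns on 'yes'."""
    text: str
    test: Callable[[dict], bool]
    bucket: str


def classify(thing: dict, rubric: list[Question], default: str) -> str:
    """Ask the questions in order; the first 'yes' decides the bucket.

    Because the order is fixed and there is a default, every thing
    lands in exactly one bucket, with no judgment call needed.
    """
    for question in rubric:
        if question.test(thing):
            return question.bucket
    return default


# Hypothetical rubric for a single taxonomy level.
RUBRIC = [
    Question("Does it hold customer data?",
             lambda t: t.get("customer_data", False), "regulated"),
    Question("Is it reachable from the internet?",
             lambda t: t.get("public", False), "exposed"),
]

# Both questions would say 'yes', but the ordering breaks the tie.
print(classify({"customer_data": True, "public": True}, RUBRIC, "internal"))
```

The ordering is what makes it repeatable by amateurs: two people who disagree about which property matters more still get the same answer.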
Then, I go back to the implementers and tell them to design in a tagging system, which is a DIY taxonomy, and except in ridiculous use cases, I can make indexes make it fast enough to let everyone overlay their own classification system.
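Roughly what I mean by tags plus indexes, as an in-memory sketch. The `TagIndex` class and the `namespace/key:value` tag convention are assumptions for illustration; a real system would put the same inverted index in a database:

```python
from collections import defaultdict


class TagIndex:
    """Inverted index: each tag maps to the set of item ids carrying it."""

    def __init__(self) -> None:
        self._by_tag: defaultdict[str, set[str]] = defaultdict(set)

    def tag(self, item_id: str, *tags: str) -> None:
        for t in tags:
            self._by_tag[t].add(item_id)

    def find(self, *tags: str) -> set[str]:
        """Return the items that carry all of the given tags."""
        if not tags:
            return set()
        # Intersect starting from the smallest set to keep the work low.
        sets = sorted((self._by_tag.get(t, set()) for t in tags), key=len)
        return set.intersection(*sets)


idx = TagIndex()
# Each stakeholder overlays their own classification via a namespace.
idx.tag("srv-1", "ops/tier:prod", "finance/cost-center:42")
idx.tag("srv-2", "ops/tier:prod")

print(sorted(idx.find("ops/tier:prod")))
print(sorted(idx.find("ops/tier:prod", "finance/cost-center:42")))
```

Nobody has to win the taxonomy argument: finance and ops each query their own namespace against the same items.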
by efitz
5/17/2026 at 5:31:50 AM
> Then, I go back to the implementers and tell them to design in a tagging system, which is a DIY taxonomy, and except in ridiculous use cases, I can make indexes make it fast enough to let everyone overlay their own classification system.

This 100x! I wish this were more common.
The key property of a tree is that there is a unique path (address) for each element, which is a useful property in the implementation layer. But forcing that on users is a horribly leaky abstraction.
Ideally, separate the low-level implementation from the interface, and allow users their own way to address content. I imagine object storage (with UUIDs or whatever) is often good enough for the lower layer. For the interface layer, tags are an improvement on categories (tree structure), but I think there's also room for more innovation (fuzzy matching, AI-driven interfaces, etc.) that starts by trading off precision for recall, and then lets users regain precision by adding more approximate qualifiers to the filter.
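To make the precision/recall trade concrete, here is a rough sketch using the standard library's `difflib` for the fuzzy part. The item ids, tags, and the 0.8 threshold are made up for illustration; each extra approximate qualifier narrows the result set:

```python
from difflib import SequenceMatcher


def similar(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two strings, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search(items: dict[str, set[str]], qualifiers: list[str],
           threshold: float = 0.8) -> set[str]:
    """Keep items that have, for every qualifier, some tag fuzzily matching it."""
    hits = set(items)  # no qualifiers: maximum recall, everything matches
    for q in qualifiers:
        hits = {i for i in hits
                if any(similar(q, tag) >= threshold for tag in items[i])}
    return hits


# Hypothetical content, addressed by opaque ids in the lower layer.
ITEMS = {
    "a": {"photos", "vacation"},
    "b": {"photos", "invoices"},
    "c": {"invoices"},
}

print(sorted(search(ITEMS, ["photo"])))             # ['a', 'b']
print(sorted(search(ITEMS, ["photo", "invoice"])))  # ['b']
```

Neither qualifier is an exact tag, yet two sloppy qualifiers together pin down a single item.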
----
PS: Pushing this approach to 11/10...
An intriguing (crazy?) application of this idea would be: what if we did this to the concept of a codebase? Make it a database (with all the corresponding improvements over a filesystem) -- it's no longer a tree of files, and allow users to query code like "that foo which accepts a bar, frobnicates its internal state, and emits a mutated baz". Note that this might also solve the "naming things" problem.
This setup seems like a powerful abstraction for AI coding agents. All that back-end power (database >> filesystem) is something they can easily leverage, and they can also be built to robustly resolve your fuzzy queries into precise addresses, and then update the code based on your desired outcome.
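A toy version of the idea, using Python's `ast` and `sqlite3` from the standard library. The schema and the three indexed facts (parameter names, "assigns to a `self` attribute", "returns a value") are my own simplifications, and resolving a fuzzy natural-language query into this SQL is left out entirely:

```python
import ast
import sqlite3

SOURCE = '''
class Foo:
    def frob(self, bar):
        self.count = self.count + 1
        return bar

    def peek(self, bar):
        return self.count

def helper(baz):
    return baz
'''


def mutates_self(fn: ast.FunctionDef) -> bool:
    """True if the function assigns to an attribute of `self`."""
    for node in ast.walk(fn):
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AugAssign):
            targets = [node.target]
        else:
            continue
        for t in targets:
            if (isinstance(t, ast.Attribute) and isinstance(t.value, ast.Name)
                    and t.value.id == "self"):
                return True
    return False


def index_functions(source: str) -> sqlite3.Connection:
    """Load one row per function definition into an in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fn (name TEXT, params TEXT, mutates INTEGER, returns INTEGER)")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            params = ",".join(a.arg for a in node.args.args)
            returns = any(isinstance(n, ast.Return) and n.value is not None
                          for n in ast.walk(node))
            db.execute("INSERT INTO fn VALUES (?, ?, ?, ?)",
                       (node.name, params, int(mutates_self(node)), int(returns)))
    return db


# "accepts a bar, mutates its internal state, and returns something"
QUERY = """SELECT name FROM fn
           WHERE ',' || params || ',' LIKE '%,bar,%'
             AND mutates = 1 AND returns = 1"""

print(index_functions(SOURCE).execute(QUERY).fetchall())  # [('frob',)]
```

Note that the query never mentions the name `frob`, which is the part that touches the "naming things" problem.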
by ssivark
5/18/2026 at 1:13:57 AM
There's some prior work on the codebase thing:
[Unison](https://www.unison-lang.org/docs/the-big-idea/) content-addresses every definition. Kinda interesting.
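To give a feel for content addressing, here is a loose Python analogy, not how Unison actually does it: hash a function's syntax tree with its own name blanked out, so the address depends on what the definition does rather than what it is called or how it is formatted. The `definition_hash` helper is hypothetical, and it assumes the source holds exactly one top-level function:

```python
import ast
import hashlib


def definition_hash(source: str) -> str:
    """Short hash of a single function definition, ignoring its name and layout."""
    fn = ast.parse(source).body[0]
    if not isinstance(fn, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    fn.name = "_"  # the definition's own name does not contribute to the address
    # ast.dump omits line/column info by default, so whitespace is irrelevant.
    return hashlib.sha256(ast.dump(fn).encode()).hexdigest()[:12]


same_a = definition_hash("def add(a, b):\n    return a + b\n")
same_b = definition_hash("def plus(a,b): return a+b\n")
other = definition_hash("def add(a, b):\n    return a - b\n")

print(same_a == same_b)  # True: renamed and reformatted, same content
print(same_a == other)   # False: different body, different address
```

This sketch still treats parameter and local variable names as significant, so it is only a first approximation of the idea.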
A [Code Property Graph](https://en.wikipedia.org/wiki/Code_property_graph) takes a codebase and turns it into three graph representations: its AST, a Control Flow Graph, and a Program Dependence Graph. These graphs are overlaid and shoved into a single property graph. It's a structure mainly used by some static analysis tools like [Joern](https://joern.io/).
---
This has been a topic of a lot of interest and research for me. I've been experimenting with a system inspired by these ideas, among others, that applies the same approach (shoving multiple graph representations together) to a broader set of information.
by Jarwain
5/17/2026 at 1:15:00 PM
> that foo which accepts a bar, frobnicates its internal state, and emits a mutated baz

Tangential, but that reminds me of the Haskell tool Hoogle, which allows searching for functions _by type_ across a large database of libraries, even by abstract types. So you might wonder "hmm, what's that function that has a type structure like `t a -> (a -> t b) -> t b`?" and it'll happily tell you that it's the monadic bind (`>>=`).
by frogulis
5/17/2026 at 4:56:18 PM
Tag Clouds were sticky tasty web goodness a few years back.
I've got a legacy tag cloud curation tool for random collections (each collection gets its own id) of URLs. It IFRAMEs each URL to present it; no whining. I've used it for classifying technical docs, photo libraries (then I used the tags to train an image classifier), and to present an analysis of a customer's web site.
It's written in Perl, and (still) runs on modern Perl. Make friends and maybe I'll toss the code your way and help you with your project.
by m3047
5/17/2026 at 9:19:55 AM
More than once I have seen a project lead (often a higher-up) spend half a day after the project kick-off creating an elaborate folder structure for the team.
Younger me wondered: "Don't they have more important things to do? Why do they never delegate such a menial and boring task, especially when the structure is kind of obvious?"
Today it makes total sense to me. Even if it looks obvious, no one has the exact same hierarchy in mind. It was faster for them to materialize the hierarchy themselves than to convince anyone of it in every detail. Some things just can't be delegated.
by weinzierl
5/17/2026 at 12:45:18 PM
That’s a good name! I often resort to “there are many ways to slice a cake”: less sophisticated, but it gets the point across.
by designerarvid