3/31/2026 at 7:51:25 PM
There are also interesting approaches that more directly compress a large document or an entire codebase into a smaller set of tokens, without getting the LLM to wing it. For example, Cartridges: <https://hazyresearch.stanford.edu/blog/2025-06-08-cartridges>. They basically use gradient descent to optimize the KV cache while freezing the network weights.
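The core idea can be sketched as a toy: treat a small KV cache as the only learnable parameters, freeze the model, and train the cache so that attention over it mimics attention over the full document's cache. This is just an illustrative one-layer stand-in, not the actual Cartridges training recipe; all shapes, the single attention readout, and the random "probe query" distribution are made up for the example.

```python
import torch

torch.manual_seed(0)
d = 16

# Frozen "model": a single attention readout with a fixed query projection.
Wq = torch.randn(d, d)

def attend(queries, K, V):
    # Standard scaled dot-product attention over a KV cache.
    scores = (queries @ Wq) @ K.T / d**0.5
    return torch.softmax(scores, dim=-1) @ V

# Stand-in for the long document's KV cache (256 entries).
K_doc, V_doc = torch.randn(256, d), torch.randn(256, d)

# The "cartridge": a much smaller cache (16 entries), the only trainable tensors.
K_c = torch.randn(16, d, requires_grad=True)
V_c = torch.randn(16, d, requires_grad=True)

# Fixed held-out probe queries to measure how well the small cache imitates the big one.
q_eval = torch.randn(64, d)
with torch.no_grad():
    initial_loss = torch.nn.functional.mse_loss(
        attend(q_eval, K_c, V_c), attend(q_eval, K_doc, V_doc)).item()

opt = torch.optim.Adam([K_c, V_c], lr=1e-2)
for step in range(500):
    q = torch.randn(32, d)  # fresh random probe queries each step
    loss = torch.nn.functional.mse_loss(
        attend(q, K_c, V_c), attend(q, K_doc, V_doc))
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    final_loss = torch.nn.functional.mse_loss(
        attend(q_eval, K_c, V_c), attend(q_eval, K_doc, V_doc)).item()
```

Only `K_c` and `V_c` ever receive gradients; the model weights (`Wq` here) stay frozen, which is what distinguishes this from fine-tuning.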
by coppsilgold