5/26/2026 at 4:38:40 PM
The idea of periodically stopping to write blocks of recent context into a fast-weight state is interesting, but I think I liked it better when E2E-TTT[1] did it. It's a more flexible and elegant continuous learning approach.
Essentially it goes "You know how your model can remember its training data? Well, what if you treated its recent context like more training data and updated (some of) the weights using (mostly) the same process used to train it?"
The end result is very good at remembering things and also really good at adapting to new, unseen distributions.
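To make the "treat recent context as training data" idea concrete, here is a toy sketch. It is not E2E-TTT's actual method: the scalar model, the squared-error next-value loss, and the names `w_slow`, `w_fast`, and `ttt_update` are all assumptions made up for illustration. The only point is that a small "fast" part of the weights gets gradient updates from the context while the rest stays frozen.

```python
def predict(w_slow, w_fast, x):
    # Toy model: next value = (frozen weight + fast weight) * current value.
    return (w_slow + w_fast) * x


def ttt_update(w_slow, w_fast, context, lr=0.5, steps=50):
    """Run SGD over the context, updating only the fast weight.

    The context is treated as extra training data: each value is used
    to predict the one that follows it.
    """
    for _ in range(steps):
        for x, target in zip(context, context[1:]):
            err = predict(w_slow, w_fast, x) - target
            # Gradient of 0.5 * err**2 with respect to w_fast is err * x.
            w_fast -= lr * err * x
    return w_fast


# "Pretrained" weight expects each value to repeat (ratio 1.0),
# but this context halves at every step (ratio 0.5).
w_slow = 1.0
context = [1.0, 0.5, 0.25, 0.125]

w_fast = ttt_update(w_slow, 0.0, context)
next_value = predict(w_slow, w_fast, context[-1])
```

After the update, `w_slow + w_fast` ends up close to 0.5, so the model has adapted to the new distribution without touching the frozen weight.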
by thunderbird120
5/26/2026 at 7:19:31 PM
Yah I think E2E-TTT is a lot more like what people in this comments section are picturing. I can't tell that this method updates model weights at all during the "sleep" period, only the usual SSM state updated by any Mamba model after each token. They just optimized the model to use that SSM state _more_ when an eviction is about to happen.
by samsartor
5/26/2026 at 10:32:27 PM
Each model needs to be a separate copy, or at least have those particular weights be interchangeable, for every single user.
Remember Microsoft Tay.
by soulofmischief
5/27/2026 at 12:09:03 AM
Yes, since the weights being updated are a small subset of the overall total, it's manageable. Just like how each separate conversation currently requires you to store a separate KV cache, you'd need to store the fast weights separately. Both KV cache and fast weight content stores have to be conversation specific, so just setting a bit of extra RAM aside for "memory" isn't really a new ask, just a different format for an old problem.
by thunderbird120
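A rough sketch of what conversation-specific fast weights could look like next to a KV cache. Everything here is hypothetical (the layer names, `ConversationState`, `get_state`, and the choice to store fast weights as deltas on top of shared base weights); it only illustrates that the shared weights stay read-only while each conversation carries its own small state.

```python
from dataclasses import dataclass, field

# Shared, read-only base parameters (made-up names and values).
BASE_WEIGHTS = {"layer0": [0.1, 0.2], "layer1": [0.3, 0.4]}
# Assumption: only this small subset of layers is updated per conversation.
FAST_LAYERS = ("layer1",)


def _zero_deltas():
    return {name: [0.0] * len(BASE_WEIGHTS[name]) for name in FAST_LAYERS}


@dataclass
class ConversationState:
    # Both stores are private to one conversation.
    kv_cache: list = field(default_factory=list)
    fast_deltas: dict = field(default_factory=_zero_deltas)


def effective_weights(state):
    """Combine shared base weights with one conversation's fast deltas."""
    weights = {}
    for name, base in BASE_WEIGHTS.items():
        delta = state.fast_deltas.get(name)
        if delta is None:
            weights[name] = list(base)
        else:
            weights[name] = [b + d for b, d in zip(base, delta)]
    return weights


sessions = {}


def get_state(conversation_id):
    # Create the per-conversation state on first use.
    return sessions.setdefault(conversation_id, ConversationState())


alice = get_state("alice")
alice.kv_cache.append(("k0", "v0"))
alice.fast_deltas["layer1"][0] += 0.5  # stand-in for a test-time update

bob = get_state("bob")  # untouched by anything alice's conversation did
```

Storing deltas rather than full copies is one design choice among several; it keeps the per-conversation footprint proportional to the fast subset rather than the whole model.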
5/26/2026 at 9:28:52 PM
I wonder if we can get children to make something their life’s dream if we make the cool books about it when they are growing up? I wonder how flexible the human mind can be in convincing itself that it is fulfilling its dream?
by pfannkuchen
5/26/2026 at 11:40:38 PM
This sounds like a horror novel
by knollimar