4/11/2026 at 7:50:24 AM
As long as there's no solution to the long-term memory problem, we will have a "country of geniuses in a data center" that all suffer from anterograde amnesia (movie: Memento) and require constant human hand-holding. I have experimented with a lot of hacks, like hierarchies of indexed md files, semantic DBs, embeddings, and dynamic context retrieval, but none of them is really a comprehensive solution that feels as intelligent as what these systems can do within their context windows.
I am also a touch skeptical that adjusting weights to learn context will do the trick without a transformer-like innovation in reinforcement learning.
Anyway, I'll keep tinkering…
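For concreteness, the embedding/dynamic-retrieval hacks mentioned above boil down to something like the sketch below. The `embed` function here is a toy hashed bag-of-words stand-in, not a real embedding model, and the notes are invented examples:

```python
import zlib
import numpy as np

def embed(text, dim=256):
    """Toy hashed bag-of-words embedding; a real setup would call a model API."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

notes = [
    "meeting with alice about the retrieval pipeline",
    "grocery list: eggs, milk, coffee",
    "bug report: context window overflow in the agent loop",
]
index = np.stack([embed(n) for n in notes])   # the "external memory"

def retrieve(query, k=1):
    scores = index @ embed(query)             # cosine similarity (unit vectors)
    return [notes[i] for i in np.argsort(scores)[::-1][:k]]
```

The retrieved notes are then pasted into the context window, which is exactly the limitation being complained about: the model only "remembers" what the similarity search happened to surface.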
by loehnsberg
4/11/2026 at 10:38:09 AM
I agree. A key to human intelligence is our ability to adjust our weights in real time. All knowledge becomes parametric knowledge, i.e. knowledge stored inside the model. RAG is a messy workaround that requires making assumptions about what to load from external sources before it is clear what is needed. Agentic loops can go some way toward overcoming this, but they are resource-intensive, slow, prone to mistakes and deviations, and far less accurate. The secret sauce of an LLM is the vectorised weights; RAG is like putting a 1990s Honda Civic engine into a Ferrari. You can do it, but the result is quite terrible. I think we will eventually end up with models that can be individually trained and customised on regular schedules. After that, real-time.
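The contrast drawn here between up-front RAG and agentic loops can be sketched as follows. Instead of guessing what to retrieve before generation, the loop lets the model request what it is missing, one slow round trip at a time. `fake_llm` and `lookup` are hypothetical stubs standing in for a real model API and a real data source:

```python
KNOWLEDGE = {"invoice_total": "$1,240", "due_date": "2026-04-30"}

def fake_llm(prompt, context):
    # Stub: "asks" for a fact it doesn't have yet, otherwise answers.
    for key in KNOWLEDGE:
        if key not in context:
            return ("FETCH", key)
    return ("ANSWER", f"Total {context['invoice_total']} due {context['due_date']}")

def lookup(key):
    return KNOWLEDGE[key]           # stand-in for email/calendar/DB access

def agent_loop(prompt, max_steps=5):
    context = {}
    for _ in range(max_steps):      # each step is one costly model round trip
        action, payload = fake_llm(prompt, context)
        if action == "ANSWER":
            return payload
        context[payload] = lookup(payload)   # fetch only what was asked for
    return None
```

The loop avoids the pre-retrieval guessing problem, but every missing fact costs another full inference pass, which is the "resource intensive, slow" part of the complaint.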
by Gareth321
4/11/2026 at 8:44:36 AM
I've used OpenClaw (just for learning; I agree with the author that it's not reliable enough to do anything useful) but also have a similar daily summary routine, which is a basic Gemini API call to a personal MCP server that has access to my email, calendar, etc. The latter is so much more reliable. OpenClaw flows sometimes nail it, and then the next day fail miserably. It seems like we need a way to 'bank' the correct behaviours, as in 'do it like you did it on Monday'. I feel that for any high-percentage reliability, we will end up using LLMs as glue, with as much of the actual work as possible handed off to MCP or persisted routine code. The best use case for LLMs currently is writing code, because once it's written, tested, and committed, it's useful for the long term. If we had to generate the same code on the fly for every run, there's no way it would ever work reliably. Extrapolating from that idea helps to see what we can and can't expect from AI.
by gbro3n
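The 'bank the correct behaviour' idea above amounts to caching generated code instead of regenerating it per run: pay the unreliable LLM step once, then execute the persisted routine deterministically. A minimal sketch, where `generate_routine` is a hypothetical stand-in for the LLM call:

```python
import os
import subprocess
import sys

ROUTINE = "daily_summary.py"

def generate_routine():
    # Stand-in for an LLM call that writes the routine once.
    return 'print("summary: 3 meetings, 12 emails")\n'

def run_banked(path=ROUTINE):
    if not os.path.exists(path):        # pay the unreliable generation step once
        with open(path, "w") as f:
            f.write(generate_routine())
    out = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return out.stdout.strip()           # every later run replays the banked code
```

Once the routine is on disk, Monday's behaviour and Tuesday's behaviour are byte-identical, which is the reliability property the comment is after.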
4/11/2026 at 7:59:18 AM
You're right to be skeptical. Without a way to actually implement how the human brain processes experiences into consolidated memory, we won't be able to solve the long-term memory problem at all, not with the current technology. An LLM context is a pretty good extended short-term memory, and the trained network is a very nice comprehensive long-term memory, but due to the way we currently train these networks, an LLM is just fundamentally unable to "move" these experiences to long-term storage the way a human brain does (through sleep, among other mechanisms).
Until we can teach a machine to experience something once and remember it (preferably on a local model, because you wouldn't want a global model to memorize your information), we just cannot solve this problem.
I think this is probably the most interesting field of research right now: actually understanding in depth how the brain learns, and figuring out a way to build a model that implements this. Because right now, with backpropagation and weight adjustments, I just can't see us getting there.
by ambewas
4/11/2026 at 9:17:30 AM
I think if we want to build on what we have, then instead of compaction at the end of the context window, the LLM would have to 'sleep', i.e. adjust its weights, then wake up with the last bits of the old context window in the new one and have a 'feel' for what it did before through the change in weights. I just sense it's not that simple to get there, because updating the weights based on a single context sample risks degrading the weights of the whole network. I like the idea of using a small local model (or several) for tackling this problem, like low-rank adaptation, but with current tech I still have to piece this together, or the small local models will forget old memories.
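The low-rank adaptation idea maps onto this nicely: freeze the base weights and consolidate a single "experience" into a small adapter, so any degradation is confined to the low-rank delta rather than the whole network. A minimal numpy sketch with toy dimensions and a plain squared-error loss, not a real LoRA training setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                         # hidden size, adapter rank r << d

W = rng.normal(size=(d, d))         # frozen base weights: the "long-term memory"
A = rng.normal(size=(d, r)) * 0.5   # small low-rank adapter...
B = np.zeros((r, d))                # ...whose product A @ B starts as a no-op

def forward(x):
    return x @ (W + A @ B)          # base model plus a swappable low-rank delta

x = rng.normal(size=(1, d))         # one "experience" to consolidate
target = rng.normal(size=(1, d))
loss_before = float(np.sum((forward(x) - target) ** 2))

lr = 1e-3
for _ in range(500):                # train ONLY the adapter; W never changes
    grad_y = 2 * (forward(x) - target)
    gA = x.T @ grad_y @ B.T         # dL/dA
    gB = (x @ A).T @ grad_y         # dL/dB
    A -= lr * gA
    B -= lr * gB

loss_after = float(np.sum((forward(x) - target) ** 2))
```

The base matrix `W` is untouched throughout, so "forgetting" is limited to whatever the tiny `A @ B` delta can express, which is the appeal and the limitation of the approach.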
by loehnsberg
4/11/2026 at 10:05:14 AM
Couldn't fine-tuning solve the problem? That's what companies do: take a model as a base and train it on the specific data long enough that it prefers the new data. Overfitting may be a concern, but for personal use I may want it to work as I expect, every time.
by SeriousM
4/11/2026 at 9:20:21 AM
[dead]
by sonink