alt.hn

4/13/2026 at 8:42:19 PM

Show HN: Continual Learning with .md

https://github.com/SunAndClouds/ReadMe

by wenhan_zhou

4/13/2026 at 11:00:45 PM

The markdown approach has a real advantage people underestimate: you can read and edit the memory yourself. With vector DBs and embeddings the memory becomes opaque — you can't inspect or correct what the model "knows". Plain files keep the human in the loop.

The hard part is usually knowing what *not* to write down. Every system I've seen eventually drowns in low-signal entries.

by alexbike

4/14/2026 at 5:10:11 AM

Editability is surely an underrated advantage, both for the program itself and the memories it generates.

I think the noise is less problematic here because not everything is being retrieved. The agent can selectively explore subsets of the tree (plus you can edit the exploration policy yourself).

Since there is no context bloat, it is quite forgivable to just write things down.
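To make "selectively explore subsets of the tree" concrete, here's a toy sketch: the agent gets one tool to list what exists and one to read a specific file, so it only ever loads the subtrees it judges relevant. The layout and function names are my assumptions, not ReadMe's actual interface.

```python
import os

def list_children(root, sub=""):
    """Let the agent see which memories exist without loading any of them."""
    return sorted(os.listdir(os.path.join(root, sub)))

def read_memory(root, relpath):
    """Load one memory file; called only when the agent decides it's relevant."""
    with open(os.path.join(root, relpath)) as f:
        return f.read()
```

Because listing is cheap and reading is explicit, unread low-signal entries cost nothing until the agent actually opens them.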

by wenhan_zhou

4/13/2026 at 11:07:26 PM

This assumes that the model's behavior and memories are faithful to their English/human-language representation, and don't stray into (even subtle) "neuralese".

by in-silico

4/14/2026 at 7:09:18 AM

Having run a Markdown memory system with Claude for over a year, I don't think I've seen any evidence of neuralese. That's even with Claude being regularly encouraged to write "reflections" on each session, including automated sessions, and weekly summaries of those reflections.

The bigger problem is avoiding what I call the Memento Effect. I won't spoil the movie for anyone, but Memento involves a character who cannot make new memories, so he has to take meticulous notes about everything. But if any of those notes are vague or incorrect, they still get accepted as truth when next reviewed. So you really need your Markdown memory to be pristine, and mustn't allow it to become polluted.

by SyneRyder

4/14/2026 at 8:22:03 AM

I think what's missing is a benchmark that measures how well the memories contribute to future interactions.

by wenhan_zhou

4/13/2026 at 11:47:19 PM

Is there anything (besides plumbing) that prevents both? i.e. when the file is edited, all the representations are updated
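For illustration, one plumbing-level answer: re-derive any secondary representation whenever a file's content hash changes. This is a toy sketch with a placeholder `embed()`, not any real system's API.

```python
import hashlib

index = {}  # path -> (content hash, derived representation)

def embed(text):
    return [len(text)]  # stand-in for a real embedding call

def sync(path, text):
    """Re-derive the representation only when the file content changed."""
    h = hashlib.sha256(text.encode()).hexdigest()
    if path not in index or index[path][0] != h:
        index[path] = (h, embed(text))
    return index[path][1]
```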

by verdverm

4/14/2026 at 1:25:04 PM

Your example is with Codex - OpenAI could implement this easily on their end, right? Every prompt of yours was an API call and they have a log; they could easily re-create a quick history of what you did/asked for before.

by _zer0c00l_

4/14/2026 at 10:19:43 AM

Seems interesting. I'll give it a try on my agent; memory is definitely an ongoing issue. How long have you been running this in a continuous state? Also, have you tried other LLMs and seen a difference in how well they can use it?

by jusasiiv

4/14/2026 at 10:38:21 AM

Although I have worked on memory before, ReadMe is very fresh. The moment I saw it running, I published it. So: no continuous running, and no LLM ablation studies yet.

Treat it as an MVP, would love to hear how your agent performs!

by wenhan_zhou

4/14/2026 at 9:36:33 AM

Is this based on Karpathy's LLM Wiki idea (link: https://gist.github.com/karpathy/442a6bf555914893e9891c11519...)? I leveraged Karpathy's wiki idea and built MCPTube - it's a CLI and also an MCP server that turns YouTube videos into a compounding knowledge base. Check it out and let me know what you think (link: https://github.com/0xchamin/mcptube)

by 0xchamin

4/14/2026 at 11:06:06 AM

I just read LLM Wiki in more detail. I had heard about it second-hand before this project. The "no-code" idea was inspired by Karpathy.

As I understand it, in LLM Wiki the human is very much in the loop in deciding what gets written. In ReadMe, human control is mostly at the policy (prompt) level, and it is done once; the agent then runs fully autonomously afterwards.

Some thoughts after a quick skim of your project:

I have tried an embedding-based knowledge base as well, but it is a bit tricky to make the embedding match a user query. For example, "What happened?" is not at all similar to "Batman defeats Joker." You need to reformulate the query using an LLM, which is tricky given that the query is conditioned on the whole chat history. That's partly why I abandoned embedding-based methods.
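A toy illustration of that mismatch: a vague follow-up query shares no tokens (and, analogously, little embedding similarity) with the stored memory until it is reformulated using chat-history context. Crude token overlap here stands in for real embedding similarity; the reformulated string is just a made-up example.

```python
def overlap(a, b):
    """Crude proxy for semantic similarity: shared lowercase tokens."""
    return len(set(a.lower().split()) & set(b.lower().split()))

memory = "Batman defeats Joker"
raw_query = "What happened?"
reformulated = "How did the fight between Batman and Joker end?"
```

The raw query matches nothing, while the reformulated one overlaps with the memory on "Batman" and "Joker".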

But given that MCPTube already works on Gemini CLI, I could see it working natively without embeddings. Gemini can read video files natively. Worth a try?

by wenhan_zhou

4/13/2026 at 9:57:47 PM

I've seen a lot of such systems come and go. One of my friends is working on probably the best (VC-funded) memory system right now.

The problem always is that when there are too many memories, the context gets overloaded and the AI starts ignoring the system prompt.

Definitely not a solved problem, and there need to be benchmarks to evaluate these solutions. Benchmarks themselves can be easily gamed, though, and aren't universally applicable.

by namanyayg

4/14/2026 at 1:03:53 AM

The armchair ML engineer in me says our current context management approach is the issue. With a proper memory management system wired up to its own LLM-driven orchestrator, memories should be pulled in and pushed out between prompts, and ideally in the middle of a "thinking" cycle. You can make this performant using vector databases and such, but the core principle remains the same and is oft repeated by parents across the world: "Clean up your toys before you pull a new one out!"

Also, having thought for another 30 seconds: the "too many memories!" problem imo is the same problem as context management and compaction, and requires the same approach: more AI telling AI what AI should be thinking about. De-rank "memories" in the context manager as irrelevant and don't pass them to the outer context. If a memory is de-ranked often and not used enough, it gets purged.
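The de-rank-then-purge idea above could be sketched like this. All names and the purge threshold are made up for illustration; in a real system the "relevant" set would come from an LLM-driven orchestrator.

```python
class MemoryPool:
    def __init__(self, purge_after=3):
        self.scores = {}            # memory id -> consecutive de-rank count
        self.purge_after = purge_after

    def add(self, mem_id):
        self.scores[mem_id] = 0

    def select_for_context(self, relevant_ids):
        """De-rank memories judged irrelevant this turn; purge any
        that have been de-ranked too many times in a row."""
        for mem_id in list(self.scores):
            if mem_id in relevant_ids:
                self.scores[mem_id] = 0       # used: reset its counter
            else:
                self.scores[mem_id] += 1      # de-ranked once more
                if self.scores[mem_id] >= self.purge_after:
                    del self.scores[mem_id]   # purged for disuse
        return [m for m in relevant_ids if m in self.scores]
```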

by natpalmer1776

4/14/2026 at 5:24:33 AM

Fair concern.

ReadMe does support loading memories mid-reasoning! It is simply an agent reading files.

GPT-5.4 currently likes to explore a lot upfront and only then respond, but that is more a model behaviour (adjustable through prompting) than an architectural limitation.

by wenhan_zhou

4/14/2026 at 5:26:42 AM

Ah, I mean bi-directional management of context. Add and remove. Basically just the remove bit since we have adding down.

by natpalmer1776

4/14/2026 at 6:28:45 AM

I see your point.

A removal mechanism is not (yet) implemented. But in principle, we could adjust the instructions in Update.md so that it does a minor "refactor" of the filesystem each day; newer abstractions can then form, while irrelevant ones get pruned/edited. That's the beauty of the architecture: you define how the update occurs!

But if you do have a new memory (possibly contradicting an old one), is it really a good idea to prune/edit the old one?

If you are genuinely uncertain between choice A and B, then having both exist in the memory archive might be a feature. The agent gets to see contradictory evidence on different dates, which communicates that uncertainty.

by wenhan_zhou

4/14/2026 at 7:13:54 AM

Do you remember the day you learned how to perform long division?

The purpose of memory pruning is not to "forget" useful or even contradictory information, but to condense it, so that the useful bits take up less context and are more immediately accessible in the situations that need them.

by natpalmer1776

4/14/2026 at 7:48:49 AM

I don't remember such details, but as you suggest, it is a healthy kind of compression.

I address this by merging the lower-level memories into more abstract ones through a temporal, hierarchical filesystem. So: days -> months -> quarters -> years. Each timescale keeps a more "useful" context, since uncertain/contradictory information does not survive as it moves up in abstraction.

For example, a day-level memory might be: "The user learned how to divide 314 by 5 with long division on Jan 3rd 2017."

A year-level memory might be: "The user progressed significantly in mathematics during elementary school."

From the perspective of the LLM, it is easier to access the year-level memories because it requires fewer "cd" commands, and it only dives down into lower levels when necessary.
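The days -> months rollup could look something like this toy sketch. The YYYY/MM/DD.md layout is my assumption, and `summarize()` stands in for the LLM call that would actually abstract the merged text.

```python
import glob
import os

def summarize(text):
    return "Summary of: " + text  # placeholder for an LLM summary

def roll_up_month(root, year, month):
    """Merge a month's day-level memories into one month-level file."""
    day_files = sorted(glob.glob(os.path.join(root, year, month, "*.md")))
    merged = "\n".join(open(p).read() for p in day_files)
    out_path = os.path.join(root, year, month + ".md")
    with open(out_path, "w") as f:
        f.write(summarize(merged))
    return out_path
```

The same function, applied at each level, would produce the months -> quarters -> years chain.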

by wenhan_zhou

4/14/2026 at 1:15:42 AM

Mid-thinking-cycle retrieval seems dangerous, as it will probably kill caching.

by dummydummy1234

4/14/2026 at 1:17:29 AM

Mid-thinking-cycle retrieval would require a significant architecture change to the current state of the art, and imo is a key blocker to AGI.

by natpalmer1776

4/14/2026 at 5:18:09 AM

Context bloat is real, but the architecture has the potential to solve it.

You need clever naming for the filesystem and a good exploration policy in AGENTS.md (not trivial!).

The benchmark is definitely the core bottleneck. I don't know of any good benchmark for this; it's probably an open research question in itself.

by wenhan_zhou

4/14/2026 at 1:01:42 AM

What is the memory system you are referring to? I've been trying Memori with OpenClaw. Haven't had a ton of time to really kick the tires on it, so the jury's still out.

by xwowsersx

4/14/2026 at 1:42:41 AM

I love how you approached this with markdown!

I guess the markdown approach really has an advantage over others.

PS: Something I built on markdown: https://voiden.md/

by dhruv3006

4/14/2026 at 5:27:07 AM

Yep. Markdown is the future :-)

by wenhan_zhou

4/13/2026 at 9:53:22 PM

I really like the simplicity of this! What's retrieval performance and speed like?

by sudb

4/14/2026 at 5:35:01 AM

Minimalism is my design philosophy :-)

Good question. Since it is just an LLM reading files, retrieval speed depends entirely on how fast the model can call tools, i.e. on its tokens/s.

Haven't done a formal benchmark, but from the vibes, it feels like a few seconds for GPT-5.4-high per query.

There is an implicit "caching" mechanism, so the more you use it, the smoother it will feel.

by wenhan_zhou
