I mean, you just described OpenClaw. The problem is that LLMs suck at "learning" things that weren't trained into them. If the "learning" is just RAG (stuffing new data into the prompt/context, or looking it up in a vector DB and stuffing that into the prompt/context), they will always make mistakes, and your agent will basically never get good at learning. The only ways to get closer are 1) fine-tuning (expensive, slow, and inaccurate) and 2) reinforcement learning (slow and inaccurate). So you can't just build an agent that automatically, incrementally gets better without waiting 10+ years for the process to iterate sufficiently. (Ask AI researchers; this has been the case for a long time.)
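To be concrete about why RAG isn't real learning: here's a minimal sketch of what that pipeline amounts to. The `embed` function is a toy hashed bag-of-words stand-in (a real system would call an actual embedding model), but the point is the same either way: the "learned" data only ever exists inside the prompt, and the model's weights never change.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for a real embedding model: hashed bag-of-words."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def retrieve(query: str, docs: list[str], doc_vecs: np.ndarray, k: int = 3) -> list[str]:
    """Return the k documents whose vectors are most similar to the query."""
    q = embed(query)
    # cosine similarity between the query and every stored document vector
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(sims)[::-1][:k]
    return [docs[i] for i in top]

def build_prompt(query: str, docs: list[str], doc_vecs: np.ndarray) -> str:
    """The entire 'learning' step: retrieved text gets stuffed into the prompt."""
    context = "\n\n".join(retrieve(query, docs, doc_vecs))
    return f"Use the following context to answer.\n\n{context}\n\nQuestion: {query}"

docs = ["Paris is the capital of France.", "Fine-tuning updates model weights."]
doc_vecs = np.vstack([embed(d) for d in docs])
print(build_prompt("What is the capital of France?", docs, doc_vecs))
```

If the retrieval misses or the model misreads the stuffed context, you get a mistake, and nothing about the next query is any better for it.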
However, you can build an agent that iterates on one specific problem so much that it becomes amazingly good at it, then do the same on another specific problem, and another, until you have a whole bunch of mini-experts. Then you can use those together (see the sketch below). To get better than that... use a new model, new prompting techniques, etc.
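A minimal sketch of the mini-experts idea: each expert is just a prompt (or setup) that's been iterated hard on one narrow problem, plus a router that picks which one handles a given task. The keyword router and `call_llm` stub here are hypothetical placeholders, just to show the shape, not any particular framework's API.

```python
# Each value would be a system prompt refined through many iterations
# on that one narrow problem.
EXPERTS = {
    "sql": "You are an expert at writing and optimizing SQL queries. ...",
    "regex": "You are an expert at crafting and explaining regexes. ...",
    "docker": "You are an expert at debugging Dockerfiles and builds. ...",
}

def call_llm(system: str, user: str) -> str:
    """Stub: wire this up to whatever LLM client you actually use."""
    return f"[expert: {system[:40]}...] answering: {user}"

def route(task: str) -> str:
    """Pick the expert whose keyword appears in the task; else a generalist."""
    for name, system_prompt in EXPERTS.items():
        if name in task.lower():
            return system_prompt
    return "You are a capable general-purpose assistant."

def answer(task: str) -> str:
    return call_llm(system=route(task), user=task)

print(answer("Write a regex that matches ISO 8601 dates"))
```

The improvement lives in the curated expert prompts and the routing, not in the model itself, which is exactly the ceiling described above.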