DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


UCL and Huawei’s memory-augmented MDP framework lets LLM agents learn in real time from experience without fine-tuning, reaching 79.40% on the GAIA benchmark

September 9, 2025 // by Finnovate

A new learning paradigm developed by University College London (UCL) and Huawei Noah’s Ark Lab enables LLM agents to adapt dynamically to their environment without fine-tuning the underlying language model. The method lets agents continuously improve their performance through a structured memory system that updates itself as the agent gathers experience. An implementation of the paradigm, which the researchers call Memento, has achieved top scores on key benchmarks for deep research and complex, multi-step reasoning tasks. For enterprises, this offers a scalable and efficient path to generalist LLM agents capable of continuous, real-time learning without the high cost and downtime of traditional training methods.

Inspired by human memory, the framework enables continual adaptation without modifying the LLM. Instead of fine-tuning the base model, agents store past experiences in an external memory; when faced with a new task, the agent draws on similar past situations to guide its decision-making. This process builds on the Markov decision process (MDP), a classic framework in AI for teaching an agent to make optimal decisions. The researchers formalize their approach as a memory-augmented MDP (M-MDP), which extends the standard framework by letting the agent condition its choices not only on its current state and available actions but also on a rich memory of past events.

The system has three main components: a planner and a tool-enabled executor that work in an alternating loop to complete tasks, and a growing “case bank” that stores past experiences (a minimal sketch of this loop appears below). The more advanced parametric variant uses reinforcement learning with a lightweight neural network to address a common real-world challenge: sparse feedback. For tasks where success or failure signals are infrequent, this helps the feedback “propagate through various stages,” so the agent learns reliably over time; a sketch of this learned retrieval also follows below.
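In symbols, the shift from an MDP to an M-MDP can be written as a policy that conditions on memory as well as state, with the memory growing after each experience. This notation is ours, a hedged reading of the description above rather than the paper’s exact formalization:

```latex
% Hedged notation for a memory-augmented MDP (M-MDP); symbols are ours,
% not necessarily the paper's. The policy conditions on the memory M_t,
% and M_t grows with each (state, action, reward) experience.
\pi(a_t \mid s_t) \;\longrightarrow\; \pi(a_t \mid s_t, M_t),
\qquad
M_{t+1} = M_t \cup \{(s_t, a_t, r_t)\}
```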
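To make the planner–executor loop and case bank concrete, here is a minimal Python sketch of such a memory-augmented decision loop. It is an illustration under assumptions, not Memento’s actual implementation: the names (CaseBank, Case, plan, execute), the token-overlap similarity, and the toy reward rule are all stand-ins, where a real system would use an LLM planner, real tools, and embedding-based retrieval.

```python
# Minimal sketch of the planner/executor loop with a growing case bank,
# in the spirit of Memento's non-parametric variant. All names and the
# token-overlap similarity are illustrative assumptions, not the paper's
# actual API.
from dataclasses import dataclass, field

@dataclass
class Case:
    """One past experience: the task state, the plan taken, its reward."""
    state: str
    plan: str
    reward: float

@dataclass
class CaseBank:
    """Growing store of experiences; retrieval is by state similarity."""
    cases: list = field(default_factory=list)

    def write(self, case: Case) -> None:
        self.cases.append(case)

    def retrieve(self, state: str, k: int = 4) -> list:
        # Toy similarity: token overlap (Jaccard). A real system would
        # embed states and compare vectors instead.
        def sim(a: str, b: str) -> float:
            ta, tb = set(a.lower().split()), set(b.lower().split())
            return len(ta & tb) / max(1, len(ta | tb))
        ranked = sorted(self.cases, key=lambda c: sim(state, c.state), reverse=True)
        return ranked[:k]

def plan(state: str, precedents: list) -> str:
    """Stand-in for the LLM planner: condition on retrieved cases."""
    hints = "; ".join(f"[r={c.reward:.1f}] {c.plan}" for c in precedents)
    return f"plan for '{state}' informed by: {hints or 'no precedents'}"

def execute(plan_text: str) -> float:
    """Stand-in for the tool-enabled executor; returns a scalar reward.
    Toy rule: plans informed by at least one precedent score higher."""
    return 1.0 if "[r=" in plan_text else 0.5

bank = CaseBank()
for task in ["find stablecoin volume 2024", "find stablecoin fees 2025"]:
    precedents = bank.retrieve(task)
    p = plan(task, precedents)
    r = execute(p)
    bank.write(Case(task, p, r))  # memory grows; the LLM never changes
    print(f"{task} -> reward {r}")
```

Note that the second task scores higher than the first because it can draw on a stored precedent; the improvement comes entirely from memory, not from any change to the model.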
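The parametric variant can be pictured as follows: instead of fixed similarity retrieval, a small learned scorer decides which memory to use, and reward from the eventual task outcome trains that scorer. Everything below (the linear model, hand-rolled features, epsilon-greedy retrieval, and update rule) is a hedged sketch of that idea, not the paper’s actual network.

```python
# Minimal sketch of the parametric variant: a lightweight value function
# scores (state, case) pairs and is updated from observed reward, so that
# useful memories are retrieved more often even when feedback is sparse.
import random

def features(state: str, case_state: str) -> list:
    """Tiny feature vector: bias + token overlap. A real system would
    use learned embeddings of the state and the stored case."""
    ts, tc = set(state.split()), set(case_state.split())
    overlap = len(ts & tc) / max(1, len(ts | tc))
    return [1.0, overlap]

class CaseScorer:
    """Linear Q(state, case) trained by regression toward observed reward."""
    def __init__(self, dim: int = 2, lr: float = 0.1):
        self.w = [0.0] * dim
        self.lr = lr

    def q(self, x: list) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x: list, reward: float) -> None:
        # Move Q(state, case) toward the reward that followed retrieval,
        # propagating sparse feedback back onto the retrieval decision.
        err = reward - self.q(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]

random.seed(0)
scorer = CaseScorer()
memory = ["search the web for fees", "open a calculator tool"]
state = "compute the fees total"

for _ in range(50):
    # Epsilon-greedy retrieval over stored cases.
    if random.random() < 0.1:
        pick = random.randrange(len(memory))
    else:
        pick = max(range(len(memory)),
                   key=lambda i: scorer.q(features(state, memory[i])))
    # Sparse, noisy feedback: only the relevant case sometimes pays off.
    reward = 1.0 if pick == 0 and random.random() < 0.5 else 0.0
    scorer.update(features(state, memory[pick]), reward)

print("learned weights:", scorer.w)
```

Over the episodes, the weight on the similarity feature drifts positive, meaning the scorer has learned from reward alone to prefer the relevant memory; this is the sense in which sparse feedback “propagates” back to the retrieval policy.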

Read Article

Category: Additional Reading

