DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


UCL and Huawei’s memory-augmented MDP framework lets LLM agents learn in real time from experience without fine-tuning, reaching 79.40% on the GAIA benchmark

September 9, 2025 // by Finnovate

A new learning paradigm developed by University College London (UCL) and Huawei Noah’s Ark Lab enables LLM agents to adapt dynamically to their environment without fine-tuning the underlying language model. The method lets agents continuously improve their performance through a structured memory system that updates itself as the agent gathers experience. An implementation of the paradigm, which the researchers call Memento, has achieved top scores on key benchmarks for deep research and complex, multi-step reasoning tasks. For enterprises, this offers a scalable and efficient path to generalist LLM agents capable of continuous, real-time learning without the high cost and downtime of traditional training methods.

Inspired by human memory, the framework enables continual adaptation without modifying the LLM. Instead of fine-tuning the base model, agents store past experiences in an external memory; when faced with a new task, the agent draws on similar past situations to guide its decision-making. This process builds on the Markov decision process (MDP), a classic framework in AI for teaching an agent to make optimal decisions. The researchers formalize their approach as a memory-augmented MDP (M-MDP), which extends the standard framework by letting the agent condition its choices not only on its current state and available actions but also on a rich memory of past events.

The system has three main components: a planner and a tool-enabled executor that work in an alternating loop to complete tasks, and a growing “case bank” that stores past experiences (a minimal sketch of this loop appears below). The more advanced parametric variant uses reinforcement learning with a lightweight neural network to address a common real-world challenge: sparse feedback. For tasks where success or failure signals are infrequent, this helps the feedback “propagate through various stages,” so the agent learns reliably over time; a sketch of this learned retrieval also follows below.
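In symbols, the shift from an MDP to an M-MDP can be written as a policy that conditions on memory as well as state, with the memory growing after each experience. This notation is ours, a hedged reading of the description above rather than the paper’s exact formalization:

```latex
% Hedged notation for a memory-augmented MDP (M-MDP); symbols are ours,
% not necessarily the paper's. The policy conditions on the memory M_t,
% and M_t grows with each (state, action, reward) experience.
\pi(a_t \mid s_t) \;\longrightarrow\; \pi(a_t \mid s_t, M_t),
\qquad
M_{t+1} = M_t \cup \{(s_t, a_t, r_t)\}
```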
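To make the planner–executor loop and case bank concrete, here is a minimal Python sketch of such a memory-augmented decision loop. It is an illustration under assumptions, not Memento’s actual implementation: the names (CaseBank, Case, plan, execute), the token-overlap similarity, and the toy reward rule are all stand-ins, where a real system would use an LLM planner, real tools, and embedding-based retrieval.

```python
# Minimal sketch of the planner/executor loop with a growing case bank,
# in the spirit of Memento's non-parametric variant. All names and the
# token-overlap similarity are illustrative assumptions, not the paper's
# actual API.
from dataclasses import dataclass, field

@dataclass
class Case:
    """One past experience: the task state, the plan taken, its reward."""
    state: str
    plan: str
    reward: float

@dataclass
class CaseBank:
    """Growing store of experiences; retrieval is by state similarity."""
    cases: list = field(default_factory=list)

    def write(self, case: Case) -> None:
        self.cases.append(case)

    def retrieve(self, state: str, k: int = 4) -> list:
        # Toy similarity: token overlap (Jaccard). A real system would
        # embed states and compare vectors instead.
        def sim(a: str, b: str) -> float:
            ta, tb = set(a.lower().split()), set(b.lower().split())
            return len(ta & tb) / max(1, len(ta | tb))
        ranked = sorted(self.cases, key=lambda c: sim(state, c.state), reverse=True)
        return ranked[:k]

def plan(state: str, precedents: list) -> str:
    """Stand-in for the LLM planner: condition on retrieved cases."""
    hints = "; ".join(f"[r={c.reward:.1f}] {c.plan}" for c in precedents)
    return f"plan for '{state}' informed by: {hints or 'no precedents'}"

def execute(plan_text: str) -> float:
    """Stand-in for the tool-enabled executor; returns a scalar reward.
    Toy rule: plans informed by at least one precedent score higher."""
    return 1.0 if "[r=" in plan_text else 0.5

bank = CaseBank()
for task in ["find stablecoin volume 2024", "find stablecoin fees 2025"]:
    precedents = bank.retrieve(task)
    p = plan(task, precedents)
    r = execute(p)
    bank.write(Case(task, p, r))  # memory grows; the LLM never changes
    print(f"{task} -> reward {r}")
```

Note that the second task scores higher than the first because it can draw on a stored precedent; the improvement comes entirely from memory, not from any change to the model.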
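The parametric variant can be pictured as follows: instead of fixed similarity retrieval, a small learned scorer decides which memory to use, and reward from the eventual task outcome trains that scorer. Everything below (the linear model, hand-rolled features, epsilon-greedy retrieval, and update rule) is a hedged sketch of that idea, not the paper’s actual network.

```python
# Minimal sketch of the parametric variant: a lightweight value function
# scores (state, case) pairs and is updated from observed reward, so that
# useful memories are retrieved more often even when feedback is sparse.
import random

def features(state: str, case_state: str) -> list:
    """Tiny feature vector: bias + token overlap. A real system would
    use learned embeddings of the state and the stored case."""
    ts, tc = set(state.split()), set(case_state.split())
    overlap = len(ts & tc) / max(1, len(ts | tc))
    return [1.0, overlap]

class CaseScorer:
    """Linear Q(state, case) trained by regression toward observed reward."""
    def __init__(self, dim: int = 2, lr: float = 0.1):
        self.w = [0.0] * dim
        self.lr = lr

    def q(self, x: list) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x: list, reward: float) -> None:
        # Move Q(state, case) toward the reward that followed retrieval,
        # propagating sparse feedback back onto the retrieval decision.
        err = reward - self.q(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]

random.seed(0)
scorer = CaseScorer()
memory = ["search the web for fees", "open a calculator tool"]
state = "compute the fees total"

for _ in range(50):
    # Epsilon-greedy retrieval over stored cases.
    if random.random() < 0.1:
        pick = random.randrange(len(memory))
    else:
        pick = max(range(len(memory)),
                   key=lambda i: scorer.q(features(state, memory[i])))
    # Sparse, noisy feedback: only the relevant case sometimes pays off.
    reward = 1.0 if pick == 0 and random.random() < 0.5 else 0.0
    scorer.update(features(state, memory[pick]), reward)

print("learned weights:", scorer.w)
```

Over the episodes, the weight on the similarity feature drifts positive, meaning the scorer has learned from reward alone to prefer the relevant memory; this is the sense in which sparse feedback “propagates” back to the retrieval policy.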

Read Article

Category: Additional Reading

