
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


New system for training AI agents focuses on multi-turn, interactive settings, where agents must adapt, remember, and reason in the face of uncertainty, rather than on static tasks like math solving or code generation

April 25, 2025 // by Finnovate

A collaborative team from Northwestern University, Microsoft, Stanford, and the University of Washington, including former DeepSeek researcher Zihan Wang, now a computer science PhD candidate at Northwestern, has introduced RAGEN, a new system for training and evaluating AI agents that they hope will make them more reliable and less brittle for real-world, enterprise-grade use. Unlike static tasks such as math solving or code generation, RAGEN targets multi-turn, interactive settings where agents must adapt, remember, and reason in the face of uncertainty.

RAGEN is built on a custom reinforcement-learning framework called StarPO (State-Thinking-Actions-Reward Policy Optimization), which explores how LLMs can learn through experience rather than memorization. StarPO operates in two interleaved phases: a rollout stage, in which the LLM generates complete interaction sequences guided by reasoning, and an update stage, in which the model is optimized using normalized cumulative rewards. This structure supports a more stable and interpretable learning loop than standard policy-optimization approaches.

A stabilized variant, StarPO-S, adds three key interventions: uncertainty-based rollout filtering, removal of the KL penalty, and asymmetric PPO clipping. The team also identified three dimensions that significantly affect training: task diversity, interaction granularity, and rollout freshness. Together, these factors make the training process more stable and effective.
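To make the two-phase loop and the StarPO-S interventions concrete, here is a minimal Python sketch based only on the description above. ToyEnv, ToyPolicy, the batch size, the keep fraction, and the clipping bounds are illustrative assumptions, not values from the RAGEN paper; the real system rolls out full LLM reasoning-plus-action sequences and backpropagates through the model rather than printing a scalar loss.

```python
# Illustrative sketch of a StarPO-style rollout/update loop with the
# StarPO-S stabilization tricks. All names and hyper-parameters here are
# assumptions for demonstration, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

class ToyEnv:
    """Stand-in for a multi-turn task: five steps with a simple reward."""
    def reset(self):
        self.t = 0
        return 0  # dummy observation
    def step(self, action):
        self.t += 1
        reward = float(action == self.t % 2)  # arbitrary toy reward signal
        done = self.t >= 5
        return 0, reward, done

class ToyPolicy:
    """Stand-in for the LLM agent: emits an action and its log-probability."""
    def act(self, obs):
        p = 0.5
        action = int(rng.random() < p)
        return action, np.log(p)

def rollout(env, policy):
    """Rollout stage: generate one complete interaction sequence."""
    obs, traj, done = env.reset(), [], False
    while not done:
        action, logp = policy.act(obs)
        obs, reward, done = env.step(action)
        traj.append((logp, reward))
    return traj

def trajectory_return(traj):
    return sum(r for _, r in traj)

def collect_batch(env, policy, n=64):
    return [rollout(env, policy) for _ in range(n)]

def filter_by_uncertainty(batch, keep_frac=0.5):
    """StarPO-S intervention 1: uncertainty-based rollout filtering.
    Here uncertainty is approximated by how far a trajectory's return sits
    from the batch mean; low-signal trajectories are dropped."""
    returns = np.array([trajectory_return(t) for t in batch])
    spread = np.abs(returns - returns.mean())
    k = max(1, int(keep_frac * len(batch)))
    keep = np.argsort(-spread)[:k]
    return [batch[i] for i in keep]

def starpo_s_loss(batch, clip_low=0.2, clip_high=0.28):
    """Update stage: PPO-style objective on normalized cumulative rewards.
    StarPO-S interventions 2 and 3: no KL penalty term, and asymmetric
    clipping (a wider ceiling than floor on the probability ratio)."""
    returns = np.array([trajectory_return(t) for t in batch])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalized
    losses = []
    for traj, a in zip(batch, adv):
        for logp_old, _ in traj:
            # Placeholder ratio: in practice, new-policy logp minus old-policy logp.
            ratio = np.exp(logp_old - logp_old)
            clipped = np.clip(ratio, 1 - clip_low, 1 + clip_high)
            losses.append(-min(ratio * a, clipped * a))
    return float(np.mean(losses))

env, policy = ToyEnv(), ToyPolicy()
batch = collect_batch(env, policy)       # rollout stage
batch = filter_by_uncertainty(batch)     # StarPO-S filtering
print("loss:", starpo_s_loss(batch))     # update stage (gradient step omitted)
```

The asymmetric clip gives the probability ratio more room to grow than to shrink, which, together with dropping the KL penalty, reflects the kind of stabilization described above; the actual gradient step on the LLM is omitted in this sketch.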


Category: Members, AI & Machine Economy, Innovation Topics

