A collaborative team from Northwestern University, Microsoft, Stanford, and the University of Washington, including former DeepSeek researcher Zihan Wang, who is currently completing a computer science PhD at Northwestern, has introduced RAGEN, a new system for training and evaluating AI agents that the researchers hope will make agents more reliable and less brittle for real-world, enterprise-grade use. Unlike static tasks such as math solving or code generation, RAGEN focuses on multi-turn, interactive settings where agents must adapt, remember, and reason in the face of uncertainty.

RAGEN is built on a custom reinforcement learning framework called StarPO (State-Thinking-Actions-Reward Policy Optimization), which explores how LLMs can learn through experience rather than memorization. StarPO operates in two interleaved phases: a rollout stage, in which the LLM generates complete interaction sequences guided by reasoning, and an update stage, in which the model is optimized using normalized cumulative rewards. This structure supports a more stable and interpretable learning loop than standard policy optimization approaches.

To counter the instability that can emerge during this kind of training, the team developed a stabilized variant, StarPO-S, which incorporates three key interventions: uncertainty-based rollout filtering, removal of the KL penalty, and asymmetric PPO clipping.

The team also identified three dimensions that significantly impact training: task diversity, interaction granularity, and rollout freshness. Together, these factors make the training process more stable and effective. The sketches below illustrate how these pieces fit together.
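To make StarPO's two-phase loop concrete, here is a minimal Python sketch of the rollout and update stages. Every name in it (ToyEnv, ToyPolicy, rollout, normalize_returns, and so on) is an illustrative assumption, not RAGEN's actual API: in the real system the policy is an LLM and the environments are multi-turn agent tasks.

```python
import random
from dataclasses import dataclass, field

# Toy stand-ins so the sketch runs end to end. These stubs only mimic the
# interfaces the training loop needs; they are not part of RAGEN.

class ToyEnv:
    """A trivial episodic task: reach a state value of 3 within the turn limit."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        done = self.state >= 3
        reward = 1.0 if done else 0.0
        return self.state, reward, done

class ToyPolicy:
    """Stands in for the LLM: emits a reasoning trace ('thought') and an action."""
    def act(self, state):
        return f"at state {state}, try moving forward", random.choice([0, 1])

    def update(self, batch, advantages):
        # A real implementation would take a policy-gradient step here.
        pass

@dataclass
class Trajectory:
    """One complete multi-turn interaction."""
    steps: list = field(default_factory=list)   # (state, thought, action, reward)
    total_reward: float = 0.0

def rollout(policy, env, max_turns=8):
    """Rollout stage: the policy generates a full reasoning-guided
    interaction sequence until the episode ends or the turn limit is hit."""
    traj = Trajectory()
    state = env.reset()
    for _ in range(max_turns):
        thought, action = policy.act(state)
        state, reward, done = env.step(action)
        traj.steps.append((state, thought, action, reward))
        traj.total_reward += reward
        if done:
            break
    return traj

def normalize_returns(trajectories):
    """Normalize cumulative rewards across the batch, so the update stage
    optimizes relative trajectory quality rather than raw reward scale."""
    rewards = [t.total_reward for t in trajectories]
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std or 1.0) for r in rewards]

def train(policy, env, iterations=10, batch_size=16):
    for _ in range(iterations):
        # Phase 1: rollout -- collect complete interaction sequences.
        batch = [rollout(policy, env) for _ in range(batch_size)]
        # Phase 2: update -- optimize the policy on normalized returns.
        policy.update(batch, normalize_returns(batch))

train(ToyPolicy(), ToyEnv())
```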
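The three StarPO-S interventions can likewise be sketched. The snippet below shows one plausible reading of them, assuming a PyTorch setup; the function names and the filtering and clipping thresholds are illustrative guesses, not values from the RAGEN codebase.

```python
import torch

def filter_by_uncertainty(groups, keep_fraction=0.25):
    """Uncertainty-based rollout filtering: keep only the prompt groups whose
    rollouts show high reward variance. Groups where every rollout scores the
    same carry almost no learning signal. (keep_fraction is illustrative.)"""
    scored = sorted(
        groups,
        key=lambda g: torch.tensor(g["rewards"], dtype=torch.float32).std().item(),
        reverse=True,
    )
    keep = max(1, int(len(scored) * keep_fraction))
    return scored[:keep]

def policy_loss(logp_new, logp_old, advantages, clip_low=0.2, clip_high=0.28):
    """PPO-style surrogate loss reflecting the other two StarPO-S changes:
    - no KL penalty term against a reference model is added, and
    - the clipping range is asymmetric (clip_high > clip_low), letting
      high-advantage trajectories move the policy further on the upside.
    The 0.2 / 0.28 defaults are illustrative, not values from the paper."""
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```

In a full training loop, filter_by_uncertainty would prune the rollout batch before the update stage, and policy_loss would stand in for the usual symmetric PPO objective.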
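Finally, the three training dimensions can be read as knobs on the rollout setup. A hypothetical configuration sketch follows; all field names and default values are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class RolloutConfig:
    # Task diversity: train across many task variants and initial states,
    # with several rollouts per prompt so outcomes can be compared.
    num_task_variants: int = 16
    rollouts_per_prompt: int = 4
    # Interaction granularity: allow multiple actions per turn so the agent
    # can plan at a finer level within each exchange.
    max_actions_per_turn: int = 5
    # Rollout freshness: regenerate rollouts frequently so the training data
    # stays aligned with the current policy rather than going stale.
    resample_every_n_updates: int = 1
```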