
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


Silicon Valley races to build Reinforcement Learning “environments” as AI labs demand agent training grounds and startups vie to become the Scale AI for simulations

September 18, 2025 //  by Finnovate

AI researchers, founders, and investors say that leading AI labs are now demanding more reinforcement learning (RL) environments, and there’s no shortage of startups hoping to supply them. The push for RL environments has minted a new class of well-funded startups, such as Mechanize and Prime Intellect, that aim to lead the space. Meanwhile, large data-labeling companies like Mercor and Surge say they’re investing more in RL environments to keep pace with the industry’s shift from static datasets to interactive simulations. The major labs are considering investing heavily too: according to The Information, leaders at Anthropic have discussed spending more than $1 billion on RL environments over the next year.

While RL environments are the hot thing in Silicon Valley right now, there’s a lot of precedent for using this technique. What’s unique about today’s environments is that researchers are trying to build computer-using AI agents with large transformer models. Unlike AlphaGo, which was a specialized AI system working in a closed environment, today’s AI agents are trained to have more general capabilities. AI researchers today have a stronger starting point, but also a more complicated goal where more can go wrong.

Environments are part of AI labs’ bigger bet on RL, which many believe will continue to drive progress as labs add more data and computational resources to the process. The best way to scale RL remains unclear, but environments seem like a promising contender. Instead of simply rewarding chatbots for text responses, they let agents operate in simulations with tools and computers at their disposal. That’s far more resource-intensive, but potentially more rewarding.
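To make the idea of an RL environment concrete, here is a minimal Python sketch of an agent acting through tool calls in a simulation and being rewarded on the outcome, not on a text response. Everything in it is an illustrative assumption and not something described in the article or used by any lab: the `ToyShoppingEnv` class, the `add_to_cart` and `checkout` tools, the reward rule, and the scripted policy are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShoppingEnv:
    """Hypothetical toy environment: put one target item in a cart, then check out."""
    target: str = "socks"
    max_steps: int = 5
    cart: list = field(default_factory=list)
    steps: int = 0

    def reset(self) -> str:
        """Start a new episode and return the task description as the first observation."""
        self.cart = []
        self.steps = 0
        return f"Task: put '{self.target}' in the cart, then check out."

    def step(self, action: dict) -> tuple:
        """Apply one tool call and return (observation, reward, done)."""
        self.steps += 1
        tool = action.get("tool")
        if tool == "checkout":
            # Reward depends on the final state of the simulation, not on any text.
            reward = 1.0 if self.cart == [self.target] else 0.0
            return "Order placed.", reward, True
        if tool == "add_to_cart":
            self.cart.append(action.get("item"))
            obs = f"Cart: {self.cart}"
        else:
            obs = f"Unknown tool: {tool}"
        # The episode ends without reward if the step budget runs out.
        return obs, 0.0, self.steps >= self.max_steps


def run_episode(env, policy) -> float:
    """Roll out one episode and return the total reward collected."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


def scripted_policy(obs: str) -> dict:
    """Stand-in for a model: add the item on the first turn, then check out."""
    if obs.startswith("Task"):
        return {"tool": "add_to_cart", "item": "socks"}
    return {"tool": "checkout"}


def confused_policy(obs: str) -> dict:
    """Stand-in for a failing agent that keeps calling a tool that does not exist."""
    return {"tool": "search_web"}
```

In this sketch, `run_episode(ToyShoppingEnv(), scripted_policy)` earns the full reward, while `confused_policy` exhausts its step budget and earns nothing. A real environment would replace the scripted functions with a model and the toy cart with a browser or other software, which is where the extra cost and the extra room for error come from.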


Category: Additional Reading


Copyright © 2025 Finnovate Research · All Rights Reserved · Privacy Policy
