Silicon Valley races to build Reinforcement Learning “environments” as AI labs demand agent training grounds and startups vie to become the Scale AI for simulations • DigiBanker

AI researchers, founders, and investors say that leading AI labs are now demanding more reinforcement learning (RL) environments, and there’s no shortage of startups hoping to supply them. The push for RL environments has minted a new class of well-funded startups, such as Mechanize and Prime Intellect, that aim to lead the space. Meanwhile, large data-labeling companies like Mercor and Surge say they’re investing more in RL environments to keep pace with the industry’s shift from static datasets to interactive simulations. The major labs are considering investing heavily too: according to The Information, leaders at Anthropic have discussed spending more than $1 billion on RL environments over the next year.

While RL environments are the hot thing in Silicon Valley right now, there’s a lot of precedent for the technique. What’s unique about today’s environments is that researchers are using them to build computer-using AI agents on top of large transformer models. Unlike AlphaGo, which was a specialized AI system working in a closed environment, today’s AI agents are trained to have more general capabilities. AI researchers today have a stronger starting point, but also a more complicated goal where more can go wrong.

Environments are part of AI labs’ bigger bet on RL, which many believe will continue to drive progress as labs add more data and computational resources to the process. The best way to scale RL remains unclear, but environments seem like a promising contender. Instead of simply rewarding chatbots for text responses, they let agents operate in simulations with tools and computers at their disposal. That’s far more resource-intensive, but potentially more rewarding.
