Google DeepMind is rolling out Gemini 2.5 Deep Think, its most advanced AI reasoning model, which answers questions by exploring and considering multiple ideas simultaneously and then using those outputs to choose the best answer.

Gemini 2.5 Deep Think is Google’s first publicly available multi-agent model. These systems spawn multiple AI agents to tackle a question in parallel, a process that uses significantly more computational resources than a single agent but tends to result in better answers.

Google says the Gemini 2.5 Deep Think model is a significant improvement over the version it announced at I/O. The company also claims to have developed “novel reinforcement learning techniques” to encourage Gemini 2.5 Deep Think to make better use of its reasoning paths.

“Deep Think can help people tackle problems that require creativity, strategic planning and making improvements step-by-step,” said Google in a blog post.

According to Google, Gemini 2.5 Deep Think achieves state-of-the-art performance on Humanity’s Last Exam (HLE), a challenging test measuring AI’s ability to answer thousands of crowdsourced questions across math, humanities, and science. Google claims its model scored 34.8% on HLE (without tools), compared to xAI’s Grok 4, which scored 25.4%, and OpenAI’s o3, which scored 20.3%.

Google also says Gemini 2.5 Deep Think outperforms AI models from OpenAI, xAI, and Anthropic on LiveCodeBench 6, a challenging test of competitive coding tasks. Google’s model scored 87.6%, whereas Grok 4 scored 79% and OpenAI’s o3 scored 72%.

Gemini 2.5 Deep Think automatically works with tools such as code execution and Google Search, and is capable of producing “much longer responses” than traditional AI models.

In Google’s testing, the model produced more detailed and aesthetically pleasing results on web development tasks than other AI models. Google says the model could aid researchers and “potentially accelerate the path to discovery.”