Google researchers have developed a new framework for AI research agents that outperforms leading systems from rivals OpenAI, Perplexity and others on key benchmarks. The new agent, called Test-Time Diffusion Deep Researcher (TTD-DR), is inspired by the way humans write: drafting, searching for information, and revising iteratively. The system uses diffusion mechanisms and evolutionary algorithms to produce more comprehensive and accurate research on complex topics.

For enterprises, this framework could power a new generation of bespoke research assistants for high-value tasks that standard retrieval-augmented generation (RAG) systems struggle with, such as generating a competitive analysis or a market entry report.

Unlike most AI agents, which follow a linear process, human researchers work iteratively. They typically start with a high-level plan, create an initial draft, and then engage in multiple revision cycles. During these revisions, they search for new information to strengthen their arguments and fill in gaps.

Google’s researchers observed that this human process could be emulated using a diffusion model augmented with a retrieval component. (A trained diffusion model initially generates a noisy draft, and the denoising module, aided by retrieval tools, revises this draft into higher-quality (or higher-resolution) outputs.)

TTD-DR is built on this blueprint. The framework treats the creation of a research report as a diffusion process, where an initial, “noisy” draft is progressively refined into a polished final report. This is achieved through two core mechanisms.

The first, which the researchers call “Denoising with Retrieval,” starts with a preliminary draft and iteratively improves it. In each step, the agent uses the current draft to formulate new search queries, retrieves external information, and integrates it to “denoise” the report by correcting inaccuracies and adding detail.
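To make the “Denoising with Retrieval” loop concrete, here is a minimal Python sketch of the draft, query, retrieve, revise cycle. It is an illustration only, not Google's implementation: the function `denoise_with_retrieval`, its parameters, the toy `FACTS` knowledge base, and the `toy_*` helpers are all hypothetical stand-ins for the LLM and search calls a real agent would make.

```python
from typing import Callable, List

def denoise_with_retrieval(
    question: str,
    draft: str,
    make_queries: Callable[[str, str], List[str]],
    search: Callable[[str], List[str]],
    revise: Callable[[str, List[str]], str],
    steps: int = 3,
) -> str:
    """Iteratively refine a draft report (hypothetical sketch).

    Each step uses the current draft to form search queries, retrieves
    evidence, and folds that evidence back into the draft.
    """
    for _ in range(steps):
        queries = make_queries(question, draft)
        if not queries:  # nothing left to look up, so stop early
            break
        evidence = [doc for q in queries for doc in search(q)]
        draft = revise(draft, evidence)
    return draft

# Toy stand-ins (assumptions) so the loop can run without an LLM or search API.
FACTS = {
    "market size": "The market is worth $2B.",
    "competitors": "Main rivals are A and B.",
}

def toy_make_queries(question: str, draft: str) -> List[str]:
    # Ask about every topic whose fact is still missing from the draft.
    return [topic for topic, fact in FACTS.items() if fact not in draft]

def toy_search(query: str) -> List[str]:
    # Look the query up in the toy knowledge base.
    return [FACTS[query]] if query in FACTS else []

def toy_revise(draft: str, evidence: List[str]) -> str:
    # Append any evidence the draft does not already contain.
    new_facts = [e for e in evidence if e not in draft]
    return " ".join([draft] + new_facts)

report = denoise_with_retrieval(
    "Should we enter this market?",
    "Initial draft.",
    toy_make_queries,
    toy_search,
    toy_revise,
)
print(report)
```

In this toy run the loop stops after one revision, because the revised draft already covers every topic and no further queries are generated. In a real agent, the three callables would wrap model prompts and a search tool, and the stopping rule would likely be a fixed step budget.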
The second mechanism, “Self-Evolution,” ensures that each component of the agent (the planner, the question generator, and the answer synthesizer) independently optimizes its own performance.

The resulting research companion is “capable of generating helpful and comprehensive reports for complex research questions across diverse industry domains.” In side-by-side comparisons with OpenAI Deep Research on long-form report generation, TTD-DR achieved win rates of 69.1% and 74.5% on two different datasets. It also surpassed OpenAI’s system on three separate benchmarks that required multi-hop reasoning to find concise answers, with performance gains of 4.8%, 7.7%, and 1.7%.
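The article does not spell out how the “Self-Evolution” mechanism works, so the following Python sketch shows only a generic evolutionary loop of the kind the description suggests: keep the best candidate outputs of one component, revise them, and repeat. The function `self_evolve`, the `fitness` scorer, and the `revise` step are all hypothetical; in a real agent they would likely be model calls, and the actual TTD-DR procedure may differ.

```python
from typing import Callable, List

def self_evolve(
    candidates: List[str],
    fitness: Callable[[str], float],
    revise: Callable[[str], str],
    generations: int = 2,
    keep: int = 2,
) -> str:
    """Generic select-and-revise loop for one component's outputs (assumed design)."""
    population = list(candidates)
    for _ in range(generations):
        # Rank candidates from best to worst and keep the top few.
        population.sort(key=fitness, reverse=True)
        survivors = population[:keep]
        # Each survivor produces one revised variant for the next round.
        population = survivors + [revise(s) for s in survivors]
    return max(population, key=fitness)

# Toy usage: "fitness" is just length, and "revise" appends a marker.
best = self_evolve(["a", "bb", "c"], fitness=len, revise=lambda s: s + "!")
print(best)  # bb!!
```

Running the same loop separately for the planner, the question generator, and the answer synthesizer would match the article's description of each component optimizing its own output independently.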