OpenAI has launched o3-pro, an AI model that the company claims is its most capable yet. “In expert evaluations, reviewers consistently prefer o3-pro over o3 in every tested category and especially in key domains like science, education, programming, business, and writing help,” OpenAI writes in a changelog. “Reviewers also rated o3-pro consistently higher for clarity, comprehensiveness, instruction-following, and accuracy.” O3-pro has access to tools, according to OpenAI, allowing it to search the web, analyze files, reason about visual inputs, use Python, personalize its responses leveraging memory, and more.

As a drawback, the model’s responses typically take longer to complete than o1-pro’s, according to OpenAI. O3-pro has other limitations: temporary chats with the model in ChatGPT are disabled for now while OpenAI resolves a “technical issue,” o3-pro can’t generate images, and Canvas, OpenAI’s AI-powered workspace feature, isn’t supported.

On the plus side, o3-pro achieves impressive scores on popular AI benchmarks. On AIME 2024, which evaluates a model’s math skills, o3-pro scores better than Google’s top-performing AI model, Gemini 2.5 Pro. O3-pro also beats Anthropic’s recently released Claude 4 Opus on GPQA Diamond, a test of PhD-level science knowledge.

In the API, o3-pro is priced at $20 per million input tokens and $80 per million output tokens. Input tokens are tokens fed into the model, while output tokens are tokens the model generates in response.
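To make the pricing concrete, here is a minimal back-of-the-envelope sketch of what a request would cost at those published rates; the token counts in the example are illustrative assumptions, not measurements.

```python
# Cost estimate for an o3-pro API call at the published rates
# ($20 per 1M input tokens, $80 per 1M output tokens).
INPUT_RATE = 20.00 / 1_000_000   # dollars per input token
OUTPUT_RATE = 80.00 / 1_000_000  # dollars per output token

def o3_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 5,000-token prompt that yields a 2,000-token answer.
print(f"${o3_pro_cost(5_000, 2_000):.2f}")  # $0.26
```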
Vanta’s AI agent auto-maps policies to relevant compliance controls by scanning uploaded documents and extracting key details including version history and SLAs, while providing rationale for its recommendations
Compliance automation startup Vanta unveiled an autonomous AI agent that handles end-to-end security and compliance workflows without human intervention. Unlike traditional automation tools that follow pre-defined rules, the Vanta AI Agent proactively identifies compliance issues, suggests fixes and takes action on behalf of security teams while keeping humans in control of final decisions. “By minimizing human error and taking on repetitive tasks, the Vanta AI Agent enables teams to focus on higher-value work—the work that truly builds trust.”

The AI Agent tackles four critical areas that typically consume hundreds of hours of manual work. For policy onboarding, the system scans uploaded documents, extracts key details including version history and service level agreements, and automatically maps policies to relevant compliance controls while providing rationale for its recommendations. It also reviews uploaded documents against audit requirements to ensure accuracy and completeness, identifying gaps before they become issues. Perhaps most significantly, the agent proactively monitors for inconsistencies between written policies and actual practices—a common source of audit failures. The system also functions as an intelligent knowledge base, answering complex policy questions in real time.

Looking ahead, the agent will support end-to-end compliance workflows by connecting all aspects of a customer’s program across the Vanta Trust Management Platform, including risk oversight and security reviews. This comprehensive approach could fundamentally alter how enterprises approach security and compliance management.
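As a rough illustration of the policy-to-control mapping step, here is a minimal sketch in Python. The control catalog, the keyword-overlap matching, and the rationale strings are all invented for illustration; Vanta has not disclosed its implementation.

```python
# Hypothetical policy-to-control mapping with a simple rationale per match.
# Control IDs and keywords are illustrative, not Vanta's actual catalog.
CONTROLS = {
    "AC-2": {"name": "Account Management", "keywords": {"account", "access", "provisioning"}},
    "IR-4": {"name": "Incident Handling", "keywords": {"incident", "response", "escalation"}},
    "CP-9": {"name": "System Backup", "keywords": {"backup", "recovery", "restore"}},
}

def map_policy_to_controls(policy_text: str) -> list[dict]:
    """Return candidate controls with a keyword-overlap rationale."""
    words = set(policy_text.lower().split())
    matches = []
    for control_id, control in CONTROLS.items():
        hits = words & control["keywords"]
        if hits:
            matches.append({
                "control": control_id,
                "name": control["name"],
                "rationale": f"policy mentions {', '.join(sorted(hits))}",
            })
    return matches

print(map_policy_to_controls(
    "All user account provisioning requests require manager approval; "
    "backup and restore procedures are tested quarterly."
))
```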
Databricks unveiled Agent Bricks, a unified workspace that automates agent building and optimization; includes automated “judges” to prevent situations like an agent recommending a rival’s product
Databricks unveiled Agent Bricks, a unified workspace that automates agent building and optimization using customers’ enterprise data and synthetic equivalents. A key part of the release involves the use of large language model automated “judges” that generate questions and expected answers to assess model performance. This is presumably meant to resolve situations such as the one described by Ali Ghodsi, co-founder and chief executive, in which one automaker expressed concern to him about an agent that was recommending a competitor’s cars.

The company also released Lakebase, a managed Postgres database built for AI that adds an operational layer to the firm’s Data Intelligence Platform. Lakebase builds on the company’s acquisition of Neon Inc., whose serverless PostgreSQL platform allows developers to add support for the data structures in which AI models keep information. Another offering, Lakeflow Designer, is a no-code capability that allows users to author production data pipelines using a drag-and-drop interface and a natural language generative AI assistant. It’s the latest entry in the field of “vibe coding.” Through tools such as Agent Bricks and Lakebase, Databricks is building the infrastructure to support this change in how software is created and deployed.
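The LLM-as-judge pattern described above can be sketched as follows: one model generates question/expected-answer pairs from enterprise documents, and another grades the agent’s answers against them. The `call_llm` helper and prompt wording are placeholders, not the Agent Bricks API.

```python
# Sketch of automated "judges": generate eval pairs, then grade an answer.
def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (e.g. a hosted LLM)."""
    raise NotImplementedError

def generate_eval_pair(document: str) -> tuple[str, str]:
    question = call_llm(f"Write one factual question answerable from:\n{document}")
    expected = call_llm(f"Answer strictly from the document:\n{document}\nQ: {question}")
    return question, expected

def judge(question: str, expected: str, agent_answer: str) -> bool:
    verdict = call_llm(
        "Does the candidate answer match the expected answer? Reply YES or NO.\n"
        f"Q: {question}\nExpected: {expected}\nCandidate: {agent_answer}"
    )
    return verdict.strip().upper().startswith("YES")
```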
Databricks brings data insights to every business worker with AI-powered BI, generating the required SQL code itself and executing it on the customer’s data warehouse, abstracting the complexity away from the user
Databricks is trying to bring the power of big data analytics to every business worker with the launch of its new AI-powered business intelligence tool, Databricks One. Its simplified interface allows users to describe the type of data analysis they want to perform; an LLM then performs the necessary technical work to get that analysis done. It can take actions such as deploying AI agents into data pipelines and databases to perform extremely specific and detailed analysis. It generates the required SQL code itself and executes it on the customer’s data warehouse, abstracting the complexity away from the user. Once the analysis is done, Databricks One shows the results via suitable visualizations that appear directly in its interface. Users can then dig into these visualizations with an “AI/BI Genie” and ask more detailed questions using natural language. The use cases are varied: marketing professionals might want to perform some analytics to see how effective their latest campaign has been, legal professionals might want to review overlapping business contracts that could conflict with one another, and salespeople could use it to gather every piece of information they need ahead of a meeting with a new lead.
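The general natural-language-to-SQL flow can be sketched like this. The prompt, the `call_llm` stub, and the use of SQLite as a stand-in warehouse are illustrative assumptions, not Databricks One internals.

```python
# Sketch: ask a question, have an LLM write SQL, execute it, show results.
import sqlite3

def call_llm(prompt: str) -> str:
    """Placeholder for a hosted LLM call; returns a plausible SQL answer."""
    return "SELECT campaign, SUM(clicks) AS total_clicks FROM events GROUP BY campaign;"

question = "How effective was each marketing campaign?"
schema = "events(campaign TEXT, clicks INTEGER)"
sql = call_llm(f"Schema: {schema}\nWrite one SQL query answering: {question}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (campaign TEXT, clicks INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("spring", 120), ("spring", 80), ("fall", 40)])
for row in conn.execute(sql):
    print(row)  # e.g. ('fall', 40) and ('spring', 200)
```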
Zencoder automates testing with an AI agent that sees and interacts with applications as users do—clicking buttons, filling forms, navigating flows, and validating both UI state and backend responses
Zencoder announced the public beta of Zentester, an AI-powered agent that transforms end-to-end (E2E) testing from a bottleneck into an accelerator. The company says this enables development teams to move from “vibe coding” to production-ready software by accelerating quality assurance and providing instant verification within developer workflows. While AI coding assistants have revolutionized code generation, the gap between writing code and shipping reliable software remains vast: teams need faster feedback loops, and E2E testing – the final verification that software actually works – continues to be a manual, brittle process that can add days or weeks to release cycles.

Zentester sees and interacts with applications as users do—clicking buttons, filling forms, navigating flows, and validating both UI state and backend responses. The agent accepts scenarios written in plain English, so teams don’t have to wrestle with scripting frameworks. This brings comprehensive E2E testing directly to the engineer’s fingertips—both in their IDE through Zencoder’s existing integrations and in CI/CD pipelines via Zen Agents for CI. It enables five mutually supportive use cases: Developer-Led Quality, QA Acceleration, Quality Improvement for AI Coding Agents, Healing Tests, and Autonomous Verification.
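For a sense of what a plain-English scenario such as “log in and verify the dashboard greets the user” might compile down to, here is a sketch using Playwright. Zentester generates and runs such steps itself; the URL, selectors, and assertions below are invented for illustration.

```python
# Browser steps a testing agent might synthesize from a plain-English scenario.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.test/login")    # hypothetical app URL
    page.fill("#email", "user@example.test")   # fill the login form
    page.fill("#password", "hunter2")
    page.click("button[type=submit]")          # submit the form
    page.wait_for_url("**/dashboard")          # proper wait strategy
    assert "Welcome" in page.inner_text("h1")  # validate UI state
    browser.close()
```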
Mosaic Agent Bricks platform automates agent optimization and tuning without the need for labeled data
Many enterprise AI agent development efforts never make it to production, and it’s not because the technology isn’t ready. The problem, according to Databricks, is that companies are still relying on manual evaluations, a process that’s slow, inconsistent and difficult to scale. Databricks launched Mosaic Agent Bricks as a solution to that challenge.

The Mosaic Agent Bricks platform automates agent optimization using a series of research-backed innovations. Among the key innovations is the integration of TAO (Test-time Adaptive Optimization), which provides a novel approach to AI tuning without the need for labeled data. Mosaic Agent Bricks also generates domain-specific synthetic data, creates task-aware benchmarks and optimizes the quality-to-cost balance without manual intervention. Agent Bricks automates the entire optimization pipeline: the platform takes a high-level task description and enterprise data, and handles the rest automatically. It offers four agent configurations: Information Extraction, Knowledge Assistant, Custom LLM and Multi-Agent Supervisor.

Databricks also announced the general availability of its Lakeflow data engineering platform. Lakeflow solves the data preparation challenge by unifying three critical data engineering journeys that previously required separate tools: ingestion handles getting both structured and unstructured data into Databricks, while transformation provides efficient data cleaning, reshaping and preparation.
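The “declare the task, let the platform do the rest” workflow might look roughly like the spec below. The field names and values are invented for illustration; this is not Databricks’ actual API.

```python
# Hypothetical high-level task spec for an Agent Bricks-style platform.
task_spec = {
    "agent_type": "Information Extraction",   # one of the four configurations
    "task": "Extract supplier name, contract value, and renewal date "
            "from procurement PDFs.",
    "data_source": "catalog.procurement.contracts_raw",  # enterprise data
    "optimize_for": "quality_per_dollar",     # quality-to-cost balance
}
# The platform would then generate domain-specific synthetic data, build a
# task-aware benchmark, and tune the agent (e.g., via TAO) without
# hand-labeled examples.
```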
Vanta’s GRC AI agent uses program context to proactively detect inconsistencies between policy-defined service level agreements and the outcomes of continuous testing, flag mismatches and suggest fixes
Cybersecurity compliance startup Vanta has launched Vanta AI Agent, a new agent designed to handle end-to-end workflows autonomously across a company’s entire compliance program. The agent contextually guides organizations through key tasks, accurately identifies issues and inconsistencies humans might miss and proactively takes action on their behalf, while keeping governance, risk and compliance teams informed and in control. It uses program context to offer timely support and surface issues before they become costly errors, and it reduces human error by taking on manual, time-consuming tasks, freeing teams to focus on higher-value work that builds trust while strengthening their security and compliance posture.

In addition to its core functions, Vanta AI Agent generates clear and actionable policy change summaries, streamlining the process of updating compliance documentation during annual reviews. The result is less need for manual input, allowing teams to focus on strategic decision-making. The agent can also proactively detect inconsistencies between policy-defined service level agreements and the outcomes of continuous testing; when mismatches occur, it flags them and suggests fixes, helping teams address issues before they escalate into audit risks. Vanta AI Agent further simplifies information retrieval by answering policy and compliance-related questions in real time, so teams can quickly access critical details such as password requirements, vendor risk management information and compliance with standards like Service Organization Control 2 and the Health Insurance Portability and Accountability Act.
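The SLA-consistency check described above amounts to comparing policy-defined limits against observed outcomes and flagging the gaps. Here is a minimal sketch; the data shapes, SLA name, and thresholds are invented for illustration.

```python
# Compare a policy-defined SLA against continuous-testing outcomes.
from datetime import timedelta

policy_sla = {"critical_vuln_remediation": timedelta(days=30)}  # from policy docs

test_outcomes = [  # from continuous monitoring (hypothetical findings)
    {"finding": "CVE-2025-0001", "severity": "critical", "open_for": timedelta(days=12)},
    {"finding": "CVE-2025-0002", "severity": "critical", "open_for": timedelta(days=41)},
]

limit = policy_sla["critical_vuln_remediation"]
for outcome in test_outcomes:
    if outcome["severity"] == "critical" and outcome["open_for"] > limit:
        print(f"FLAG: {outcome['finding']} open {outcome['open_for'].days}d, "
              f"exceeds the {limit.days}d SLA; suggest escalating remediation.")
```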
Zencoder’s UI testing AI agent imitates how humans behave when interacting with web applications by combining screenshots with DOM snapshots, generating test artifacts that capture the expected visual and functional outcomes
Zencoder has announced a public beta for Zentester, its new end-to-end UI testing AI agent. Zentester imitates how humans behave when interacting with web applications, such as navigating the layout and identifying and using interactive elements. It does this by combining images (screenshots) with DOM (snapshot) information. As it runs through test scenarios, it generates test artifacts that capture the actions performed and the expected visual and functional outcomes. According to the company, these tests are designed to be maintainable over time and less brittle when an application changes. Zentester also automatically follows end-to-end testing best practices, such as proper wait strategies, error handling, and test isolation.
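The screenshot-plus-DOM observation can be sketched as follows, again using Playwright for illustration; the `page_state` structure and URL are assumptions, not Zentester’s internals.

```python
# Pair a screenshot with a DOM snapshot so a model can reason over both.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.test")          # hypothetical application
    page_state = {
        "screenshot_png": page.screenshot(),   # what a user sees
        "dom_html": page.content(),            # machine-readable structure
        "url": page.url,
    }
    browser.close()
    # A vision-capable model would receive both modalities to decide the next
    # action and to record expected visual and functional outcomes.
```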
Deepgram’s Voice Agent API combines speech-to-text, text-to-speech, and LLM orchestration with contextualized conversational logic into a unified architecture to enable deploying real-time, intelligent voice agents at scale
Deepgram has announced the general availability of its Voice Agent API, a single, unified voice-to-voice interface that gives developers full control to build context-aware voice agents that power natural, responsive conversations. Combining speech-to-text, text-to-speech, and LLM orchestration with contextualized conversational logic in a unified architecture, the Voice Agent API gives developers the choice of using Deepgram’s fully integrated stack or bringing their own LLM and TTS models. It delivers the simplicity developers love and the controllability enterprises need to deploy real-time, intelligent voice agents at scale.

The unified API simplifies development without sacrificing control: developers can build faster with less complexity, while enterprises retain full control over orchestration, deployment, and model behavior without compromising on performance or reliability. The single interface integrates speech-to-text, LLM reasoning, and text-to-speech with built-in support for real-time conversational dynamics. Capabilities such as barge-in handling and turn-taking prediction are model-driven and managed natively within the platform, which eliminates the need to stitch together multiple vendors or maintain custom orchestration, enabling faster prototyping, reduced complexity, and more time focused on building high-quality experiences. The platform also enables model-level optimization at every layer of the interaction loop, allowing precise tuning of latency, barge-in handling, turn-taking, and domain-specific behavior in ways not possible with disconnected components.
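Configuring such a unified voice agent might look roughly like the settings payload below. The message schema, option names, and model identifiers are illustrative assumptions, not Deepgram’s documented API.

```python
# Hypothetical one-shot settings message for a unified voice agent.
import json

agent_settings = {
    "audio": {"input_encoding": "linear16", "sample_rate": 16000},
    "listen": {"model": "nova-3"},         # speech-to-text layer
    "think": {                             # LLM orchestration layer
        "provider": "open_ai",             # or bring your own LLM
        "instructions": "You are a concise support agent.",
    },
    "speak": {"model": "aura-2"},          # text-to-speech layer
    "conversation": {
        "barge_in": True,                  # let callers interrupt mid-reply
        "turn_taking": "model_driven",     # platform predicts turn ends
    },
}
print(json.dumps(agent_settings, indent=2))  # sent once over the websocket
```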
Groq’s custom Language Processing Unit (LPU) architecture, designed specifically for AI inference, enables it to handle memory-intensive operations like large context windows at lower cost compared to general-purpose GPUs
Groq became an official inference provider on Hugging Face’s platform, potentially exposing its technology to millions of developers worldwide. The integration extends the Groq ecosystem, giving developers more choice and further reducing barriers to adopting Groq’s fast and efficient AI inference.

Groq’s assertion about context windows — the amount of text an AI model can process at once — strikes at a core limitation that has plagued practical AI applications: most inference providers struggle to maintain speed and cost-effectiveness when handling large context windows, which are essential for tasks like analyzing entire documents or maintaining long conversations. Independent benchmarking firm Artificial Analysis measured Groq’s Qwen3 32B deployment running at approximately 535 tokens per second, a speed that would allow real-time processing of lengthy documents or complex reasoning tasks. The company is pricing the service at $0.29 per million input tokens and $0.59 per million output tokens — rates that undercut many established providers. “Groq offers a fully integrated stack, delivering inference compute that is built for scale, which means we are able to continue to improve inference costs while also ensuring performance that developers need to build real AI solutions,” the company says.

The technical advantage stems from Groq’s custom Language Processing Unit (LPU) architecture, designed specifically for AI inference rather than the general-purpose graphics processing units (GPUs) that most competitors rely on. This specialized hardware approach allows Groq to handle memory-intensive operations like large context windows more efficiently. By becoming an official inference provider, Groq gains access to Hugging Face’s vast developer ecosystem with streamlined billing and unified access. The competition has deep infrastructure backing of its own: Amazon’s Bedrock service leverages AWS’s massive global cloud infrastructure, Google’s Vertex AI benefits from the search giant’s worldwide data center network, and Microsoft’s Azure OpenAI service is similarly well resourced. However, Groq says, “As an industry, we’re just starting to see the beginning of the real demand for inference compute. Even if Groq were to deploy double the planned amount of infrastructure this year, there still wouldn’t be enough capacity to meet the demand today.”
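Calling Groq through Hugging Face might look like the sketch below, assuming a recent `huggingface_hub` release with provider routing and a valid Hugging Face token; the model ID and prompt are illustrative.

```python
# Route a chat request to Groq via Hugging Face's provider integration.
from huggingface_hub import InferenceClient

client = InferenceClient(provider="groq", api_key="hf_...")  # your HF token

response = client.chat_completion(
    model="Qwen/Qwen3-32B",  # the deployment benchmarked at ~535 tokens/s
    messages=[{"role": "user", "content": "Summarize this document: ..."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```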
