OpenAI launches ChatGPT Pulse, which proactively writes morning briefs that get you up to speed on your day
OpenAI is launching a new feature inside ChatGPT called Pulse, which generates personalized reports for users while they sleep. Pulse offers users five to 10 briefs that get them up to speed on their day and is aimed at encouraging users to check ChatGPT first thing in the morning — much like they would check social media or a news app. Pulse is part of a broader shift in OpenAI’s consumer products, which are increasingly designed to work for users asynchronously rather than simply responding to questions. Features like ChatGPT Agent or Codex aim to make ChatGPT feel more like an assistant than a chatbot. With Pulse, OpenAI seemingly wants ChatGPT to be more proactive. OpenAI will roll out Pulse for subscribers to its $200-a-month Pro plan, for whom it will appear as a new tab in the ChatGPT app. The company says it would like to launch Pulse to all ChatGPT users in the future, with Plus subscribers getting access next, but it first needs to make the product more efficient. Pulse’s reports can be roundups of news articles on a specific topic — like updates on a specific sports team — as well as more personalized briefs based on a user’s context. Each report is displayed as a “card” featuring AI-generated images and text. Users can click on each one to get the full report and can then query ChatGPT about its contents. Pulse will proactively generate some reports, but users can also ask Pulse for new automated reports or offer feedback on existing ones. A core part of Pulse is that it stops after generating a few reports and shows a message: “Great, that’s it for today.” That’s an intentional design choice to set the service apart from engagement-optimized social media apps. If users have ChatGPT’s memory features turned on, Pulse will also pull in context from previous chats to improve their reports.
Cloudflare launches payments stablecoin intended for autonomous software agents, developers, and online creators, enabling automated payments for services and content across borders
Cloudflare launched a U.S. dollar-backed stablecoin to support transactions on the AI-driven Internet. The token, NET Dollar, is reportedly intended for autonomous software agents, developers, and online creators, enabling automated payments for services and content across borders. “The Internet’s next business model will be powered by pay-per-use, fractional payments, and microtransactions, tools that shift incentives toward original, creative content that actually adds value,” commented Matthew Prince, co-founder and CEO of Cloudflare. “By using our global network, we are going to help modernize the financial rails needed to move money at the speed of the Internet, helping to create a more open and valuable Internet for everyone,” he explained. The new offering is reportedly built for what the company described as the “agentic web,” where AI agents perform tasks such as booking travel, ordering goods, or managing schedules. The stablecoin enables instant and reliable payments across currencies and geographies, allowing both personal and business agents to execute transactions automatically. Personal agents could pay for items the moment they become available, while business agents could settle supplier payments as soon as deliveries are confirmed. Simon Taylor, Founder of Fintech Brainfood, said: “Cloudflare helps host websites, prevent bot attacks, and now they’re launching NET Dollar, a USD-backed stablecoin built for autonomous commerce.” NET Dollar is designed to compensate creators for original content and to help developers monetize APIs and applications. Cloudflare is also developing open standards, including the Agent Payments Protocol and x402, to simplify sending and receiving online payments. The company emphasized that NET Dollar is designed to be interoperable with other payment systems.
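To make the pay-per-use idea concrete, here is a minimal sketch of the kind of HTTP 402 flow a protocol like x402 is built around: an agent requests a resource, the server replies with a price, the agent settles a stablecoin micro-payment and retries. The header names, wallet client, and receipt format below are illustrative assumptions, not Cloudflare’s actual specification.

```python
# Minimal sketch of an HTTP 402 "pay-per-use" flow of the kind x402 enables.
# Header names, the wallet client, and the settlement call are hypothetical
# stand-ins; Cloudflare's actual protocol and NET Dollar APIs may differ.
import requests


class StablecoinWallet:
    """Hypothetical wallet holding a USD-backed stablecoin balance."""

    def pay(self, recipient: str, amount: str) -> str:
        # Settle the micro-payment and return a proof/receipt identifier.
        return "receipt-abc123"


def fetch_paid_resource(url: str, wallet: StablecoinWallet) -> bytes:
    resp = requests.get(url)
    if resp.status_code == 402:  # Payment Required
        # The server advertises its price and payout address (assumed headers).
        price = resp.headers.get("X-Payment-Amount", "0.001")
        payee = resp.headers.get("X-Payment-Address", "")
        receipt = wallet.pay(payee, price)
        # Retry the request with proof of payment attached.
        resp = requests.get(url, headers={"X-Payment-Receipt": receipt})
    resp.raise_for_status()
    return resp.content


# Example usage with a placeholder URL.
content = fetch_paid_resource("https://example.com/api/article", StablecoinWallet())
```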
AI agents may not be so useful after all, new study finds; many agents are creating more rework for 58% of employees
Agentic AI has become the hot topic of 2025. Yet even though a large percentage of employees use AI agents, there appear to be a few significant hiccups, according to project management platform Asana. “Workers already hand 27% of their workload to agents, rising to 34% in a year and 43% within three years, signaling the biggest shift in how work gets done since the arrival of the PC.” However, “62% of workers say agents are unreliable; 59% report they confidently share wrong information, and 57% say they ignore feedback. Instead of reducing work, many agents are creating more rework for 58% of employees,” Asana notes. To make matters worse, when agents make mistakes, “a third of workers (33%) say no one is responsible, while others scatter blame between IT, end users, or the agent’s creator. With no clear ownership, companies risk accumulating massive ‘AI debt’ and eroding trust.” Finally, approximately 82% of employees agree that “proper training is essential to use agents effectively, yet fewer than four in 10 companies provide it, leaving workers eager but unprepared to delegate beyond basic admin tasks.” So are AI agents really useful? Judging by the way they’re currently being implemented across workplaces, the short answer is no. But with a digital transformation strategy and change management (which includes training), the real ROI of agentic AI will start to show.
Databricks partners with OpenAI in a $100 million deal to deploy GPT-5 agents directly on customers’ enterprise data via SQL without data movement
Databricks Inc. and OpenAI have formed a multiyear, $100 million partnership, making OpenAI’s latest models, including GPT-5, natively available to the more than 20,000 Databricks customers worldwide. Under the agreement, OpenAI’s models will be tightly integrated with Databricks’ AI development environment, called Agent Bricks. That gives organizations a single platform to develop, evaluate and scale AI agents — systems that can perform tasks autonomously with little or no human supervision — without the complexity of moving data or managing separate tools. Databricks customers will be able to run LLMs on their existing enterprise data, accessible via SQL or API, and deploy them securely at scale with built-in governance and observability controls. By keeping data within existing governance frameworks, businesses can deploy AI models while adhering to compliance and performance standards. The partnership also promises high-capacity processing power dedicated to running OpenAI’s models across customer workloads. Agent Bricks will play a central role in the joint offering. It allows organizations to measure model accuracy with task-specific evaluation methods, fine-tune LLMs for domain-specific tasks and automate workflows across a variety of use cases. With GPT-5 integrated, the companies said businesses can expect faster development cycles and more reliable AI outputs. Another key component of the partnership is Databricks’ Unity Catalog, which is used for data and AI model governance. It can help track data lineage, control access and enforce compliance while scaling AI deployments across departments and geographies, Databricks said. Observability features also help teams monitor performance, accuracy and security.
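As a rough illustration of what “LLMs on existing enterprise data via SQL” can look like in practice, the sketch below calls Databricks’ ai_query SQL function from PySpark. The endpoint name gpt-5-endpoint and the Unity Catalog table are hypothetical placeholders; exact names and model availability depend on how the OpenAI integration is exposed.

```python
# Minimal sketch of calling a served LLM over governed data from Databricks SQL.
# The endpoint name "gpt-5-endpoint" and the table are hypothetical; the exact
# names and availability depend on how the OpenAI integration is exposed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

summaries = spark.sql("""
    SELECT
      ticket_id,
      ai_query(
        'gpt-5-endpoint',                      -- model serving endpoint
        CONCAT('Summarize this support ticket in one sentence: ', body)
      ) AS summary
    FROM main.support.tickets                  -- governed Unity Catalog table
    LIMIT 100
""")

summaries.show(truncate=False)
```

Because the query runs where the data already lives, the model output inherits the table’s existing access controls and lineage tracking rather than requiring a separate export pipeline.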
Clarifai develops a new agentic AI acceleration engine using advanced hardware optimizations to double AI inference speed on existing GPU infrastructure at 40% lower cost
AI platform Clarifai announced a new reasoning engine that it claims will make running AI models twice as fast and 40% less expensive. Designed to be adaptable to a variety of models and cloud hosts, the system employs a range of optimizations to get more inference throughput out of the same hardware. “It’s a variety of different types of optimizations, all the way down to CUDA kernels to advanced speculative decoding techniques,” said CEO Matthew Zeiler. “You can get more out of the same cards, basically.” The results were verified in a series of benchmark tests by the third-party firm Artificial Analysis, which recorded industry-best results for both throughput and latency. The engine focuses specifically on inference, the computational work of running an AI model that has already been trained. That load has grown particularly intense with the rise of agentic and reasoning models, which take multiple steps in response to a single command.
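For readers unfamiliar with speculative decoding, one of the techniques Zeiler mentions, here is a toy sketch of the greedy variant: a cheap draft model proposes a few tokens, and the expensive target model verifies them, so several tokens can be accepted per expensive step. This is a generic illustration with stand-in model functions, not Clarifai’s implementation.

```python
# Toy sketch of greedy speculative decoding. A small "draft" model proposes k
# tokens cheaply; the large "target" model checks them and keeps the longest
# agreeing prefix. In production the target scores all drafted positions in one
# batched forward pass, which is where the speedup comes from.
from typing import Callable, List


def speculative_decode(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],   # cheap model: next-token guess
    target_next: Callable[[List[int]], int],  # expensive model: next-token choice
    max_new_tokens: int = 32,
    k: int = 4,                               # tokens drafted per verification step
) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft k candidate tokens with the cheap model.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: accept drafted tokens while the target model agrees.
        accepted = 0
        for i, t in enumerate(draft):
            if target_next(tokens + draft[:i]) == t:
                accepted += 1
            else:
                break
        tokens += draft[:accepted]
        # 3) On a mismatch, take one token from the target model instead.
        if accepted < k:
            tokens.append(target_next(tokens))
    return tokens
```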
GPT-5 triples GPT-4o’s win rate from 13.7% to 40.6% across professional tasks, while Claude Opus 4.1 reaches 49% against industry experts
OpenAI released a new benchmark that tests how its AI models perform compared to human professionals across a wide range of industries and jobs. The test, GDPval, is an early attempt at understanding how close OpenAI’s systems are to outperforming humans at economically valuable work — a key part of the company’s founding mission to develop artificial general intelligence, or AGI. GDPval is based on nine industries that contribute the most to America’s gross domestic product, including healthcare, finance, manufacturing, and government. The benchmark tests an AI model’s performance in 44 occupations across those industries, ranging from software engineers to nurses to journalists. For OpenAI’s first version of the test, GDPval-v0, OpenAI asked experienced professionals to compare AI-generated deliverables with those produced by other professionals and choose the better one. For example, one task asked investment bankers to map the competitive landscape of the last-mile delivery industry; graders then compared their reports against AI-generated versions. OpenAI then averages an AI model’s “win rate” against the human reports across all 44 occupations. For GPT-5-high, a souped-up version of GPT-5 that uses extra computational power, the company says the model was ranked as better than or on par with industry experts 40.6% of the time. It’s worth noting that most working professionals do a lot more than submit research reports to their boss, which is all that GDPval-v0 tests for. OpenAI acknowledges this and says it plans to create more robust tests in the future that account for more industries and interactive workflows. OpenAI also tested Anthropic’s Claude Opus 4.1 model, which was ranked as better than or on par with industry experts in 49% of tasks. OpenAI believes Claude scored so high because of its tendency to produce pleasing graphics rather than because of sheer performance.
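As a back-of-the-envelope illustration of how that headline number is aggregated, the sketch below averages per-occupation “better than or on par” rates from grader verdicts, as the methodology above describes. The verdict data is made up purely for illustration.

```python
# Toy aggregation of a GDPval-style score: per occupation, count how often
# graders ranked the model's deliverable as better than or on par with the human
# expert's, then average those per-occupation rates. Data below is invented.
from statistics import mean

# Grader verdicts per occupation: "win", "tie", or "loss" for the AI deliverable.
judgments = {
    "investment_banker": ["win", "loss", "tie", "loss"],
    "registered_nurse":  ["loss", "loss", "win", "loss"],
    "journalist":        ["tie", "win", "loss", "loss"],
}


def win_or_tie_rate(verdicts):
    return sum(v in ("win", "tie") for v in verdicts) / len(verdicts)


# Average per-occupation rates rather than pooling all comparisons,
# so every occupation is weighted equally.
gdpval_score = mean(win_or_tie_rate(v) for v in judgments.values())
print(f"{gdpval_score:.1%}")  # 41.7% with this toy data
```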
OpenAI introduces proactive AI agent performing night-time research synthesis from chat history, calendar, emails and connected apps to anticipate user needs autonomously
OpenAI’s newest ChatGPT update brings more proactive, agentic capabilities to the app, automating what was previously an on-demand offering and widening its audience. ChatGPT Pulse surfaces personalized searches and updates to users, along with information from connected apps, such as their calendar. Pulse, currently in preview and available to Pro users, will be available on mobile. “This is the first step toward a more useful ChatGPT that proactively brings you what you need, helping you make more progress so you can get back to your life. We’ll learn and improve from early use before rolling it out to Plus, with the goal of making it available to everyone,” OpenAI said. While Pulse currently targets individual users, the feature could eventually lead to more intelligent agents from OpenAI. Enterprises are still determining the best use cases for agents, and one of these is understanding how to leverage an agent that proactively performs tasks on behalf of users. Pulse conducts most of its work at night, performing asynchronous research on behalf of the user. “Each night, it synthesizes information from your memory, chat history and direct feedback to learn what’s most relevant to you, then delivers personalized focused updates the next day,” OpenAI said. Users can choose to connect apps like Gmail and Google Calendar, but OpenAI said the integrations “are off by default.” This allows ChatGPT to provide people with a rundown of their meetings the next day, draft a sample meeting agenda, or remind someone to buy a gift for a birthday. The company emphasized that users have control over what information Pulse gives them. A Pro user can tap “curate” in ChatGPT to guide Pulse on what they want to see and when. The idea is that Pulse can learn from this guidance to better anticipate users’ needs in the future. OpenAI added that “topics shown in Pulse also pass through safety checks to avoid showing harmful content that violates our policies.” Unless saved as a chat, each Pulse is available for that day only. This shift from a chat interface to a proactive, steerable AI assistant working alongside you is how AI will unlock more opportunities for more people.
MIT spinoff Liquid AI launches task-specific Nano models that match GPT-4o performance while running locally on phones and laptops; Liquid delivers 50x cost reduction and 100x energy savings compared to cloud-hosted frontier model deployments
Liquid AI, a startup pursuing alternatives to the popular “transformer”-based AI models that have come to define the generative AI era, is announcing not one, not two, but a whole family of six different types of AI models called Liquid Nanos that it says are better suited to the “reality of most AI deployments” in enterprises and organizations than the larger foundation models from rivals like OpenAI, Google, and Anthropic. Liquid Nanos are task-specific foundation models that range from 350 million to 2.6 billion parameters, targeted towards enterprise deployments — basically, you can set and forget these things on enterprise-grade field devices, from laptops to smartphones to even sensor arrays and small robots. Liquid Nanos deliver performance that rivals far larger models on specialized, agentic workflows such as multilingual data extraction, translation, retrieval-augmented (RAG) question answering, low-latency tool and function calling, math reasoning, and more. By shifting computation onto devices rather than relying on cloud infrastructure, Liquid Nanos aim to improve speed, reduce costs, enhance privacy, and enable applications in enterprise and research-grade environments where connectivity or energy use is constrained. The first set of models in the Liquid Nanos lineup is designed for specialized use cases:
- LFM2-Extract: multilingual models (350M and 1.2B parameters) optimized for extracting structured data from unstructured text, such as converting emails or reports into JSON or XML.
- LFM2-350M-ENJP-MT: a 350M-parameter model for bidirectional English-to-Japanese translation, trained on a broad range of text types.
- LFM2-1.2B-RAG: a 1.2B-parameter model tuned for retrieval-augmented generation (RAG) pipelines, enabling grounded question answering over large document sets.
- LFM2-1.2B-Tool: a model specialized for precise tool and function calling, designed to run with low latency on edge devices without relying on longer reasoning chains.
- LFM2-350M-Math: a reasoning-oriented model aimed at solving challenging math problems efficiently, with reinforcement learning techniques used to control verbosity.
- Luth-LFM2 series: community-developed fine-tunes by Sinoué Gad and Maxence Lasbordes, specializing in French while preserving English capabilities.
These models target specific tasks where small, fine-tuned architectures can match or even outperform generalist systems of more than 100 billion parameters.
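As a rough sketch of what running one of these models on-device might look like, the snippet below loads an extraction model with Hugging Face transformers and asks it to turn an email into JSON. The model identifier is an assumption based on the names above (check Liquid AI’s published model cards), and the prompt format may differ from what the released checkpoints expect.

```python
# Minimal sketch of running a small extraction model locally with Hugging Face
# transformers. The model identifier below is an assumption based on the product
# names above, and the prompt/output format may differ from the real checkpoints.
from transformers import pipeline

extractor = pipeline(
    "text-generation",
    model="LiquidAI/LFM2-1.2B-Extract",  # assumed identifier
    device_map="auto",                   # CPU, GPU, or Apple silicon as available
)

email = (
    "Hi team, the Osaka shipment (PO #4471) arrives Friday, October 3rd. "
    "Invoice total is 1.2M JPY, contact is Keiko Tanaka."
)
prompt = (
    "Extract purchase_order, arrival_date, total, currency and contact "
    f"from the following email as JSON:\n{email}"
)

result = extractor(prompt, max_new_tokens=128)
print(result[0]["generated_text"])
```

Because the model is small enough to run on a laptop or phone, the email never leaves the device, which is the privacy and cost argument Liquid AI is making.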
Factory unveils LLM-agnostic autonomous software development agents with enterprise context ingestion delivering 31x faster feature delivery and 96% migration time reduction
Agent-native software development startup Factory announced it has raised $50 million in new funding and has fully released Droids, advanced agents designed to accelerate the shift toward agent-native development. While other AI coding platforms tie developers to a single integrated development environment, model or interface, Droids meet engineers where they already work, making adoption easier and more flexible. Droids are LLM-agnostic and interface-agnostic, allowing developers to work from the terminal, IDE, Slack, Linear and browsers, or through custom scripts. That flexibility lets developers adopt agents without rewriting workflows or abandoning existing tools. The system ingests organizational context and engineering tool data, including version control, issue trackers and incident systems, and integrates with tools like GitHub, Jira, Slack, Datadog and Google Drive, building a “mental model” of the codebase so that agents onboard like seasoned engineers and make consistent, context-aware decisions. Droids also go beyond autocomplete to perform complex tasks such as feature development, refactoring, code review, documentation, incident response and codebase Q&A. Chief Executive Matan Grinberg added that “agents will not replace developers, but developers who are fluent with agents will rapidly outleverage and outpace developers who are not.” Factory already has an impressive lineup of customers, including Ernst & Young Global Ltd., Nvidia Corp., MongoDB Inc., Zapier Inc., Bayer AG and Clari Inc. Factory says those customers are seeing 31 times faster feature delivery, 96% shorter migration times, a 96% reduction in on-call resolution times, higher-quality code and more time for developers to focus on design and architecture.
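To illustrate the design idea rather than Factory’s actual API, here is a minimal sketch of what “LLM-agnostic and interface-agnostic” can look like in code: the agent depends only on narrow protocols, so the model provider and the surface it runs from (terminal, Slack, a CI script) are both swappable.

```python
# Illustrative sketch (not Factory's actual API) of an LLM-agnostic,
# interface-agnostic agent: it depends only on narrow protocols, so providers
# and surfaces can be swapped without changing the agent itself.
from typing import Protocol


class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...


class Surface(Protocol):
    def read_task(self) -> str: ...
    def report(self, message: str) -> None: ...


class CodingAgent:
    def __init__(self, llm: LLM, surface: Surface):
        self.llm = llm
        self.surface = surface

    def run(self) -> None:
        task = self.surface.read_task()
        # Organizational context (issue tracker, repo history) would be folded
        # into the prompt here; omitted to keep the sketch short.
        plan = self.llm.complete(f"Plan the code changes for this task:\n{task}")
        self.surface.report(plan)


# Any provider client exposing .complete() and any channel exposing
# .read_task()/.report() can be plugged in without touching CodingAgent.
```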
