OpenAI is releasing a new version of GPT-5 to its AI coding agent, Codex. The company says the new model, called GPT-5-Codex, spends its “thinking” time more dynamically than previous models, putting anywhere from a few seconds to seven hours into a coding task. As a result, it performs better on agentic coding benchmarks. The new model is now rolling out in Codex products — which can be accessed via a terminal, IDE, GitHub, or ChatGPT — to all ChatGPT Plus, Pro, Business, Edu, and Enterprise users, and OpenAI says it plans to make the model available to API customers in the future. OpenAI says that GPT-5-Codex outperforms GPT-5 on SWE-bench Verified, a benchmark measuring agentic coding abilities, as well as on a benchmark measuring performance on code refactoring tasks from large, established repositories. The company also says it trained GPT-5-Codex to conduct code reviews and asked experienced software engineers to evaluate the model’s review comments; the engineers reportedly found that GPT-5-Codex submitted fewer incorrect comments while adding more “high-impact” comments. OpenAI’s Codex product lead Alexander Embiricos said GPT-5-Codex works similarly to GPT-5 but has no router under the hood, and can adjust in real time how long to work on a task. Embiricos says this is an advantage over a router, which decides at the outset how much computational power and time to use on a problem; GPT-5-Codex can instead decide five minutes into a problem that it needs to spend another hour. Embiricos said he’s seen the model take upward of seven hours in some cases.
GPT-5-Codex automates coding tasks with self-correction, outperforms GPT-5 by 17% on benchmarks, available in OpenAI’s paid plans with API access coming
OpenAI introduced a new AI model, GPT-5-Codex, that it says can complete hours-long programming tasks without user assistance. The algorithm is an improved version of GPT-5 trained on additional coding data. It’s accessible through Codex, an AI programming tool included in paid ChatGPT plans. OpenAI says that GPT-5-Codex is better than its predecessor at complex, time-consuming programming tasks. “During testing, we’ve seen GPT‑5-Codex work independently for more than 7 hours at a time,” OpenAI staffers detailed in a blog post. GPT-5-Codex spots mistakes it makes during long coding sessions and fixes them automatically. According to OpenAI, the model’s ability to tackle time-consuming tasks makes it particularly useful for refactoring: changing an application’s code base not to add features but to improve its quality. Developers might, for example, wish to reduce a code snippet’s memory usage or boost response times. OpenAI evaluated GPT-5-Codex’s capabilities using an internally developed refactoring benchmark. The model scored 51.3%, outperforming GPT-5 by more than 17 percentage points. GPT-5-Codex can also adjust the amount of time it spends on a task based on its difficulty. As a result, the model processes simple requests significantly faster than GPT-5. “That means Codex will feel snappier on small, well-defined requests or while you are chatting with it,” the OpenAI staffers wrote. The ChatGPT developer had employees send coding requests to GPT-5-Codex and ranked those requests by the number of tokens the model generated, a measure of hardware usage. According to OpenAI, the bottom 10% of requests used 93.7% fewer tokens than with GPT‑5. The most complicated coding prompts, in contrast, cause GPT-5-Codex to spend significantly more time reasoning than GPT-5.
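To make that percentile comparison concrete, here is a small, self-contained Python sketch of the methodology as described: rank requests by how many tokens the model generated, keep the bottom decile (the simplest requests), and compare usage against GPT-5 on the same requests. Every number below is an invented placeholder; only the bucketing logic follows OpenAI’s description.

```python
# Hypothetical illustration of the reported methodology. All token counts
# are made up; only the rank-and-bucket logic mirrors the description.

# (tokens generated by GPT-5, tokens generated by GPT-5-Codex) per request.
requests = [
    (4_200, 260), (3_800, 310), (5_100, 240), (4_600, 450),  # simple asks
    (22_000, 18_500), (31_000, 35_000),                      # medium tasks
    (48_000, 71_000), (52_000, 90_000),                      # hard tasks
]

requests.sort(key=lambda pair: pair[1])   # rank by Codex's own token count
k = max(1, len(requests) // 10)           # bottom-decile bucket size
bottom = requests[:k]

gpt5_total = sum(gpt5 for gpt5, _ in bottom)
codex_total = sum(codex for _, codex in bottom)
savings = 100 * (1 - codex_total / gpt5_total)
print(f"bottom-decile token savings vs GPT-5: {savings:.1f}%")
```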
Thomson Reuters’ Deep Research runs a multi-step agent over a 20-billion-document corpus, cutting 20-hour legal research to 10 minutes with direct citations for enterprise compliance
Thomson Reuters Westlaw’s Deep Research platform was specifically designed to take its time, working for an average of 10 minutes. This allows the multi-step research agent to plan, execute, and pull from a deep, curated dataset of more than 20 billion documents — up-to-date case law, statutes, administrative rulings, secondary sources, and structured legal editorial content. On the back end, it connects with a highly developed toolset that attorneys can use to check findings and probe deeper into legal scenarios. Deep Research is designed to eliminate errors and hallucinations by providing direct citations from Thomson Reuters’ vast dataset. The result is an AI agent that mirrors the rigor of human legal research, capturing legal nuance and reducing the time attorneys spend on discovery. For enterprises beyond law, the system offers a blueprint for how AI can move past speed into substance, indicating that slowing AI down can provide real business value. Deep Research on CoCounsel is embedded into Westlaw, Thomson Reuters’ legal research platform used by 12,000-plus law firms, more than 4,000 corporate legal departments, and the majority of the top U.S. courts and law firms. Lawyers can typically spend 10 to 20 hours performing research for complex legal matters. While Westlaw doesn’t yet have definitive numbers, Deep Research is speeding up that time “dramatically,” while also surfacing relevant materials that help attorneys advise their clients, produce better briefs and motions, and litigate more effectively. While Deep Research’s default option is 10 minutes, seven-minute and three-minute versions are also available, and the team is working on a longer 20-minute version. While devs or researchers in the lab are often looking to make models faster and faster, lawyers aren’t seeking instant gratification; they actually prefer longer output options.
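Thomson Reuters hasn’t published CoCounsel’s internals, but the plan-execute-cite loop it describes maps onto a familiar retrieval-grounded agent pattern. The Python sketch below is a generic toy under invented data and helper names (CORPUS, plan_queries, search_corpus are all hypothetical), not the actual Westlaw design.

```python
# Toy sketch of a plan/execute/cite research loop in the spirit of the
# multi-step agent described above. Corpus, queries, and retrieval are
# invented placeholders; Thomson Reuters has not published its design.
from dataclasses import dataclass

@dataclass
class Source:
    doc_id: str
    text: str

# Stand-in for the curated corpus of case law, statutes, and rulings.
CORPUS = [
    Source("case-001", "Summary judgment standard under Rule 56 ..."),
    Source("stat-042", "Statute of limitations for breach of contract ..."),
]

def plan_queries(question: str) -> list[str]:
    # Real systems decompose the question with an LLM; here we fake it.
    return [question, question + " statute of limitations"]

def search_corpus(query: str) -> list[Source]:
    # Naive keyword overlap as a placeholder for curated legal search.
    terms = set(query.lower().split())
    return [s for s in CORPUS if terms & set(s.text.lower().split())]

def deep_research(question: str) -> str:
    cited: dict[str, Source] = {}
    for query in plan_queries(question):
        for src in search_corpus(query):
            cited[src.doc_id] = src  # dedupe by document id
    # Grounding the answer in retrieved excerpts is what enables direct
    # citations rather than answers drawn from model memory.
    return "Findings based on: " + ", ".join(sorted(cited))

print(deep_research("breach of contract limitations"))
```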
Socotra’s MCP Server standardizes AI connectivity using Anthropic’s MCP spec, enabling policy-aware authorization, human-in-the-loop governance, and vendor-neutral LLM switching for insurers via documented, 10-minute deployment
Socotra released its Model Context Protocol (MCP) Server, which the company describes as the most mature MCP server in the insurance industry. The new offering allows insurers to quickly and safely connect agentic AI to Socotra Insurance Suite, unlocking automation for insurance workflows. Socotra MCP Server is readily available to customers and includes 10-minute step-by-step instructions for connecting to popular AI platforms Claude, Cursor, and Visual Studio Code. “Every component of Socotra was built from the beginning for AI connectivity,” said Sonny Patel, Chief Product and Technology Officer at Socotra. “Socotra MCP Server simplifies deployment of agentic AI, so that insurers can increase productivity, reduce expenses, improve loss ratios, and offer their customers the very best products and support.” Many insurance operations are complex, regulated, and labor-intensive. Socotra MCP Server directly addresses these challenges by: enabling AI agents to execute workflows with speed and accuracy, through well-defined MCP tools; protecting sensitive policyholder data with capability-scoped authentication, encrypted agent sessions, and policy-aware authorization based on Anthropic’s latest MCP specification; delivering enterprise-grade governance and auditability, where every AI action is logged, permissioned, and traceable, with human-in-the-loop checkpoints; and preventing vendor lock-in as AI technology advances and the vendor landscape shifts, since insurers can easily switch AI applications and connect their own custom LLMs.
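To make the MCP side concrete, the sketch below uses the open-source MCP Python SDK to stand up a minimal server exposing one tool. The tool name and payload are hypothetical illustrations, not Socotra’s documented tool surface, and the authentication, encryption, and audit layers described above are omitted.

```python
# Minimal MCP server sketch using the open-source MCP Python SDK
# (pip install mcp). The tool below is hypothetical; Socotra's actual
# tools, auth, and audit logging are not shown here.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("insurance-demo")

@mcp.tool()
def get_policy_status(policy_id: str) -> dict:
    """Return the status of a policy (stubbed lookup for illustration)."""
    # A real server would query the policy system and enforce
    # capability-scoped authorization before returning any data.
    return {"policy_id": policy_id, "status": "active"}

if __name__ == "__main__":
    # Serves over stdio so MCP clients like Claude Desktop can connect.
    mcp.run()
```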
Pulumi’s AI platform engineer automates end-to-end infrastructure lifecycle through multi-cloud IaC foundation supporting thousands of providers with human-in-the-loop approval workflows and enterprise guardrails
Pulumi announced Pulumi Neo, the industry’s first platform engineering AI agent purpose-built to manage infrastructure on any cloud—public, private, or hybrid—with enterprise-grade controls. Neo builds on Pulumi’s proven cloud engineering platform and flagship infrastructure as code (IaC) technology, which supports thousands of cloud providers and powers over one million downloads per week. Neo is an AI agent that automates infrastructure tasks end-to-end with enterprise governance: it understands dependencies, executes changes, monitors outcomes, and maintains compliance throughout the entire infrastructure lifecycle. Neo appears as a teammate in Pulumi Cloud whom you can ask to perform jobs, knowing it will respect your security model. Pulumi Neo’s capabilities include: Fully agentic workflow: Launch short- or long-running tasks with approvals, interactive guidance, and complete task history. Deep IaC foundation: Get previews and history for everything Neo does, leveraging proven patterns from thousands of providers and millions of production deployments. Personalized multi-cloud context: Automatic context across all of your cloud environments for all of Pulumi’s supported providers, including AWS, Azure, Google Cloud, and Kubernetes. Automatic enterprise guardrails: Neo respects all of your team’s governance settings, including RBAC and policy as code configurations. Progressive autonomy: Human-in-the-loop interaction with configurable automation levels, from fully guardrailed to fully autonomous. Beta customers report transformational improvements: delivering 10x more infrastructure with existing teams, deploying 75% faster for operations that previously took weeks, and reducing policy violations by 90% through automated governance.
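Neo itself is a hosted agent, but the IaC foundation it operates on is ordinary Pulumi code. For readers unfamiliar with that layer, here is a minimal Pulumi program in Python; the resource and names are arbitrary examples, separate from Neo’s own interface.

```python
# Minimal Pulumi program (Python) showing the IaC layer an agent like Neo
# operates on. Requires the pulumi and pulumi_aws packages plus AWS
# credentials; run with `pulumi up`, which previews changes before applying.
import pulumi
import pulumi_aws as aws

# Declarative resource: Pulumi tracks state, so an agent can preview,
# apply, and audit changes rather than issuing ad-hoc cloud commands.
bucket = aws.s3.Bucket("neo-demo-bucket",
                       tags={"managed-by": "pulumi"})

pulumi.export("bucket_name", bucket.id)
```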
Pipe17’s MCP server bridges order operations data with AI tools through client-server architecture, enabling automated order troubleshooting via function calls and real-time inventory management workflows
Pipe17, the AI-native Order Operations Platform, announced the industry’s first Model Context Protocol (MCP) server purpose-built for order management. Pipe17 MCP Server extends connectivity and order management into any MCP-enabled environment, including AI assistants such as Claude, Gemini, and ChatGPT, allowing AI to directly access all of the data and functionality within Pipe17. By exposing order-related data via MCP, customers can: Marry order data with AI. Once order data is accessible through the Pipe17 MCP Server, AI can analyze it, generate code, and build custom apps and workflows, putting the broader AI ecosystem to work. Troubleshoot stuck orders. Business users can ask AI to perform tasks that involve dozens of MCP tools and resources. For example, retrieving an order, finding out why it’s stuck, and then fixing it requires approximately 11 Pipe17 MCP function and resource calls; the AI handles the whole chain without any manual intervention. Make agentic commerce real. Expose an endpoint so orders can be consumed from any selling channel, whether human-driven or AI-driven. 3PL leaders are also validating the approach. Anthony Hockaday, Sr. Director of Product Management at Radial, said, “The Pipe17 MCP Server represents the future of how 3PLs will operate. By making order data instantly accessible through AI, it transforms the way we serve brands. Our clients gain intelligence and clarity at the speed of conversation, which will fundamentally change how logistics partners build trust and scale.” “With Pipe17, order data is no longer locked away in a SaaS app. Connected through MCP, it becomes accessible to the trillion-dollar AI ecosystem, giving every enterprise the ability to make agentic commerce a reality,” said Mo Afshar, CEO of Pipe17.
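As a rough illustration of what that tool-call chain looks like from the client side, here is a compressed sketch using the open-source MCP Python SDK. The server launch command and the tool names (get_order, diagnose_order, retry_fulfillment) are hypothetical stand-ins rather than Pipe17’s documented tools, and only three of the roughly 11 calls are shown.

```python
# Hedged sketch of an MCP client driving a "fix the stuck order" workflow.
# Uses the open-source MCP Python SDK (pip install mcp); the launch command
# and every tool name below are assumptions, not Pipe17's documented API.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def fix_stuck_order(order_id: str) -> None:
    server = StdioServerParameters(command="pipe17-mcp")  # assumed launcher
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # An assistant chains ~11 such calls in practice; three are
            # shown to illustrate the retrieve -> diagnose -> fix shape.
            order = await session.call_tool("get_order", {"id": order_id})
            diagnosis = await session.call_tool("diagnose_order",
                                                {"id": order_id})
            await session.call_tool("retry_fulfillment", {"id": order_id})
            print(order, diagnosis)  # an assistant would reason over these

asyncio.run(fix_stuck_order("ORD-12345"))
```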
QASolve’s no-code AI-powered QA platform reduces testing workload by 70% through machine learning-driven test case generation and automated CI/CD pipeline synchronization with GitHub and Jenkins workflows
QASolve AI, a startup, has launched an AI-powered software testing platform designed to save development teams time, cut costs, and improve software quality. By automating the repetitive and time-consuming parts of quality assurance, QASolve AI helps companies catch bugs earlier, shorten release cycles, and scale testing without hiring additional staff. Hiring a new QA engineer can take up to six weeks, with salaries ranging from $85,000 to $120,000 annually. QASolve AI tackles this challenge with a platform that reduces testing time by 50 to 80 percent. Using AI-powered bug detection, self-healing tests, and a no-code interface, the platform helps teams maintain quality while releasing products faster. QASolve AI also integrates seamlessly with widely used CI/CD tools like GitHub and Jenkins, making it easy to automate QA without disrupting established workflows. An engineering leader at an e-commerce company added, “We cut our testing workload by 70 percent in the first month of using QASolve. It feels like we added another QA engineer to the team without the hiring process or the extra cost.” The no-code platform supports web and desktop applications, allowing companies of any size to test across multiple environments without additional setup. For startups, QASolve AI provides affordable entry-level pricing and free trials to get started quickly. For larger enterprises, the company offers advanced security options, multi-environment support, and dedicated onboarding. By focusing on AI-driven automation, QASolve AI positions itself at the intersection of two high-growth markets: artificial intelligence and software development tools.
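QASolve hasn’t detailed how its self-healing tests work, but the underlying technique is widely used: when a primary element locator fails, fall back through ranked alternates and report the substitution so the suite can update itself. Below is a generic Selenium-based sketch of that pattern, not QASolve’s implementation; the URL and locators are placeholders.

```python
# Generic illustration of the "self-healing test" idea: if the primary
# element locator fails, try ranked fallbacks and log the substitution.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

def find_with_healing(driver, locators):
    """Try locators in priority order; return the first element found."""
    for strategy, value in locators:
        try:
            element = driver.find_element(strategy, value)
            if (strategy, value) != locators[0]:
                # Surfacing the healed locator lets the suite update itself.
                print(f"healed: fell back to {strategy}={value!r}")
            return element
        except NoSuchElementException:
            continue
    raise NoSuchElementException(f"no locator matched: {locators}")

driver = webdriver.Chrome()
driver.get("https://example.com/login")  # placeholder URL
submit = find_with_healing(driver, [
    (By.ID, "submit-btn"),                           # primary locator
    (By.CSS_SELECTOR, "button[type=submit]"),        # fallback
    (By.XPATH, "//button[contains(., 'Sign in')]"),  # last resort
])
submit.click()
driver.quit()
```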
Causify combines causal reasoning with Bayesian probabilistic forecasting to deliver explainable business outcome predictions for hedge funds and fleet operators
AI company Causify has raised an additional $2.2 million in seed funding to help enterprises turn data into decisive action. By combining causal reasoning with Bayesian probabilistic forecasting, Causify delivers predictions that are explainable, actionable, and profitable. The implications are profound: businesses can now understand why outcomes occur and take precise actions that drive millions in value. Causify has already shown its ability to outperform industry standards, beginning with wind energy: in seven short weeks, the Causify team and its prediction engine beat existing tools built by seasoned industry veterans, proving its ability not only to see what others miss but also to explain why failures occur. Causify’s causal AI models have also proven themselves in several other markets, including finance, supply chain/fleet management, energy, and predictive maintenance. Hedge funds, equity trading firms, and crypto multi-manager firms use Causify to optimize portfolios, detect market signals, and drive risk-adjusted returns. Fleet operators like Xerox rely on Causify to predict when and where spare parts will be needed, cutting downtime and eliminating costly overstocking. Causify also delivers a 7-day predictive edge in volatile energy markets, powering trading, scheduling, and profitability strategies. Beyond wind turbines, Causify provides real-time health monitoring and early failure detection across industrial assets.
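Causify’s engine is proprietary, but the Bayesian half of that description can be grounded with a textbook example: a Beta-Bernoulli update of a component’s failure probability as telemetry arrives. This generic sketch illustrates the technique only, not Causify’s models.

```python
# Textbook Beta-Bernoulli update, illustrating the Bayesian half of
# "causal reasoning + Bayesian probabilistic forecasting". This is a
# generic example, not Causify's proprietary engine.

# Prior belief about a component's per-week failure rate: Beta(a, b).
a, b = 1.0, 19.0            # prior mean = a / (a + b) = 5%

# Each week of telemetry is a Bernoulli observation: 1 = fault seen.
observations = [0, 0, 1, 0, 1, 1, 0]

for seen_fault in observations:
    a += seen_fault          # conjugate update: faults increment a
    b += 1 - seen_fault      # clean weeks increment b

posterior_mean = a / (a + b)
print(f"posterior failure rate: {posterior_mean:.1%}")  # ~14.8%
```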
Zencoder launches unified AI development platform integrating OpenAI Codex, Claude Code, and Gemini CLI within Visual Studio Code and JetBrains IDEs
Zencoder, the maker of AI-powered software development agents, announced the expansion of its Zen Platform, unifying the world’s most popular AI coding tools—including OpenAI Codex, Anthropic’s Claude Code, Gemini CLI, and Zencoder’s Zen CLI—directly inside modern IDEs like Visual Studio Code and JetBrains. With nearly a billion global subscribers to leading AI platforms such as ChatGPT, Gemini, and Claude, users everywhere now have a powerful and seamless way to build and modify applications at scale. Thanks to Zencoder’s elegant IDE user interface and interoperability with command line tools from OpenAI, Google, and Anthropic, Zencoder is bringing “vibe coding” to the masses: intuitive, fluid, and enterprise-ready. “For the first time, developers don’t need to choose between powerful CLIs, IDE integration, or enterprise capabilities,” said Andrew Filev, CEO and Founder of Zencoder. “We’re eliminating tool silos and making AI-assisted development accessible to everyone, from start-ups to enterprise teams alike.” A New Era of AI Development: Universal Compatibility: Developers can code using Codex, Claude Code, Gemini, or Zen CLI, seamlessly switching in IDEs with full support for debugging, refactoring, and IntelliSense. Free to Start: Anyone can download the VS Code or JetBrains IDE extension, connect it to their existing ChatGPT, Gemini, or Claude subscription, and start coding — no strings attached. Enterprise Differentiation: Zencoder’s multi-repository intelligence, shareable Zen agents, deep analytics, and enterprise guardrails empower organizations to adopt AI securely and at scale.
Apple’s on-device Foundation Models framework delivers privacy-first AI integration for iOS 26 developers using a 3-billion-parameter model along with tool-calling support
As iOS 26 rolls out to all users, developers have been updating their apps to include features powered by Apple’s local AI models. The Lil Artist app offers various interactive experiences to help kids learn skills like creativity, math, and music. With the iOS 26 update, a new AI story creator lets users select a character and a theme, and the app generates a story using AI; the text generation is powered by the local model. The developer of the Daylish daily planner app is working on a prototype that automatically suggests emojis for timeline events based on their titles. Finance tracking app MoneyCoach has two neat features powered by local models: the app surfaces insights about your spending, such as whether you spent more than average on groceries in a particular week, and it automatically suggests categories and subcategories for a spending item for quick entries. The word learning app LookUp has added two new modes using Apple’s AI models: a learning mode that leverages a local model to create examples corresponding to a word, and an on-device-generated map view of a word’s origin. The Tasks app now automatically suggests tags for an entry using local models, detects recurring tasks and schedules them accordingly, and lets users speak a few things aloud and have the local model break them down into separate tasks, all without using the internet.
