AI agent and assistant platform provider Vectara launched a new Hallucination Corrector directly integrated into its service, designed to detect and mitigate costly, unreliable responses from enterprise AI models. In initial testing, Vectara said the Hallucination Corrector reduced hallucination rates in enterprise AI systems to about 0.9%. The Corrector works in tandem with Vectara's Hughes Hallucination Evaluation Model (HHEM), which scores the answer against the source with a probability between 0 and 1, where 0 means completely inaccurate (a total hallucination) and 1 means perfectly accurate. HHEM is available on Hugging Face and received over 250,000 downloads last month, making it one of the most popular hallucination detectors on the platform. When a response is factually inconsistent, the Corrector provides a detailed output that includes an explanation of why the statement is a hallucination and a corrected version incorporating minimal changes for accuracy. By default, the company automatically uses the corrected output in summaries for end users, while experts can use the full explanation and suggested fixes in test applications to refine or fine-tune their models and guardrails against hallucinations. The Corrector can also show the original summary but use the correction details to flag potential issues, offering the corrected summary as an optional fix. For LLM answers that are misleading but not quite outright false, the Hallucination Corrector can refine the response to reduce its uncertainty score according to the customer's settings.
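Vectara's hosted API isn't reproduced here; the sketch below only illustrates the decision flow described above, with hypothetical stand-in functions for the HHEM score and the Corrector (the 0-to-1 scoring convention follows HHEM):

```python
# Hypothetical sketch of the correction workflow described above.
# `score_consistency` and `correct_summary` are illustrative stand-ins,
# not Vectara's actual API; the 0-1 scoring convention follows HHEM.

def score_consistency(source: str, summary: str) -> float:
    """Return a factual-consistency score: 0 = total hallucination, 1 = accurate.
    Placeholder: a real deployment would call HHEM or a comparable detector here."""
    return 0.42  # dummy value for illustration

def correct_summary(source: str, summary: str) -> tuple[str, str]:
    """Placeholder for the Corrector: returns (corrected_text, explanation)."""
    return summary, "no correction available in this sketch"

def handle_response(source: str, summary: str, threshold: float = 0.5,
                    auto_correct: bool = True) -> dict:
    score = score_consistency(source, summary)
    if score >= threshold:
        return {"summary": summary, "score": score, "flagged": False}
    corrected, explanation = correct_summary(source, summary)
    if auto_correct:
        # Default behavior described above: serve the corrected summary to end users.
        return {"summary": corrected, "score": score, "flagged": True,
                "explanation": explanation}
    # Alternative mode: keep the original summary but flag it and attach the fix.
    return {"summary": summary, "suggested_fix": corrected, "score": score,
            "flagged": True, "explanation": explanation}

if __name__ == "__main__":
    print(handle_response("source document text", "generated summary text"))
```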
LatticeFlow AI’s risk evaluation service delivers independent evaluations of LLMs using benchmarks tailored to real-world, business-oriented requirements for secure and compliant adoption of gen AI
LatticeFlow AI has launched AI Insights, the first independent LLM risk evaluation service for secure business adoption. AI Insights gives AI and governance, risk, and compliance (GRC) leaders clear, actionable intelligence to enable fast, secure, and confident adoption of foundation models. It is designed to provide enterprise leaders with independent, trustworthy, and business-oriented evaluations that support secure and compliant AI adoption, prioritizing transparency, independence, and real-world relevance over leaderboard rankings and raw performance metrics. AI Insights delivers independent evaluations of foundation models using what the company calls the most comprehensive set of benchmarks tailored to real-world business requirements, covering security, fairness, and regulatory alignment. Each evaluation provides clear, actionable recommendations to support secure and compliant generative AI adoption, with results presented in intuitive reports that explain model behavior, flag critical issues such as bias or prompt vulnerabilities, and offer mitigation recommendations. Dr. Petar Tsankov, CEO and Co-founder of LatticeFlow AI, said: “AI Insights enables organizations to accelerate AI adoption by ensuring secure and compliant AI deployment.”
Inflectra’s cloud-native generative AI engine is natively integrated into its software development platforms, unlike conventional ‘bolt-on’ AI, to offer real-time support and dynamic test automation
Inflectra announced the general availability of Inflectra.ai, its natively integrated generative AI engine designed to accelerate software delivery, improve quality, and optimize development throughput. Inflectra.ai delivers AI capabilities directly within Inflectra’s cloud platforms — starting with Spira — enabling teams to automate routine processes, generate key artifacts, and enhance decision-making without leaving their existing tools or introducing additional overhead. Unlike conventional “bolt-on” AI features, Inflectra.ai is deeply embedded within the fabric of Inflectra’s Software Project Management platforms: SpiraTest, SpiraTeam, and SpiraPlan, and is expected to expand into Rapise later in 2025. Built as a cloud-native and context-aware intelligence layer, Inflectra.ai delivers real-time support across the software lifecycle. Core Capabilities Include: Intelligent Generation of test cases, BDD scenarios, risks, and user stories from structured and unstructured inputs; Dynamic Test Automation that adapts to UI changes without manual rework; Risk Identification and prioritization at the point of planning and analysis; Seamless Contextual Assistance embedded within the Spira UI, aligned to user workflows.
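Inflectra has not published how its dynamic test automation works internally; purely as a generic illustration of tests adapting to UI changes without manual rework, here is a minimal "self-healing locator" sketch using Selenium, with hypothetical element locators and URL:

```python
# Generic illustration of "self-healing" UI test automation: try a primary
# locator, then fall back to alternatives when the UI changes. This is not
# Inflectra.ai's implementation, just a sketch of the general technique.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

FALLBACK_LOCATORS = [
    (By.ID, "submit-order"),                               # preferred, stable locator
    (By.CSS_SELECTOR, "button[data-test='submit']"),       # test-specific attribute
    (By.XPATH, "//button[normalize-space()='Submit']"),    # last resort: visible text
]

def find_with_fallbacks(driver, locators=FALLBACK_LOCATORS):
    """Return the first element matched by any locator, logging which one worked."""
    for how, what in locators:
        try:
            element = driver.find_element(how, what)
            print(f"matched via {how}: {what}")
            return element
        except NoSuchElementException:
            continue
    raise NoSuchElementException("no fallback locator matched the target element")

if __name__ == "__main__":
    driver = webdriver.Chrome()
    driver.get("https://example.com/checkout")  # illustrative URL
    find_with_fallbacks(driver).click()
    driver.quit()
```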
Databricks to integrate Neon’s serverless Postgres architecture to enable developers to deploy AI agents without requiring compute and storage to scale in tandem
Databricks announced its intent to acquire Neon, a leading serverless Postgres company. Databricks plans to continue innovating and investing in Neon’s database and developer experience for existing and new Neon customers and partners. Together, Databricks and Neon will work to remove the traditional limitations of databases that require compute and storage to scale in tandem — an inefficiency that hinders AI workloads. The integration of Neon’s serverless Postgres architecture with the Databricks Data Intelligence Platform will help developers and enterprise teams efficiently build and deploy AI agent systems. This approach not only prevents performance bottlenecks from thousands of concurrent agents but also simplifies infrastructure, reduces costs and accelerates innovation — all with Databricks’ security, governance and scalability at the core. Together, Neon and Databricks will empower organizations to eliminate data silos, simplify architecture and build AI agents that are more responsive, reliable and secure.
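In practice, the pattern this targets is each agent (or task) provisioning its own short-lived, serverless Postgres instance rather than sharing a fixed-size database. A minimal sketch, assuming Neon's public REST API; the endpoint path, payload shape, and response keys here are assumptions modeled on Neon's documentation and should be checked against the current docs:

```python
# Sketch: each AI agent provisions an ephemeral serverless Postgres project on
# demand. Endpoint and payload are assumptions based on Neon's public REST API.
import os
import requests

NEON_API = "https://console.neon.tech/api/v2"
HEADERS = {"Authorization": f"Bearer {os.environ['NEON_API_KEY']}",
           "Content-Type": "application/json"}

def provision_agent_database(agent_id: str) -> str:
    """Create a throwaway Postgres project for one agent and return its connection URI."""
    resp = requests.post(
        f"{NEON_API}/projects",
        headers=HEADERS,
        json={"project": {"name": f"agent-{agent_id}"}},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    # Connection details live in the response; the exact key layout is an assumption.
    return data["connection_uris"][0]["connection_uri"]

if __name__ == "__main__":
    uri = provision_agent_database("demo-0001")
    print("agent database ready:", uri)
```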
OpenAI releases GPT-4.1 models in ChatGPT, which excel at coding and instruction following compared to GPT-4o and are faster than its o-series reasoning models, but come with a different set of safety considerations
OpenAI is releasing its GPT-4.1 and GPT-4.1 mini AI models in ChatGPT. The GPT-4.1 models should help software engineers who use ChatGPT to write or debug code, OpenAI spokesperson Shaokyi Amdo told TechCrunch. GPT-4.1 excels at coding and instruction following compared to GPT-4o, according to OpenAI, and is faster than its o-series of reasoning models. The company says it’s now rolling out GPT-4.1 to ChatGPT Plus, Pro, and Team subscribers, while GPT-4.1 mini is available to both free and paying ChatGPT users. As a result of this update, OpenAI is removing GPT-4o mini from ChatGPT for all users. “GPT-4.1 doesn’t introduce new modalities or ways of interacting with the model, and doesn’t surpass o3 in intelligence,” said OpenAI’s Head of Safety Systems Johannes Heidecke in a post. “This means that the safety considerations here, while substantial, are different from frontier models.” OpenAI is also releasing more information about GPT-4.1 and all its AI models: the company has committed to publishing the results of its internal AI model safety evaluations more frequently as part of an effort to increase transparency. Those results will live in OpenAI’s new Safety Evaluations Hub, which it launched on Wednesday.
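The same GPT-4.1 family is also exposed through the OpenAI API, where the coding focus is easiest to illustrate. A minimal sketch, assuming the published "gpt-4.1" model identifier and the current OpenAI Python SDK:

```python
# Minimal sketch of using GPT-4.1 for the kind of code-debugging task described
# above, via the OpenAI Python SDK. The model identifier follows OpenAI's naming.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

buggy_snippet = """
def average(xs):
    return sum(xs) / len(xs)   # crashes on an empty list
"""

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a concise code reviewer."},
        {"role": "user", "content": f"Find and fix the bug:\n{buggy_snippet}"},
    ],
)
print(response.choices[0].message.content)
```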
Capgemini’s mainframe modernization offering automates legacy code analysis and extraction of business rules using a set of generative AI agents
Capgemini has launched a new offering that enables organizations to unlock greater value from their legacy systems with unprecedented speed and accuracy. The new approach, powered by generative and agentic AI, allows organizations to gain cost savings, agility, and a significant improvement in data quality. It converts legacy mainframe applications into modern, agile, and cloud-friendly formats that can run more efficiently either on or outside of a mainframe. Capgemini’s automated mainframe application refactoring uses tools and techniques to automatically convert legacy mainframe applications, such as those written in COBOL, into modern architecture. The approach is supported by rigorous automated testing for faster, higher-quality transformations and reduced risk for businesses. Capgemini’s experience in delivering large and complex mainframe modernization programs, market leadership in AI, deep domain knowledge, and broad understanding of complex industry regulations have already delivered tangible results for blue-chip clients.
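Capgemini's refactoring tooling is proprietary; purely as a generic illustration of the business-rule-extraction step, the sketch below asks an LLM to restate the rules implied by a small COBOL fragment. The model identifier and prompt are illustrative, not Capgemini's method:

```python
# Generic sketch: use an LLM to surface the business rules buried in legacy
# COBOL before refactoring. Not Capgemini's tooling; identifiers are illustrative.
from openai import OpenAI

client = OpenAI()

cobol_fragment = """
       IF CUST-BALANCE > 10000 AND CUST-YEARS > 5
           MOVE 'GOLD' TO CUST-TIER
       ELSE
           MOVE 'STANDARD' TO CUST-TIER.
"""

prompt = (
    "Extract the business rules implied by this COBOL paragraph as plain-English "
    "bullet points, one rule per line:\n" + cobol_fragment
)

rules = client.chat.completions.create(
    model="gpt-4.1",  # any capable model works; the identifier is illustrative
    messages=[{"role": "user", "content": prompt}],
)
print(rules.choices[0].message.content)
```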
Boomi and AWS partner to offer a centralized management solution for deploying, monitoring, and governing AI agents across hybrid and multi-cloud environments with built-in support for MCP via a single API
Boomi announced a multi-year Strategic Collaboration Agreement (SCA) with AWS to help customers build, manage, monitor and govern Gen AI agents across enterprise operations. Additionally, the SCA will aim to help customers accelerate SAP migrations from on-premises to AWS. By integrating Amazon Bedrock with the Boomi Agent Control Tower, a centralized management solution for deploying, monitoring, and governing AI agents across hybrid and multi-cloud environments, customers can easily discover, build, and manage agents executing in their AWS accounts, while also maintaining visibility and control over agents running in other cloud provider or third-party environments. Through a single API, Amazon Bedrock provides a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI in mind, including support for Model Context Protocol (MCP), a new open standard that enables developers to build secure, two-way connections between their data and AI-powered tools. MCP enables agents to effectively interpret and work with ERP data while complying with data governance and security requirements. “By integrating Amazon Bedrock’s powerful generative AI capabilities with Boomi’s Agent Control Tower, we’re giving organizations unprecedented visibility and control across their entire AI ecosystem while simultaneously accelerating their critical SAP workload migrations to AWS,” said Steve Lucas, Chairman and CEO at Boomi. “This partnership enables enterprises to confidently scale their AI initiatives with the security, compliance, and operational excellence their business demands.” Apart from Agent Control Tower, the collaboration will introduce several strategic joint initiatives, including an Enhanced Agent Designer, new native AWS Connectors, and Boomi for SAP.
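The Boomi-side integration isn't public code, but the "single API" on the Bedrock side can be illustrated with boto3's Converse API. This minimal sketch assumes an illustrative Anthropic model identifier that would need to be enabled in the target AWS account:

```python
# Sketch of Bedrock's single, model-agnostic API via boto3's Converse call.
# The model identifier is illustrative and must be enabled in your AWS account.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize the open purchase orders for plant 1020."}],
    }],
)
print(response["output"]["message"]["content"][0]["text"])
```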
AlphaEvolve coding agent, built on Google’s Gemini LLMs, automatically tests, refines, and improves algorithms
Google DeepMind today pulled the curtain back on AlphaEvolve, an artificial-intelligence agent that can invent brand-new computer algorithms — then put them straight to work inside the company’s vast computing empire. AlphaEvolve pairs Google’s Gemini LLMs with an evolutionary approach that tests, refines, and improves algorithms automatically. The system has already been deployed across Google’s data centers, chip designs, and AI training systems — boosting efficiency and solving mathematical problems that have stumped researchers for decades. “AlphaEvolve is a Gemini-powered AI coding agent that is able to make new discoveries in computing and mathematics,” explained Matej Balog, a researcher at Google DeepMind. “It can discover algorithms of remarkable complexity — spanning hundreds of lines of code with sophisticated logical structures that go far beyond simple functions.” One algorithm it discovered has been powering Borg, Google’s massive cluster management system. This scheduling heuristic recovers an average of 0.7% of Google’s worldwide computing resources continuously — a staggering efficiency gain at Google’s scale. The discovery directly targets “stranded resources” — machines that have run out of one resource type (like memory) while still having others (like CPU) available. AlphaEvolve’s solution is especially valuable because it produces simple, human-readable code that engineers can easily interpret, debug, and deploy. Perhaps most impressively, AlphaEvolve improved the very systems that power itself. It optimized a matrix multiplication kernel used to train Gemini models, achieving a 23% speedup for that operation and cutting overall training time by 1%. For AI systems that train on massive computational grids, this efficiency gain translates to substantial energy and resource savings.
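DeepMind has not released AlphaEvolve's code; the following is only a toy sketch of the evolutionary loop the article describes (propose candidates, score them with an automated evaluator, keep the fittest, mutate and repeat), with a random perturbation standing in for the Gemini-generated code edits:

```python
# Toy sketch of an evolutionary improvement loop: evaluate, select, mutate.
# AlphaEvolve would mutate candidate programs via Gemini; here a random
# perturbation of numeric parameters stands in for that step.
import random

def evaluate(candidate: list[float]) -> float:
    """Toy objective: reward candidates whose parameters sit close to 0.7."""
    return -sum((x - 0.7) ** 2 for x in candidate)

def mutate(candidate: list[float]) -> list[float]:
    """Stand-in for an LLM-proposed edit: randomly perturb one parameter."""
    child = candidate[:]
    i = random.randrange(len(child))
    child[i] += random.uniform(-0.2, 0.2)
    return child

population = [[random.random() for _ in range(5)] for _ in range(8)]
for _generation in range(200):
    population.sort(key=evaluate, reverse=True)
    survivors = population[:4]                      # selection
    population = survivors + [mutate(random.choice(survivors)) for _ in range(4)]

best = max(population, key=evaluate)
print("best candidate:", [round(x, 3) for x in best], "score:", round(evaluate(best), 4))
```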
Broadridge Financial Solutions awarded patent for its LLM orchestration of machine learning agents used in its AI bond trading platform; patented features include explainability of the output generated, compliance verification and user profile attributes
Broadridge Financial Solutions has been awarded a U.S. patent for its large language model orchestration of machine learning agents, which is used in BondGPT and BondGPT+. These applications provide timely, secure, and accurate responses to natural language questions using OpenAI GPT models and multiple AI agents. The BondGPT+ enterprise application integrates clients’ proprietary data, third-party datasets, and personalization features, improving efficiency and saving time for users. Broadridge continues to work closely with clients to integrate AI into their workflows. Other significant features patented in U.S. Patent No. 11,765,405 include: explainability into how the output of the LLM orchestration of machine learning agents was generated, via a “Show your work” feature that offers step-by-step transparency; a multi-agent adversarial feature for enhanced accuracy; an AI-powered compliance verification feature based on custom compliance rules configured to an enterprise’s unique compliance and risk management processes; and the use of user profile attributes, such as user role, to inform data retrieval and security.
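The patented orchestration itself is Broadridge's; purely as an illustration of the general pattern the patent describes (multiple agents, a compliance check against configurable rules, and a step-by-step "show your work" trace), here is a hypothetical sketch in which all class, function, and rule names are invented for this example:

```python
# Hypothetical sketch of multi-agent orchestration with compliance verification
# and a step-by-step trace. Not Broadridge's implementation.
from dataclasses import dataclass, field

@dataclass
class Orchestrator:
    compliance_rules: list
    trace: list = field(default_factory=list)

    def answer(self, question: str, user_role: str) -> dict:
        self.trace.append(f"received question from role={user_role}")
        draft = self._data_agent(question)                 # agent 1: retrieve/answer
        verdict = self._adversarial_agent(draft)           # agent 2: challenge the draft
        self.trace.append(f"adversarial check: {verdict}")
        for rule in self.compliance_rules:                 # compliance verification
            if not rule(draft, user_role):
                self.trace.append(f"blocked by rule {rule.__name__}")
                return {"answer": None, "trace": self.trace}
        return {"answer": draft, "trace": self.trace}      # "show your work" output

    def _data_agent(self, question: str) -> str:
        self.trace.append("data agent queried bond dataset")
        return f"stub answer to: {question}"

    def _adversarial_agent(self, draft: str) -> str:
        return "consistent"

def no_retail_clients(draft: str, user_role: str) -> bool:
    """Invented example rule: block responses for retail-role users."""
    return user_role != "retail"

print(Orchestrator([no_retail_clients]).answer("Top 5 liquid 10y corporates?", "trader"))
```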
OpenAI’s new Codex carries out coding tasks in isolated software containers without web access, lets developers customize those environments to mirror production and review the generated code, and achieved a 75% accuracy rate in OpenAI’s coding tests
OpenAI debuted a new AI agent, Codex, that can help developers write code and fix bugs. The tool is available through a sidebar in ChatGPT’s interface. One button in the sidebar configures Codex to generate new code based on user instructions, while another allows it to answer questions about existing code. Prompt responses take between one and 30 minutes to generate based on the complexity of the request. Codex is powered by a new AI model called codex-1. It’s a version of o3, OpenAI’s most capable reasoning model, that has been optimized for programming tasks. The ChatGPT developer fine-tuned Codex by training it on a set of real-world coding tasks. Those tasks involved a range of software environments. A piece of software that runs well in one environment, such as a cloud platform, may not run as efficiently on a Linux server or a developer’s desktop, if at all. As a result, an AI model’s training dataset must include technical information about every environment that it will be expected to use. OpenAI used reinforcement learning to train codex-1. It’s a way of developing AI models that relies on trial and error to boost output quality. When a neural network completes a task correctly, it’s given a virtual reward, while incorrect answers lead to penalties that encourage the algorithm to come up with a better approach. In a series of coding tests carried out by OpenAI, Codex achieved an accuracy rate of 75%. That’s 5% better than the most capable, hardware-intensive version of o3. OpenAI’s first-generation reasoning model, o1, scored 11%. Codex carries out coding tasks in isolated software containers that don’t have web access. According to OpenAI, the agent launches a separate container for each task. Developers can customize those development environments by uploading a text file called AGENTS.md. The file may describe what programs Codex should install, how AI-generated code should be tested for bugs and related details. Using AGENTS.md, developers can ensure that the container in which Codex generates code is configured the same way as the production system on which the code will run. That reduces the need to modify the code before releasing it to production. Developers can monitor Codex while it’s generating code. After the tool completes a task, it provides technical data that can be used to review each step of the workflow. It’s possible to request revisions if the code doesn’t meet project requirements.
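OpenAI's documentation defines the exact conventions for AGENTS.md; the snippet below is only an illustrative sketch of the kind of setup and testing instructions such a file might carry, with hypothetical commands and paths:

```markdown
# AGENTS.md (illustrative example, not an official template)

## Environment setup
- Install dependencies with `pip install -r requirements.txt`.
- Python 3.11 is used in production; target that version.

## Testing
- Run `pytest -q` before proposing any change; all tests must pass.
- Add a regression test for every bug you fix.

## Conventions
- Follow the existing module layout under `src/`.
- Do not modify files under `migrations/` unless asked.
```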