OpenAI is gearing up to release an AI system that’s truly “open,” meaning it’ll be available for download at no cost and not gated behind an API. Beyond its benchmark performance, OpenAI may have a key feature up its sleeve — one that could make its open “reasoning” model highly competitive. Company leaders have been discussing plans to enable the open model to connect to OpenAI’s cloud-hosted models to better answer complex queries. OpenAI CEO Sam Altman described the capability as a “handoff.” If the feature — as sources describe it — makes it into the open model, it will be able to make calls to the OpenAI API to access the company’s other, larger models for a substantial computational lift. It’s unclear if the open model will have the ability to access some of the many tools OpenAI’s models can use, like web search and image generation. The idea for the handoff feature was suggested by a developer during one of OpenAI’s recent developer forums, according to a source. The suggestion appears to have gained traction within the company. OpenAI has been hosting a series of community feedback events with developers to help shape its upcoming open model release. A local model that can tap into more powerful cloud systems brings to mind Apple Intelligence, Apple’s suite of AI capabilities that uses a combination of on-device models and models running in “private” data centers. OpenAI stands to benefit in obvious ways. Beyond generating incremental revenue, a handoff could rope more members of the open source community into the company’s premium ecosystem.
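If the handoff works the way sources describe, the developer-facing pattern might look something like the sketch below: the local model answers what it can and escalates to the OpenAI API when a query exceeds its abilities. Everything here is an assumption — `local_generate`, the `<handoff>` sentinel, and the choice of cloud model are illustrative stand-ins, since OpenAI has not published any interface.

```python
# Hypothetical sketch of a local-to-cloud "handoff"; nothing here is a
# published OpenAI interface.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def local_generate(prompt: str) -> str:
    # Toy stand-in for the downloaded open model: pretend anything long
    # or multi-step exceeds the local model and needs a handoff.
    if len(prompt.split()) > 50:
        return "<handoff>"
    return f"(local draft answer to: {prompt!r})"

def answer(prompt: str) -> str:
    draft = local_generate(prompt)
    if "<handoff>" in draft:  # hypothetical escalation signal
        resp = client.chat.completions.create(
            model="gpt-4o",  # illustrative choice of larger cloud model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    return draft
```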
UiPath’s agentic AI platform will use Redis semantic routing, enabling AI agents to call the best LLM or LLM provider for the context, intent, and use case the customer is trying to solve
Data platform Redis and UiPath have expanded their collaboration on agentic automation solutions for customers. By extending their partnership, Redis and UiPath will explore ways to apply the Redis vector database, Semantic Caching, and Semantic Routing to UiPath Agent Builder, a secure, simple way to build, test, and launch agents and the agentic automations they execute. With Redis powering these solutions, UiPath agents will understand the meaning behind user queries, making data access faster and system responses smarter and delivering greater speed and cost efficiency to enterprise developers adopting automation. Additionally, via semantic routing, UiPath agents will be able to call the best LLM or LLM provider for the context, intent, and use case the customer is trying to solve. UiPath Agent Builder builds on the RPA capabilities and orchestration of UiPath Automation Suite and Orchestrator to deliver unmatched agentic capabilities. Agent Builder will use a sophisticated memory architecture that enables agents to retrieve relevant information only from permissioned, governed knowledge bases and to maintain context across planning and execution. This architecture will enable developers to create, customize, evaluate, and deploy specialized enterprise agents that can understand context, make decisions, and execute complex processes while maintaining enterprise-grade security and governance.
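Neither company has published integration details, but semantic routing itself is a simple idea: embed the incoming query, compare it against exemplar embeddings for each route, and send the query to the LLM registered for the closest route. The sketch below illustrates only that general mechanism; the route names, model identifiers, and the hash-based stand-in embedding are all invented for the example.

```python
import numpy as np

# Hypothetical route table: exemplar phrases -> the model that route uses.
ROUTES = {
    "code-generation": (["write a python function", "refactor this class"],
                        "provider-a/code-model"),
    "document-qa": (["summarize this contract", "what does clause 4 say"],
                    "provider-b/long-context-model"),
}

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: deterministic within a process but semantically
    # meaningless; a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def route(query: str) -> str:
    # Pick the route whose exemplars are most similar to the query.
    q = embed(query)
    best_model, best_score = None, -1.0
    for name, (examples, model) in ROUTES.items():
        score = max(float(q @ embed(e)) for e in examples)  # cosine similarity
        if score > best_score:
            best_model, best_score = model, score
    return best_model

print(route("summarize this contract and list key obligations"))
```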
Microsoft releases a taxonomy of security and safety failure modes inherent to agentic architectures, distinguishing novel modes unique to agentic systems (e.g., agent compromise) from modes that amplify existing GenAI risks (e.g., bias amplification)
Microsoft’s AI Red Team has published a detailed taxonomy addressing the failure modes inherent to agentic architectures. Agentic AI systems are autonomous entities that observe and act upon their environment to achieve predefined objectives. These systems integrate capabilities such as autonomy, environment observation, interaction, memory, and collaboration. However, these features introduce a broader attack surface and new safety concerns. The report distinguishes between novel failure modes unique to agentic systems and amplification of risks already observed in generative AI contexts. Microsoft categorizes failure modes across security and safety dimensions:
- Novel security failures: agent compromise, agent injection, agent impersonation, agent flow manipulation, and multi-agent jailbreaks.
- Novel safety failures: intra-agent Responsible AI (RAI) concerns, biases in resource allocation among multiple users, organizational knowledge degradation, and prioritization risks impacting user safety.
- Existing security failures: memory poisoning, cross-domain prompt injection (XPIA), human-in-the-loop bypass vulnerabilities, incorrect permissions management, and insufficient isolation.
- Existing safety failures: bias amplification, hallucinations, misinterpretation of instructions, and a lack of sufficient transparency for meaningful user consent.
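For teams that want to use the taxonomy operationally, e.g., to tag red-team findings or incident reports, one possible encoding is a simple lookup table. The category labels below follow the report; the structure and the `classify` helper are this sketch's own choices.

```python
# One possible machine-readable encoding of the taxonomy (structure is
# this sketch's choice, not Microsoft's).
AGENTIC_FAILURE_MODES = {
    ("security", "novel"): ["agent compromise", "agent injection",
                            "agent impersonation", "agent flow manipulation",
                            "multi-agent jailbreaks"],
    ("safety", "novel"): ["intra-agent RAI concerns",
                          "resource-allocation bias across users",
                          "organizational knowledge degradation",
                          "prioritization risks to user safety"],
    ("security", "existing"): ["memory poisoning",
                               "cross-domain prompt injection (XPIA)",
                               "human-in-the-loop bypass",
                               "incorrect permissions management",
                               "insufficient isolation"],
    ("safety", "existing"): ["bias amplification", "hallucinations",
                             "misinterpretation of instructions",
                             "insufficient transparency for consent"],
}

def classify(mode: str):
    # Return the (dimension, novelty) bucket for a named failure mode.
    return next((k for k, v in AGENTIC_FAILURE_MODES.items() if mode in v), None)

print(classify("memory poisoning"))  # ('security', 'existing')
```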
NeuroBlade’s Analytics Accelerator is purpose-built hardware designed to handle modern database workloads, delivering 4x faster performance than leading vectorized CPU implementations
As Elad Sity, CEO and cofounder of NeuroBlade, noted, “while the industry has long relied on CPUs for data preparation, they’ve become a bottleneck — consuming well over 30 percent of the AI pipeline.” NeuroBlade, the Israeli semiconductor startup Sity cofounded, believes the answer lies in a new category of hardware specifically designed to accelerate data analytics. Its Analytics Accelerator isn’t just a faster CPU; it’s a fundamentally different architecture purpose-built to handle modern database workloads. NeuroBlade’s Accelerator unlocks the full potential of data analytics platforms by dramatically boosting performance and reducing query times. By offloading operations from the CPU to purpose-built hardware, a process known as pushdown, it increases the compute power of each server, enabling faster processing of large datasets with smaller clusters than CPU-only deployments. Purpose-built hardware that boosts each server’s analytics compute reduces the need for massive clusters and helps avoid bottlenecks like network overhead, power constraints, and operational complexity. In TPC-H benchmarks, a standard for evaluating decision support systems, Sity noted that the NeuroBlade Accelerator delivers about 4x faster performance than leading vectorized CPU implementations such as Presto-Velox. NeuroBlade’s pitch is that by offloading analytics from CPUs to dedicated silicon, enterprises can achieve better performance with a fraction of the infrastructure, lowering costs, energy draw, and complexity in one move.
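NeuroBlade's actual interface is not public, but pushdown as a concept is straightforward: a query planner assigns each eligible operator to the accelerator and leaves the rest on the CPU. The toy sketch below illustrates only that idea; the `Op` type, the set of offloadable operators, and the plan format are all hypothetical.

```python
# Toy illustration of operator pushdown: route supported operators to a
# (hypothetical) accelerator, keep the rest on the CPU.
from dataclasses import dataclass

@dataclass
class Op:
    kind: str   # "scan" | "filter" | "aggregate" | "join"
    expr: str

# Assumed set of operators the accelerator can execute.
ACCELERATED = {"scan", "filter", "aggregate"}

def plan(ops: list[Op]) -> list[tuple[str, Op]]:
    """Assign each operator to the accelerator when supported, else CPU."""
    return [("accelerator" if op.kind in ACCELERATED else "cpu", op)
            for op in ops]

# A TPC-H-flavored toy query plan.
query = [
    Op("scan", "lineitem"),
    Op("filter", "l_shipdate < '1998-09-02'"),
    Op("join", "orders ON l_orderkey = o_orderkey"),
    Op("aggregate", "sum(l_extendedprice)"),
]
for target, op in plan(query):
    print(f"{target:>11}: {op.kind} {op.expr}")
```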
Bloomberg’s research reveals Retrieval-Augmented Generation (RAG) can lead LLMs to produce unsafe responses; future designs must integrate safety systems that specifically anticipate how retrieved content might interact with model safeguards
According to surprising new research published by Bloomberg, RAG can actually make large language models (LLMs) less safe. Bloomberg’s paper, ‘RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models,’ evaluated 11 popular LLMs, including Claude-3.5-Sonnet, Llama-3-8B, and GPT-4o. The findings contradict conventional wisdom that RAG inherently makes AI systems safer. The Bloomberg research team discovered that when using RAG, models that typically refuse harmful queries in standard settings often produce unsafe responses. For example, Llama-3-8B’s unsafe responses jumped from 0.3% to 9.2% when RAG was implemented. Alongside the RAG research, Bloomberg released a second paper, ‘Understanding and Mitigating Risks of Generative AI in Financial Services,’ which introduces a specialized AI content risk taxonomy for financial services, addressing domain-specific concerns not covered by general-purpose safety approaches. Together, the papers challenge the widespread assumption that RAG enhances AI safety and demonstrate how existing guardrail systems fail to address domain-specific risks in financial services applications. For enterprises looking to lead the way in AI, Bloomberg’s research means that RAG implementations require a fundamental rethinking of safety architecture. Leaders must move beyond viewing guardrails and RAG as separate components and instead design integrated safety systems that specifically anticipate how retrieved content might interact with model safeguards. Industry-leading organizations will need to develop domain-specific risk taxonomies tailored to their regulatory environments, shifting from generic AI safety frameworks to ones that address specific business concerns.
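Bloomberg's exact methodology is in the paper; the skeleton below only shows the general shape of such an evaluation: run the same harmful prompts through a model with and without retrieved context and compare unsafe-response rates. The `generate` and `is_unsafe` functions are placeholders for the model under test and a safety judge, and their toy bodies exist solely so the sketch executes.

```python
# Skeleton of a with/without-RAG safety comparison (not Bloomberg's code).
def generate(prompt: str, context: str | None = None) -> str:
    # Placeholder model call; a real harness prepends retrieved context.
    return ("[ctx] " if context else "") + "I can't help with that."

def is_unsafe(response: str) -> bool:
    # Placeholder judge; real setups use human labels or a classifier.
    return "can't help" not in response

def unsafe_rate(prompts, retriever=None) -> float:
    unsafe = 0
    for p in prompts:
        ctx = retriever(p) if retriever else None
        unsafe += is_unsafe(generate(p, ctx))
    return unsafe / len(prompts)

harmful_prompts = ["example harmful prompt 1", "example harmful prompt 2"]
baseline = unsafe_rate(harmful_prompts)
with_rag = unsafe_rate(harmful_prompts, retriever=lambda p: "retrieved passage")
print(f"unsafe without RAG: {baseline:.1%}, with RAG: {with_rag:.1%}")
```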
Lightrun’s observability platform monitors code while it is still in the IDE, runs AI-based simulations of how it will behave in production, and automatically adjusts it as it moves into production
Startup Lightrun has built an observability platform to identify and debug (remediate) code. “Code is becoming cheap but bugs are expensive,” said Ilan Peleg, Lightrun’s CEO. That problem, he argued, has reached “an inflection point. Developers now can ship more code than ever before,” thanks to all the AI-driven automation now in use. “But it’s still a very manual process to fix it when things go wrong.” Lightrun’s breakthrough has been to build an observability toolset that can monitor code while it is still in the IDE and understand how it will behave alongside code that is already in production. Lightrun can then automatically adjust the code as it moves into production so it continues operating without interruptions and crashes. It does this by creating AI-based simulations to understand that behaviour, then fixing the code before issues arise. “This is the part where we are unique,” Peleg said. There are many directions Lightrun could develop in, given how close observability sits to other activities in organizations. One is building tools more specifically for cybersecurity teams, given the obvious security implications of bugs. Another is building some of its tooling even closer to the point of code creation, to make finding and fixing potential bugs even more efficient.
OpenAI is rolling out shopping features in ChatGPT, including improved product results, visual product details, pricing and reviews, and direct links to buy, making it easier to “find, compare and buy products”
OpenAI said that it began rolling out features that make it easier and faster to “find, compare and buy products” in its ChatGPT chatbot. These features include improved product results; visual product details, pricing and reviews; and direct links to buy, according to a post on X. They will be available to Plus, Pro, Free and logged-out users. The rollout of this shopping experience began Monday and will take a few days to complete. “Product results are chosen independently and are not ads,” the post said. The new improvements outlined in other posts on X include the ability to send a WhatsApp message to ChatGPT to get up-to-date answers and live sports scores; the delivery of multiple citations with each response so that users can learn more or verify information; and the use of trending searches and autocomplete suggestions to make search faster.
Mastercard’s Agentic Payments Program applies tokenization to integrate trusted, seamless payments experiences into the tailored recommendations and insights already provided on conversational AI platforms
Mastercard announced the launch of its Agentic Payments Program, Mastercard Agent Pay. The groundbreaking solution integrates with agentic AI to revolutionize commerce. Mastercard Agent Pay will deliver smarter, more secure, and more personal payments experiences to consumers, merchants, and issuers. The program introduces Mastercard Agentic Tokens, which build upon proven tokenization capabilities that today power global commerce solutions like mobile contactless payments, secure card-on-file, and Mastercard Payment Passkeys, as well as programmable payments like recurring expenses and subscriptions. This helps unlock an agentic commerce future where consumers and businesses can transact with trust, security, and control. Mastercard will collaborate with Microsoft on new use cases to scale agentic commerce, with other leading AI platforms to follow. Mastercard will also partner with technology enablers like IBM, with its watsonx Orchestrate product, to accelerate B2B use cases. In addition, Mastercard will work with acquirers and checkout players like Braintree and Checkout.com to enhance the tokenization capabilities they are already using today with merchants to deliver safe, transparent agentic payments. For banks, tokenized payment credentials will be seamlessly integrated across agentic commerce platforms, keeping card issuers at the forefront of this rapidly evolving technology with enhanced visibility, security, and control. Mastercard Agent Pay will enhance generative AI conversations for people and businesses alike by integrating trusted, seamless payments experiences into the tailored recommendations and insights already provided on conversational platforms. By identifying and validating a customer using Mastercard’s tokenization technology, a retailer will be able to offer a meaningful and consistent shopping experience, layering on relevant and personalized benefits, such as recommended products, free delivery, rewards, and discounts. Mastercard will work with Microsoft to integrate Microsoft’s leading AI technologies, including Microsoft Azure OpenAI Service and Microsoft Copilot Studio, with Mastercard’s trusted payment solutions to develop and scale agentic commerce, addressing the evolving needs of the entire commerce value chain.
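Mastercard hasn't published the Agentic Token format, but the general idea behind scoped payment tokens can be illustrated: instead of holding a raw card number, an agent carries a short-lived credential bound to a merchant and an amount cap, which is validated at authorization time. Every field name and rule below is this sketch's assumption, not Mastercard's spec.

```python
# Illustrative sketch of a scoped, single-merchant payment token; the
# schema and checks are invented for this example.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
import secrets

@dataclass(frozen=True)
class AgentToken:
    token: str
    merchant_id: str
    max_amount: float
    expires_at: datetime

def issue_token(merchant_id: str, max_amount: float,
                ttl_minutes: int = 15) -> AgentToken:
    # The agent receives a credential scoped to one merchant, an amount
    # cap, and a short lifetime -- never the underlying card number.
    return AgentToken(secrets.token_urlsafe(16), merchant_id, max_amount,
                      datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes))

def authorize(t: AgentToken, merchant_id: str, amount: float) -> bool:
    # Authorization succeeds only within the token's scope and lifetime.
    return (t.merchant_id == merchant_id
            and amount <= t.max_amount
            and datetime.now(timezone.utc) < t.expires_at)

t = issue_token("merchant-123", max_amount=50.0)
print(authorize(t, "merchant-123", 42.0))    # True
print(authorize(t, "other-merchant", 42.0))  # False: token is merchant-bound
```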
FTC order requires Workado to maintain competent and reliable evidence supporting the 98% accuracy and efficacy claims of its AI content detection product
The Federal Trade Commission issued a proposed order requiring Workado, LLC to stop advertising the accuracy of its AI detection products unless it maintains competent and reliable evidence showing those products are as accurate as claimed. The settlement will be subject to public comment before becoming final. The order settles allegations that Workado promoted its AI Content Detector as “98 percent” accurate in detecting whether text was written by AI or a human. But independent testing showed the accuracy rate on general-purpose content was just 53 percent, barely better than a coin flip for a binary AI-or-human call, according to the FTC’s administrative complaint. The FTC alleges that Workado violated the FTC Act because the “98 percent” claim was false, misleading, or unsubstantiated. The proposed order settling the complaint is designed to ensure Workado does not engage in similar false, misleading, or unsupported advertising in the future. Under the proposed order, Workado: 1) is prohibited from making any representations about the effectiveness of any covered product unless it is not misleading and the company has competent and reliable evidence to support the claim at the time it is made; 2) is required to retain any evidence it uses to support such efficacy claims; 3) must email eligible consumers about the consent order and settlement with the Commission; and 4) must submit compliance reports to the FTC one year after the order is issued and every year for the following three years.
Microsoft’s most capable new Phi 4 AI model rivals the performance of far larger systems, yet is small enough for low-latency environments
Microsoft launched several new “open” AI models, the most capable of which is competitive with OpenAI’s o3-mini on at least one benchmark. All of the new permissively licensed models — Phi 4 mini reasoning, Phi 4 reasoning, and Phi 4 reasoning plus — are “reasoning” models, meaning they’re able to spend more time fact-checking solutions to complex problems. Phi 4 mini reasoning was trained on roughly 1 million synthetic math problems generated by Chinese AI startup DeepSeek’s R1 reasoning model. Around 3.8 billion parameters in size, Phi 4 mini reasoning is designed for educational applications, like “embedded tutoring” on lightweight devices. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters. Phi 4 reasoning, a 14-billion-parameter model, was trained using “high-quality” web data as well as “curated demonstrations” from OpenAI’s o3-mini. It’s best for math, science, and coding applications. As for Phi 4 reasoning plus, it’s Microsoft’s previously released Phi-4 model adapted into a reasoning model to achieve better accuracy on particular tasks. Phi 4 reasoning plus approaches the performance levels of R1, a model with significantly more parameters (671 billion). The company’s internal benchmarking also has Phi 4 reasoning plus matching o3-mini on OmniMath, a math skills test. “Using distillation, reinforcement learning, and high-quality data, these [new] models balance size and performance,” wrote Microsoft in a blog post. “They are small enough for low-latency environments yet maintain strong reasoning capabilities that rival much bigger models. This blend allows even resource-limited devices to perform complex reasoning tasks efficiently.”
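Since the models are released openly, trying one locally should look like loading any other Hugging Face transformers checkpoint. The snippet below is a minimal sketch that assumes the `microsoft/Phi-4-mini-reasoning` model ID and a machine with enough memory for a 3.8B-parameter model (torch and accelerate installed).

```python
# Minimal sketch: run Phi 4 mini reasoning locally with Hugging Face
# transformers (model ID assumed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-reasoning"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Ask a small math question; reasoning models emit their working first.
messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```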