Microsoft’s AI Red Team has published a detailed taxonomy addressing the failure modes inherent to agentic architectures. Agentic AI systems are autonomous entities that observe and act upon their environment to achieve predefined objectives. These systems integrate capabilities such as autonomy, environment observation, interaction, memory, and collaboration. However, these features introduce a broader attack surface and new safety concerns. The report distinguishes between novel failure modes unique to agentic systems and amplification of risks already observed in generative AI contexts. Microsoft categorizes failure modes across security and safety dimensions. Novel Security Failures: Including agent compromise, agent injection, agent impersonation, agent flow manipulation, and multi-agent jailbreaks. Novel Safety Failures: Covering issues such as intra-agent Responsible AI (RAI) concerns, biases in resource allocation among multiple users, organizational knowledge degradation, and prioritization risks impacting user safety. Existing Security Failures: Encompassing memory poisoning, cross-domain prompt injection (XPIA), human-in-the-loop bypass vulnerabilities, incorrect permissions management, and insufficient isolation. Existing Safety Failures: Highlighting risks like bias amplification, hallucinations, misinterpretation of instructions, and a lack of sufficient transparency for meaningful user consent.
NeuroBlade’s Analytics Accelerator is a purpose-built hardware designed to handle modern database workloads delivering 4x faster performance than leading vectorized CPU implementations
As Elad Sity, CEO and cofounder of NeuroBlade, noted, “while the industry has long relied on CPUs for data preparation, they’ve become a bottleneck — consuming well over 30 percent of the AI pipeline.” NeuroBlade, the Israeli semiconductor startup Sity cofounded, believes the answer lies in a new category of hardware specifically designed to accelerate data analytics. Their Analytics Accelerator isn’t just a faster CPU — it’s fundamentally different architecture purpose-built to handle modern database workloads. NeuroBlade’s Accelerator unlocks the full potential of data analytics platforms by dramatically boosting performance and reducing query times. By offloading operations from the CPU to purpose-built hardware — a process known as pushdown—it increases the compute power of each server, enabling faster processing of large datasets with smaller clusters compared to CPU-only deployments. Purpose-built hardware that boosts each server’s compute power for analytics reduces the need for massive clusters and helps avoid bottlenecks like network overhead, power constraints, and operation complexity. In TPC-H benchmarks — a standard for evaluating decision support systems — Sity noted that the NeuroBlade Accelerator delivers about 4x faster performance than leading vectorized CPU implementations such as Presto-Velox. NeuroBlade’s pitch is that by offloading analytics from CPUs and handing them to dedicated silicon, enterprises can achieve better performance with a fraction of the infrastructure — lowering costs, energy draw and complexity in one move.
Bloomberg’s research reveals Retrieval-Augmented Generation (RAG) can produce unsafe responses; future designs must integrate safety systems that specifically anticipate how retrieved content might interact with model safeguards
According to surprising new research published by Bloomberg, RAG can potentially make large language models (LLMs) unsafe. Bloomberg’s paper, ‘RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models,’ evaluated 11 popular LLMs including Claude-3.5-Sonnet, Llama-3-8B and GPT-4o. The findings contradict conventional wisdom that RAG inherently makes AI systems safer. The Bloomberg research team discovered that when using RAG, models that typically refuse harmful queries in standard settings often produce unsafe responses. For example, Llama-3-8B’s unsafe responses jumped from 0.3% to 9.2% when RAG was implemented. Alongside the RAG research, Bloomberg released a second paper, ‘Understanding and Mitigating Risks of Generative AI in Financial Services,’ that introduces a specialized AI content risk taxonomy for financial services that addresses domain-specific concerns not covered by general-purpose safety approaches. The research challenges widespread assumptions that retrieval-augmented generation (RAG) enhances AI safety, while demonstrating how existing guardrail systems fail to address domain-specific risks in financial services applications. For enterprises looking to lead the way in AI, Bloomberg’s research mean that RAG implementations require a fundamental rethinking of safety architecture. Leaders must move beyond viewing guardrails and RAG as separate components and instead design integrated safety systems that specifically anticipate how retrieved content might interact with model safeguards. Industry-leading organizations will need to develop domain-specific risk taxonomies tailored to their regulatory environments, shifting from generic AI safety frameworks to those that address specific business concerns.
Lightrun’s observability platform can monitor code just as it is in the IDE and then automatically make adjustments to it as it moves into production through AI-based simulations
Startup Lightrun has built an observability platform to identify and debug (remediate) code. “Code is becoming cheap but bugs are expensive,” Ilan Peleg, CEO said. That problem, meanwhile, has reached “an inflection point. Developers now can ship more code than ever before,” due to all the automation that is being used, thanks to AI. “But it’s still a very manual process to fix it when things go wrong.” Lightrun’s breakthrough has been to build an observability toolset that can monitor code just as it is in the IDE and understand how it will behave alongside code that is actively in production. Lightrun is then able to automatically make adjustments to the code as it moves into production to continue operating without interruption and crashes. It does this by way of being able to create AI-based simulations to understand that behaviour, and then to fix the code before issues arise. “This is the part where we are unique,” Peleg said. There are a lot of options for how Lightrun might develop, given how close observability sits to other activities in organizations. One of those is building tools more specifically for cybersecurity teams, given the obvious security implications that arise out of bugs. Another is potentially building some of its tooling even closer to the point of code creation, to make finding and fixing possible bugs even more efficient.
OpenAI is rolling out shopping features such as improved product results, visual product details, pricing and reviews, and direct links to “find, compare and buy products” in ChatGPT
OpenAI said that it began rolling out features that make it easier and faster to “find, compare and buy products” in its ChatGPT chatbot. These features include improved product results; visual product details, pricing and reviews; and direct links to buy, according to a post on X. They will be available to Plus, Pro, Free and logged-out users. The rollout of this shopping experience began Monday and will take a few days to complete. “Product results are chosen independently and are not ads,” the post said. The new improvements outlined in other posts on X include the ability to send a WhatsApp message to ChatGPT to get up-to-date answers and live sports scores; the delivery of multiple citations with each response so that users can learn more or verify information; and the use of trending searches and autocomplete suggestions to make search faster.
Mastercard’s Agentic Payments Program applies tokenization to integrate trusted, seamless payments experiences into the tailored recommendations and insights already provided on conversational AI platforms
Mastercard announced the launch of its Agentic Payments Program, Mastercard Agent Pay. The groundbreaking solution integrates with agentic AI to revolutionize commerce. Mastercard Agent Pay will deliver smarter, more secure, and more personal payments experiences to consumers, merchants, and issuers. The program introduces Mastercard Agentic Tokens, which build upon proven tokenization capabilities that today power global commerce solutions like mobile contactless payments, secure card-on-file, and Mastercard Payment Passkeys, as well as programmable payments like recurring expenses and subscriptions. This helps unlock an agentic commerce future where consumers and businesses can transact with trust, security, and control. Mastercard will collaborate with Microsoft on new use cases to scale agentic commerce, with other leading AI platforms to follow. Mastercard will also partner with technology enablers like IBM, with its watsonx Orchestrate product, to accelerate B2B use cases. In addition, Mastercard will work with acquirers and checkout players like Braintree and Checkout.com to enhance the tokenization capabilities they are already using today with merchants to deliver safe, transparent agentic payments. For banks, tokenized payment credentials will be seamlessly integrated across agentic commerce platforms, keeping card issuers at the forefront of this rapidly evolving technology with enhanced visibility, security, and control. Mastercard Agent Pay will enhance generative AI conversations for people and businesses alike by integrating trusted, seamless payments experiences into the tailored recommendations and insights already provided on conversational platforms. By identifying and validating a customer using Mastercard’s tokenization technology, a retailer will be able to offer a meaningful and consistent shopping experience, layering on relevant and personalized benefits, such as recommended products, free delivery, rewards, and discounts. Mastercard will work with Microsoft to integrate Microsoft’s leading AI technologies, including Microsoft Azure OpenAI Service and Microsoft Copilot Studio, with Mastercard’s trusted payment solutions to develop and scale agentic commerce, addressing the evolving needs of the entire commerce value chain.
FTC order requires Workado to offer competent and reliable evidence to support the 98% accuracy and efficacy claims of its AI content detection product
The Federal Trade Commission issued a proposed order requiring Workado, LLC to stop advertising the accuracy of its AI detection products unless it maintains competent and reliable evidence showing those products are as accurate as claimed. The settlement will be subject to public comment before becoming final. The order settles allegations that Workado promoted its AI Content Detector as “98 percent” accurate in detecting whether text was written by AI or human. But independent testing showed the accuracy rate on general-purpose content was just 53 percent, according to the FTC’s administrative complaint. The FTC alleges that Workado violated the FTC Act because the “98 percent” claim was false, misleading, or non-substantiated. The proposed order settling the complaint is designed to ensure Workado does not engage in similar false, misleading, or unsupported advertising in the future. Under the proposed order, Workado: 1) Is prohibited from making any representations about the effectiveness of any covered product unless it is not misleading, and the company has competent and reliable evidence to support the claim at the time it is made; 2) Is required to retain any evidence it uses to support such efficacy claims; 3) Must email eligible consumers about the consent order and settlement with the Commission; and 4) Must submit compliance reports to the FTC one year after the order is issued and every year for the following three years.
Microsoft’s most capable new Phi 4 AI model rivals the performance of far larger systems, yet small enough for low-latency environments
Microsoft launched several new “open” AI models, the most capable of which is competitive with OpenAI’s o3-mini on at least one benchmark. All of the new pemissively licensed models — Phi 4 mini reasoning, Phi 4 reasoning, and Phi 4 reasoning plus — are “reasoning” models, meaning they’re able to spend more time fact-checking solutions to complex problems. Phi 4 mini reasoning was trained on roughly 1 million synthetic math problems generated by Chinese AI startup DeepSeek’s R1 reasoning model. Around 3.8 billion parameters in size, Phi 4 mini reasoning is designed for educational applications, like “embedded tutoring” on lightweight devices. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters. Phi 4 reasoning, a 14-billion-parameter model, was trained using “high-quality” web data as well as “curated demonstrations” from OpenAI’s o3-mini. It’s best for math, science, and coding applications. As for Phi 4 reasoning plus, it’s Microsoft’s previously released Phi-4 model adapted into a reasoning model to achieve better accuracy on particular tasks. Phi 4 reasoning plus approaches the performance levels of R1, a model with significantly more parameters (671 billion). The company’s internal benchmarking also has Phi 4 reasoning plus matching o3-mini on OmniMath, a math skills test. “Using distillation, reinforcement learning, and high-quality data, these [new] models balance size and performance,” wrote Microsoft in a blog post. “They are small enough for low-latency environments yet maintain strong reasoning capabilities that rival much bigger models. This blend allows even resource-limited devices to perform complex reasoning tasks efficiently.”
UiPath launches the first enterprise-grade platform for agentic automation – controlled agency model ensures AI agents operate within clearly defined guardrails
UiPath launched its next-generation UiPath Platform™ for agentic automation, a groundbreaking platform designed to unify AI agents, robots, and people on a single intelligent system. The UiPath Platform for agentic automation is enabling everyone to begin building, deploying, and managing agents. Key Capabilities of the UiPath Platform for Agentic Automation: 1) UiPath Maestro™— It automates, models, and optimizes complex business processes end to end with built-in process intelligence and KPI monitoring to enable continuous optimization. 2) Through a controlled agency model, UiPath ensures AI agents operate within clearly defined guardrails—ensuring security, predictability, and performance. The platform features robust governance, real-time vulnerability assessments, and stringent data access controls to protect enterprise environments. “We are targeting 95%+ agent accuracy with every launch,” commented Raghu Malpani, Chief Technology Officer at UiPath. 3) Developers can rapidly prototype agents in UiPath Agent Builder within UiPath Studio, while having the opportunity to customize when needed. This means both technically oriented business professionals and experienced programmers can easily create sophisticated, scalable automations that can adapt to complex business requirements and evolving enterprise needs. 4) UiPath integrates with third-party agent frameworks including LangChain, Anthropic, and Microsoft. We partnered with Google Cloud on its new, open protocol called Agent2Agent (A2A), which will allow AI agents to communicate with each other, securely exchange information, and coordinate actions on top of various enterprise platforms or applications. This open approach breaks down silos and future-proofs enterprise automation strategies. 5) The new UiPath IXP (Intelligent Xtraction & Processing) solution introduces multi-modal, AI-based classification and extraction for unstructured data. Built for high-complexity use cases like claims adjudication, loan origination, and electronic batch records, IXP brings enterprise-grade scale to document processing. 6) UiPath has also introduced UI Agent for computer use, now in private preview—a natural language-driven agent that understands user intent, plans multi-step tasks, and executes actions across interfaces autonomously.
Snaplogic’s tool includes no-code visual prompt editor that enables any user to build, visualize, and refine intelligent agents for complex workflows in real time on a single screen
Snaplogic announced AgentCreator 3.0, a groundbreaking evolution in agentic AI technology that eliminates the complexity of enterprise AI adoption. The new release empowers organizations to build and scale their own AI solutions with no coding required. With AgentCreator 3.0, businesses are no longer constrained by human resource limitations. Instead, they gain access to a limitless workforce powered by AI-driven digital labor that works tirelessly, scales infinitely, and augments their best talent with PhD-level intelligence. Key additions to AgentCreator 3.0 include Prompt Composer and Agent Visualizer, making it easier than ever for customers to build, visualize, and refine intelligent agents for complex workflows. Prompt Composer is a visual prompt editor that simplifies prompt creation for faster iteration and stronger results, enabling anyone, from business users to engineers, to create, test, and refine AI instructions in real time on a single screen. This ensures high precision and adaptability as LLMs evolve. Agent Visualizer provides full transparency into AI decision-making, ensuring enterprises can trust, audit, and refine agent behavior. The visualization tool delivers a step-by-step breakdown of an agent’s decision-making process. SnapLogic is also introducing support for the Model Context Protocol (MCP) to further accelerate the adoption and deployment of Agentic AI. AgentCreator 3.0 empowers organizations with: AI-ready data, AI as digital labor, DIY AI without complexity, Security and governance, AI workforce collaboration.