OutSystems announced the Early Access Program for OutSystems Agent Workbench, which simplifies the transformation of existing business applications, workflows, and tools into intelligent, agentic systems that can reason, plan, and act. With Agent Workbench, organizations of any size can:
- Build and scale AI agents that work across the entire enterprise effortlessly and safely for real-time goal interpretation, option evaluation, and decision-making, while controlling tool sprawl through a single platform.
- Seamlessly integrate with custom AI models or leading third-party providers like Azure OpenAI and AWS Bedrock to centralize control over AI and data access, decrease cost, and enable multi-vendor utilization.
- Ground AI agents with a unified data fabric that connects to diverse enterprise data sources, such as existing OutSystems 11 data and actions, relational databases, data lakes, knowledge retrieval systems like Kendra and Azure AI Search, and even agent memory of past interactions, to ensure accurate and context-rich responses across workflows.
- Orchestrate multi-agent workflows where agents dynamically adjust process flows based on an understanding of all enterprise systems, with real-time context, reasoning, and decisions to tackle complex tasks, whether working in parallel, sequentially, or hierarchically (see the sketch after this list). This enables collaborative task execution, escalation handling, and human intervention when necessary.
- Monitor agent performance enterprise-wide with real-time logging, error tracing, and built-in guardrails to ensure transparent, reliable decision-making. Gain full visibility into how AI agents operate at every step, making it easy to audit, troubleshoot behavior, and prevent hallucinations, while building trust through explainability and control.
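The orchestration styles named above map onto a simple pattern. Below is a minimal, hypothetical Python sketch of that pattern (sequential planning, parallel data grounding, and a supervising decision step); it illustrates the concept only and is not Agent Workbench's actual API, and all names are invented.

```python
import asyncio

# Hypothetical agents for illustration; Agent Workbench's real interfaces are not public.
async def interpret_goal(request: str) -> str:
    return f"plan for: {request}"

async def fetch_orders(plan: str) -> str:
    return "orders data"

async def fetch_inventory(plan: str) -> str:
    return "inventory data"

async def decide(plan: str, *context: str) -> str:
    return f"decision based on {len(context)} grounding sources"

async def handle(request: str) -> str:
    # Sequential step: a planner agent interprets the goal first.
    plan = await interpret_goal(request)
    # Parallel step: data-grounding agents run concurrently.
    orders, inventory = await asyncio.gather(fetch_orders(plan), fetch_inventory(plan))
    # Hierarchical step: a supervisor agent makes the final decision,
    # and could escalate to a human if confidence is low.
    return await decide(plan, orders, inventory)

print(asyncio.run(handle("restock low inventory")))
```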
Google study shows LLMs abandon correct answers under pressure, threatening multi-turn AI systems
A new study by researchers at Google DeepMind and University College London reveals how LLMs form, maintain, and lose confidence in their answers. The findings show striking similarities between the cognitive biases of LLMs and humans, while also highlighting stark differences: LLMs can be overconfident in their own answers yet quickly lose that confidence and change their minds when presented with a counterargument, even when the counterargument is incorrect. Understanding the nuances of this behavior has direct consequences for how you build LLM applications, especially conversational interfaces that span several turns. The study confirms that AI systems are not the purely logical agents they are often perceived to be. They exhibit their own set of biases, some resembling human cognitive errors and others unique to themselves, which can make their behavior unpredictable in human terms. For enterprise applications, this means that in an extended conversation between a human and an AI agent, the most recent information could have a disproportionate impact on the LLM’s reasoning (especially if it contradicts the model’s initial answer), potentially causing it to discard an initially correct answer. Fortunately, as the study also shows, we can manipulate an LLM’s memory to mitigate these unwanted biases in ways that are not possible with humans. Developers building multi-turn conversational agents can implement strategies to manage the AI’s context, as in the sketch below.
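One mitigation consistent with the findings is to periodically compress the conversation into a neutral summary, so that late, contradictory turns carry less weight. The sketch below assumes a generic chat-completions-style client; call_llm, summarize_neutrally, and the turn threshold are all illustrative, not from the paper.

```python
# A sketch of context management for multi-turn agents: periodically restart
# the context from a neutral summary so recent contradictory messages don't
# dominate the model's reasoning.

def call_llm(messages: list[dict]) -> str:
    # Stand-in for any chat client (OpenAI, Bedrock, etc.); replace with a
    # real chat-completions call. This stub just echoes.
    return "stubbed model response"

def summarize_neutrally(history: list[dict]) -> str:
    # Ask the model to restate established facts without attributing who
    # said what, stripping the agreement/disagreement cues that trigger
    # sycophantic answer flips.
    prompt = ("Summarize the key facts established so far as neutral bullet "
              "points. Do not note who asserted or challenged each fact.")
    return call_llm(history + [{"role": "user", "content": prompt}])

def answer(history: list[dict], question: str, max_turns: int = 6) -> str:
    if len(history) > max_turns:
        # Restart the context from a neutral summary instead of carrying
        # the full back-and-forth forward.
        history = [{"role": "system", "content": summarize_neutrally(history)}]
    return call_llm(history + [{"role": "user", "content": question}])
```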
Liquid AI’s platform enables developers to integrate popular open-source small language models (SLMs) directly into mobile applications with just a few lines of code, simplifying edge AI deployments
Liquid AI, a startup founded by former MIT researchers, has released the Liquid Edge AI Platform (LEAP), a cross-platform software development kit (SDK) designed to make it easier for developers to integrate small language models (SLMs) directly into mobile applications. The SDK can be added to an iOS or Android project with just a few lines of code, and calling a local model is meant to feel as familiar as interacting with a traditional cloud API. LEAP is OS- and model-agnostic by design, supporting both iOS and Android, and offers compatibility with Liquid AI’s own liquid foundation models (LFMs) as well as many popular open-source small models. The platform aims to create a unified ecosystem for edge AI, offering tools for rapid iteration and deployment in real-world mobile environments. The company also released Apollo, a free iOS app that allows developers and users to interact with LEAP-compatible models in a local, offline setting. The LEAP SDK release builds on Liquid AI’s announcement of LFM2, its second-generation foundation model family designed specifically for on-device workloads. The platform is currently free to use under a developer license, with premium enterprise features available under a separate commercial license in the future.
Codien’s AI agent accelerates migration from legacy test automation frameworks by generating clean, reliable automation testing scripts from plain-English descriptions, understanding source tests, and converting and validating tests in real time
Codien has launched its new AI agent, designed to simplify and accelerate the transition from legacy test automation frameworks like Protractor and Selenium to the modern Playwright framework, reducing migration time from days or weeks to minutes. It also offers intelligent test creation, helping users write new Playwright tests faster by generating clean, reliable Playwright automation scripts from plain-English descriptions. Codien goes beyond simple conversion to ensure accuracy: it understands your source tests, converts them incrementally, and validates each new Playwright test in real time to confirm correct functionality, so you can trust the results. The user experience is straightforward: an intuitive desktop application, available for macOS, Windows, and Linux. Users simply create a project, scan their source code to automatically discover all test cases, and then initiate the conversion. They can watch Codien convert and validate tests live, one by one, with a clean dashboard keeping them updated on progress and status. Built with a local-first architecture, Codien keeps your test files and code on your device, keeping your data private and secure. Only minimal, relevant code snippets are securely sent to large language models via encrypted HTTPS, with no files uploaded, stored, or retained after processing. Codien operates on a flexible pay-as-you-go model with no subscriptions or vendor lock-in.
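To make the output concrete, here is a hedged illustration of the kind of test Codien describes generating from a plain-English description ("logging in with valid credentials shows the dashboard"), shown with Playwright's Python bindings; Codien's actual output format isn't specified, and the URL, selectors, and credentials are invented.

```python
from playwright.sync_api import sync_playwright, expect

def test_login_shows_dashboard():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.com/login")  # hypothetical app URL
        # Fill the login form using accessible locators.
        page.get_by_label("Email").fill("user@example.com")
        page.get_by_label("Password").fill("secret")
        page.get_by_role("button", name="Sign in").click()
        # Assert the post-login state described in plain English.
        expect(page.get_by_role("heading", name="Dashboard")).to_be_visible()
        browser.close()
```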
AWS’s new storage offering, akin to an S3 bucket, can cut the cost of uploading, storing, and querying vectors by up to 90% by eliminating the need to provision infrastructure for a vector database
AWS is introducing Amazon S3 Vectors, a specialized storage offering that can cut the cost of uploading, storing, and querying vectors by up to 90% compared to using a vector database. The move is likely to interest anyone running generative AI or agentic AI applications in the cloud. Machine learning models typically represent data as vectors, which are stored in specialty vector databases, or databases with vector capabilities, for similarity search and retrieval at scale. AWS proposes that enterprises instead use a new type of S3 bucket, the S3 vector bucket, which eliminates the need to provision infrastructure for a vector database. AWS has integrated S3 Vectors with Amazon Bedrock Knowledge Bases, Amazon SageMaker Unified Studio, and Amazon OpenSearch Service, ensuring efficient use of resources even as datasets grow and evolve. The OpenSearch integration gives enterprises the flexibility to keep rarely accessed vectors in S3 to save costs, then dynamically shift them to OpenSearch when real-time, low-latency search is needed.
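As a sketch of the developer experience, the boto3 snippet below writes and queries embeddings in an S3 vector bucket. Boto3 exposes S3 Vectors through a dedicated s3vectors client; the method and parameter names follow the announced API, but verify them against current AWS documentation before relying on them. The bucket, index, and embedding values are invented.

```python
import boto3

s3v = boto3.client("s3vectors")

# Store an embedding under a key, with filterable metadata attached.
s3v.put_vectors(
    vectorBucketName="my-vector-bucket",   # hypothetical bucket name
    indexName="docs-index",                # hypothetical index name
    vectors=[{
        "key": "doc-001",
        "data": {"float32": [0.12, 0.34, 0.56]},  # real embeddings are far longer
        "metadata": {"source": "handbook.pdf"},
    }],
)

# Similarity search directly against the bucket: no vector database to
# provision or scale.
hits = s3v.query_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="docs-index",
    queryVector={"float32": [0.11, 0.33, 0.55]},
    topK=3,
)
print(hits["vectors"])
```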
Tetrate’s solution allows developers to access various AI models with their own API keys and coordinates API calls across multiple LLMs, delegating the tasks assigned by the user to the most appropriate model based on the developer’s priorities
Service mesh company Tetrate announced the availability of the Tetrate Agent Router Service, a managed solution that makes it simpler for developers to direct AI queries and agent requests to the most suitable model based on their priorities, such as query and task complexity, inference costs, and model performance or speciality. According to Tetrate, this kind of flexibility is exactly what developers need: the Agent Router Service acts as a centralized tool for controlling AI traffic, allowing them to work around the limitations of various large language models, avoid vendor lock-in, and mitigate cost overruns. Tetrate AI Gateway is an open-source project that helps organizations integrate generative AI models and services into their applications; through its unified API, developers can manage requests to and from multiple AI services and LLMs. With the Tetrate Agent Router Service, developers get even more control. It allows them to access various AI models with their own API keys, or use keys provided by Tetrate. It also provides features such as an interactive prompt playground for testing and refining AI agents and generative AI applications, automatic fallback to more reliable and affordable models, plus A/B testing tools for evaluating model performance. It coordinates API calls across multiple LLMs, delegating the tasks assigned by the user to the most appropriate one. In the case of AI chatbots, the Tetrate Agent Router Service routes the conversation to the most responsive and/or cost-effective model, based on the developer’s priorities, which can help reduce latency and manage high traffic more efficiently.
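To illustrate what priority-based routing means in practice, here is a conceptual Python sketch of a router that trades off cost, latency, and quality; this is not Tetrate's API, and the model names, prices, and latencies are invented.

```python
# Toy model catalog: cost per 1K tokens, median latency, and a rough quality score.
MODELS = {
    "small-fast":  {"cost_per_1k": 0.0002, "latency_ms": 120, "quality": 0.70},
    "mid-general": {"cost_per_1k": 0.0030, "latency_ms": 450, "quality": 0.85},
    "large-smart": {"cost_per_1k": 0.0150, "latency_ms": 900, "quality": 0.95},
}

def route(task_complexity: float, cost_weight: float = 0.5) -> str:
    """Pick the cheapest/fastest model whose quality covers the task."""
    candidates = {n: m for n, m in MODELS.items() if m["quality"] >= task_complexity}
    if not candidates:
        return "large-smart"  # fallback: strongest model for hard tasks
    # Weighted blend of cost and latency; lower is better.
    return min(candidates, key=lambda n: candidates[n]["cost_per_1k"] * cost_weight
               + candidates[n]["latency_ms"] / 1000 * (1 - cost_weight))

print(route(task_complexity=0.8))  # -> "mid-general"
```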
vFunction’s MCP server enables developers to query architectural issues in real time and use their preferred assistants to remediate them using GenAI
vFunction, the pioneer of AI-driven architectural observability and modernization, is bringing its architectural context to any GenAI assistant, including native integrations with Amazon Q Developer and GitHub Copilot, to guide developers through automated architectural modernization and GenAI-powered service transformation. vFunction enriches GenAI with deep architectural knowledge of semantic structures like context, components, and logical domains, enabling code assistants to address system-wide architectural challenges with complete architectural awareness rather than making isolated code modifications. By bringing architectural intelligence into developers’ workflows, vFunction accelerates application modernization, helping organizations move beyond lift-and-shift to fully maximize their cloud investments. “With these new advancements, teams can surface and resolve architectural debt, and transform their apps to cloud-native, with unprecedented speed through autonomous modernization,” said Amir Rapson, CTO and co-founder of vFunction. “From eliminating circular dependencies to refactoring ‘god classes’, developers can now simplify refactoring and modernization, accelerate delivery, and optimize architecture for the cloud.” One of the ways vFunction is addressing GenAI-based refactoring is with its new MCP server, which connects vFunction’s architectural observability engine with modern developer environments. It enables developers to query architectural issues, generate GenAI prompts, and kick off remediation, all from the command line. With optimized support for Amazon Q Developer and GitHub Copilot, developers can use their preferred assistants to resolve architectural issues using prompts enriched with real-time architectural data. This closes the divide between architects and developers, making the architectural vision executable within native workflows.
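Since the integration is built on the Model Context Protocol, a client interaction might look like the sketch below, which uses the official mcp Python SDK; the server command and tool name are hypothetical placeholders, not vFunction's documented interface.

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Hypothetical launch command; consult vFunction's docs for the real
    # server invocation and tool schema.
    params = StdioServerParameters(command="vfunction-mcp", args=[])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover what the server exposes.
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            # Query architectural issues for one service (hypothetical tool
            # name and arguments).
            result = await session.call_tool(
                "list_architectural_issues",
                {"service": "billing"},
            )
            print(result.content)

asyncio.run(main())
```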
RegASK’s agentic AI architecture pairs a domain-specific vertical LLM with specialized AI agents that perform distinct tasks, are coordinated by a ‘project manager’ agent, and have their outputs reviewed by an evaluator agent, delivering personalized insights for day-to-day compliance operations
RegASK, a provider of AI-driven regulatory intelligence for Consumer Goods and Life Sciences, has launched the industry’s first agentic AI architecture that pairs RegASK’s vertical large language model (V-LLM) with specialized AI agents to deliver personalized insights and streamline how teams find, understand, and act on regulatory information. Each agent performs a distinct task, such as document retrieval, translation, summarization, or assessment generation. A dedicated ‘project manager’ agent coordinates how tasks are assigned and performed across the system, enabling collaborative execution of multi-step workflows, and an evaluator agent reviews outputs before they’re delivered to users, helping ensure accuracy and build trust in the results (a minimal sketch of this pattern follows after the list below). Together, the enhanced agent network and embedded V-LLM power deeper automation, more tailored insights, and the ability to manage a wider range of day-to-day compliance operations. The launch also brings:
- A more powerful, embedded vertical language model: RegASK’s domain-specific LLM is now fully integrated into the platform and enhanced with additional structured attributes. The model gives agents deeper context to generate faster, more precise summaries, assessments, and search results, delivering insights that are directly aligned to users’ regulatory priorities.
- Redesigned user interface with streamlined regulatory change tracking: RegASK’s redesigned user experience significantly improves how regulatory teams identify and respond to critical updates. The new alerts module delivers customizable alert views, streamlined navigation, and faster access to essential regulatory details, enabling professionals to efficiently manage compliance workflows, mitigate risk proactively, and keep their organizations ahead in highly regulated environments.
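As referenced above, here is a minimal Python sketch of that coordinator/evaluator pattern, with stub functions standing in for the retrieval, translation, summarization, and assessment agents; RegASK's actual interfaces are not public, so every name here is illustrative.

```python
# Stub specialist agents; each performs one distinct task.
def retrieve(doc_id: str) -> str: return f"text of {doc_id}"
def translate(text: str) -> str: return text  # stub: assume already English
def summarize(text: str) -> str: return f"summary of: {text}"
def assess(summary: str) -> str: return f"impact assessment of: {summary}"

PIPELINE = [retrieve, translate, summarize, assess]

def evaluator(output: str) -> bool:
    # A reviewing agent would check faithfulness and coverage here; this
    # stub only rejects empty output.
    return bool(output.strip())

def project_manager(doc_id: str) -> str:
    """Coordinator: assigns each task in order, then gates the final output."""
    result = doc_id
    for agent in PIPELINE:
        result = agent(result)
    if not evaluator(result):
        raise ValueError("evaluator rejected output; rerun or escalate")
    return result

print(project_manager("EU-reg-2024-17"))  # hypothetical document id
```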
Blok’s AI agents aim to eliminate friction points in software testing by simulating the behavior of human users and identifying their likes and dislikes using a combination of behavioral science and product data
A startup called Blok Intelligence Inc. has raised $7.5 million to transform the software testing process with AI agents that simulate the behavior of human users. Blok’s agents are grounded in a combination of behavioral science and product data so they can simulate how different types of people use software; developers can then identify the most useful features and uncover and eliminate friction points in their applications. The aim is to condense a testing process that often takes weeks into a matter of hours. According to the startup, the capabilities its AI agents provide are needed more than ever given the surging popularity of “vibe coding,” which has led to a flood of new digital products; the challenge is that many of these new applications aren’t giving people what they want. Blok addresses that with AI agents that behave like humans: curious, imperfect, and full of nuance, just like people are. By grounding them in the “messy realities” of human decision making, they’re better able to identify what humans will like and dislike about new software products. Co-founder and Chief Executive Tom Charman said he thinks static, one-size-fits-all software products will soon become obsolete, replaced by tools that are more adaptive and responsive to each user’s needs, but developers need help to understand what those needs are.
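As a rough illustration of persona-grounded simulation, the sketch below walks invented personas through a checkout flow and records where friction appears; the personas, parameters, and rules are made up for the example and are not Blok's model.

```python
import random
from dataclasses import dataclass

@dataclass
class Persona:
    name: str
    patience_steps: int     # how many steps before abandoning a flow
    error_tolerance: float  # probability of retrying after a failed step

def simulate(persona: Persona, flow: list[str], fail_step: str) -> list[str]:
    """Walk a persona through a UI flow, recording friction events."""
    events = []
    for i, step in enumerate(flow):
        if i >= persona.patience_steps:
            events.append(f"{persona.name}: abandoned at '{step}' (flow too long)")
            break
        if step == fail_step and random.random() > persona.error_tolerance:
            events.append(f"{persona.name}: gave up at '{step}' (hit an error)")
            break
        events.append(f"{persona.name}: completed '{step}'")
    return events

checkout = ["open app", "add to cart", "create account", "enter payment", "confirm"]
for p in [Persona("rushed commuter", 3, 0.2), Persona("patient researcher", 10, 0.9)]:
    print("\n".join(simulate(p, checkout, fail_step="create account")))
```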
ZeroEntropy is a RAG-based AI search tool strictly for developers that retrieves data, even across messy internal documents, surfacing the most relevant information first
Startup ZeroEntropy joins a growing wave of infrastructure companies hoping to use retrieval-augmented generation (RAG) to power search for the next generation of AI agents. ZeroEntropy offers an API that manages ingestion, indexing, re-ranking, and evaluation. In practice that means that, unlike a search product for enterprise employees such as Glean, ZeroEntropy is strictly a developer tool: it quickly retrieves data, even across messy internal documents. CEO Ghita Houir Alami likens her startup to a “Supabase for search,” referring to the popular open-source database that automates much of database management. At its core is its proprietary re-ranker, ze-rank-1, which the company claims currently outperforms similar models from Cohere and Salesforce on both public and private retrieval benchmarks; it makes sure that when an AI system looks for answers in a knowledge base, it grabs the most relevant information first. “Right now, most teams are either stitching together existing tools from the market or dumping their entire knowledge base into an LLM’s context window. The first approach is time-consuming to build and maintain,” Houir Alami said. “The second approach can cause compounding errors. We’re building a developer-first search infrastructure — think of it like a Supabase for search — designed to make deploying accurate, fast retrieval systems easy and efficient.”
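For context on where a re-ranker like ze-rank-1 sits, the sketch below shows a generic two-stage retrieve-then-rerank pipeline with toy scoring functions; it illustrates the pattern only and is not ZeroEntropy's API or model.

```python
def first_pass_retrieve(query: str, corpus: list[str], k: int = 20) -> list[str]:
    # Stand-in for vector/keyword retrieval: crude term-overlap scoring.
    terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(terms & set(d.lower().split())))
    return scored[:k]

def rerank(query: str, candidates: list[str], top_n: int = 3) -> list[str]:
    # A real re-ranker (e.g., a cross-encoder like ze-rank-1) scores each
    # query/document pair jointly; this toy version just prefers shorter,
    # denser matches so the example stays self-contained.
    terms = set(query.lower().split())
    def score(d: str) -> float:
        words = d.lower().split()
        return len(terms & set(words)) / max(len(words), 1)
    return sorted(candidates, key=score, reverse=True)[:top_n]

corpus = ["refund policy: 30 days with receipt",
          "shipping times vary by region",
          "refunds require the original receipt and order number"]
print(rerank("refund receipt", first_pass_retrieve("refund receipt", corpus)))
```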