A new study by researchers at Google DeepMind and University College London examines how LLMs form, maintain and lose confidence in their answers. The findings show striking similarities between the cognitive biases of LLMs and humans, alongside some stark differences: LLMs can be overconfident in their own answers yet quickly lose that confidence and change their minds when presented with a counterargument, even an incorrect one. Understanding the nuances of this behavior has direct consequences for how you build LLM applications, especially conversational interfaces that span several turns.

The study confirms that AI systems are not the purely logical agents they are often perceived to be. They exhibit their own set of biases, some resembling human cognitive errors and others unique to themselves, which can make their behavior unpredictable in human terms. For enterprise applications, this means that in an extended conversation between a human and an AI agent, the most recent information can have a disproportionate impact on the LLM’s reasoning, especially if it contradicts the model’s initial answer, potentially causing the model to discard an initially correct response.

Fortunately, as the study also shows, we can manipulate an LLM’s memory to mitigate these unwanted biases in ways that are not possible with humans, and developers building multi-turn conversational agents can implement strategies to manage the AI’s context.
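One strategy along these lines is to periodically condense a long exchange into a neutral, unattributed summary and restart the conversation from it, so no single recent turn carries disproportionate weight. Below is a minimal sketch of that idea, assuming an OpenAI-compatible client; the model name and summarization prompt are illustrative, not taken from the study.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any OpenAI-compatible endpoint works

def reset_context(history: list[dict], model: str = "gpt-4o-mini") -> list[dict]:
    """Condense a multi-turn history into a neutral, unattributed summary.

    Re-seeding the conversation from this summary keeps the key facts while
    preventing the most recent (possibly contradictory) turns from dominating
    the model's reasoning.
    """
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    summary = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                "Summarize the key facts, decisions, and open questions in the "
                "conversation below as a neutral briefing. Do not attribute any "
                "statement to a speaker.\n\n" + transcript
            ),
        }],
    ).choices[0].message.content
    # Start fresh: the unattributed summary replaces the full history.
    return [{"role": "user", "content": f"Background briefing:\n{summary}"}]
```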
Liquid AI’s platform enables developers to integrate popular open-source small language models (SLMs) directly into mobile applications with just a few lines of code, simplifying edge AI deployments
Liquid AI, a startup founded by former MIT researchers, has released the Liquid Edge AI Platform (LEAP), a cross-platform software development kit (SDK) designed to make it easier for developers to integrate small language models (SLMs) directly into mobile applications. The SDK can be added to an iOS or Android project with just a few lines of code, and calling a local model is meant to feel as familiar as interacting with a traditional cloud API. LEAP is OS- and model-agnostic by design, supporting both iOS and Android, and offers compatibility with Liquid AI’s own Liquid Foundation Models (LFMs) as well as many popular open-source small models. The platform aims to create a unified ecosystem for edge AI, offering tools for rapid iteration and deployment in real-world mobile environments. The company also released Apollo, a free iOS app that allows developers and users to interact with LEAP-compatible models in a local, offline setting. The LEAP SDK release builds on Liquid AI’s announcement of LFM2, its second-generation foundation model family designed specifically for on-device workloads. The platform is currently free to use under a developer license, with premium enterprise features available under a separate commercial license in the future.
Codien’s AI agent accelerates migration from legacy test automation frameworks by generating clean, reliable automation scripts from plain-English descriptions, understanding source tests, and converting and validating tests in real time
Codien has launched its new AI agent, designed to simplify and accelerate the transition from legacy test automation frameworks like Protractor and Selenium to the modern Playwright framework, reducing migration time from days or weeks to minutes. It also offers intelligent test creation, helping users write new Playwright tests faster by generating clean, reliable Playwright automation scripts from plain-English descriptions. Codien also goes beyond simple conversion to ensure accuracy: it understands your source tests, converts them gradually, and validates each new Playwright test in real time so it functions correctly and you can trust the results. The user experience is straightforward through an intuitive desktop application, available for macOS, Windows, and Linux. Users simply create a project, scan their source code to automatically discover all test cases, and then initiate the conversion. They can watch Codien convert and validate tests live, one by one, with a clean, intuitive dashboard keeping them updated on progress and status. Built with a local-first architecture, Codien ensures your test files and code remain on your device, keeping your data private and secure. Only minimal, relevant code snippets are securely sent to large language models via encrypted HTTPS, with no files uploaded, stored, or retained after processing. Codien operates on a flexible pay-as-you-go model with no subscriptions or vendor lock-in.
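The announcement doesn’t show Codien’s generated output, but the migration target is ordinary Playwright code. As a rough illustration of what a converted test can look like, here is a hypothetical example written with Playwright’s Python bindings; the URL and selectors are invented for the sketch.

```python
# Hypothetical converted test using Playwright's sync Python API.
# The URL and selectors are illustrative, not from Codien's output.
from playwright.sync_api import sync_playwright, expect

def test_login_flow():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.com/login")
        page.fill("#username", "demo")
        page.fill("#password", "secret")
        page.click("button[type=submit]")
        # Playwright's web-first assertion waits for the element automatically,
        # replacing the explicit waits legacy Protractor/Selenium tests needed.
        expect(page.locator(".welcome-banner")).to_be_visible()
        browser.close()
```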
AWS’s new storage offering, a specialized type of S3 bucket, can cut the cost of uploading, storing, and querying vectors by up to 90% by eliminating the need to provision infrastructure for a vector database
AWS is introducing Amazon S3 Vectors, a specialized storage offering that can cut the cost of uploading, storing, and querying vectors by up to 90% compared to using a vector database. This move is likely to be of interest to those running generative AI or agentic AI applications in the cloud. Machine learning models typically represent data as vectors, which are stored in specialty vector databases or databases with vector capabilities for similarity search and retrieval at scale. AWS proposes that enterprises instead use a new type of S3 bucket, the vector bucket, which eliminates the need to provision infrastructure for a vector database. AWS has integrated S3 Vectors with Amazon Bedrock Knowledge Bases, Amazon SageMaker Unified Studio, and Amazon OpenSearch Service, ensuring efficient use of resources even as datasets grow and evolve. The OpenSearch integration gives enterprises the flexibility to keep rarely accessed vectors in S3 Vectors to save costs, then dynamically shift them to OpenSearch when real-time, low-latency search is needed.
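In practice, developers interact with a vector bucket through the AWS SDK rather than a database driver. The sketch below follows the parameter naming from AWS’s S3 Vectors launch examples via boto3; the bucket name, index name, and tiny three-dimensional embeddings are placeholders, and the API surface may evolve while the feature is new.

```python
import boto3

# Client and parameter names follow AWS's S3 Vectors launch examples;
# the bucket name, index name, and embeddings here are illustrative.
s3vectors = boto3.client("s3vectors")

# Write embeddings into a vector index inside a vector bucket.
s3vectors.put_vectors(
    vectorBucketName="media-embeddings",
    indexName="docs-index",
    vectors=[{
        "key": "doc-001",
        "data": {"float32": [0.12, -0.48, 0.33]},  # real embeddings are higher-dimensional
        "metadata": {"source": "handbook.pdf"},
    }],
)

# Similarity search without provisioning any database infrastructure.
response = s3vectors.query_vectors(
    vectorBucketName="media-embeddings",
    indexName="docs-index",
    queryVector={"float32": [0.10, -0.50, 0.30]},
    topK=3,
    returnDistance=True,
)
for match in response["vectors"]:
    print(match["key"], match.get("distance"))
```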
Tetrate’s solution allows developers to access various AI models with their own API keys and coordinates API calls across multiple LLMs, delegating user-assigned tasks to the most appropriate model based on the developer’s priorities
Service mesh company Tetrate announced the availability of the Tetrate Agent Router Service, a managed solution that makes it simpler for developers to route AI queries and agent requests to the most suitable model, based on priorities such as query and task complexity, inference costs, and model performance or specialty. According to Tetrate, this kind of flexibility is exactly what developers need: the Agent Router Service acts as a centralized tool for controlling AI traffic, allowing developers to work around the limitations of individual large language models, avoid vendor lock-in, and mitigate cost overruns. Tetrate AI Gateway, an open-source project, already helps organizations integrate generative AI models and services into their applications; through its unified API, developers can manage requests to and from multiple AI services and LLMs. With the Tetrate Agent Router Service, developers get even more control. It allows them to access various AI models with their own API keys, or use keys provided by Tetrate. It also provides features such as an interactive prompt playground for testing and refining AI agents and generative AI applications, automatic fallback to more reliable and affordable models, plus A/B testing tools for evaluating model performance. The service coordinates API calls across multiple LLMs, delegating the tasks assigned by the user to the most appropriate one. In the case of AI chatbots, the Tetrate Agent Router Service will route the conversation to the most responsive and/or cost-effective model, based on the developer’s priorities, which can help reduce latency and manage high traffic more efficiently.
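Tetrate hasn’t published the client interface in this announcement, but services of this kind typically expose an OpenAI-compatible endpoint so existing code needs only a new base URL. The sketch below makes that assumption; the base URL, API key, and “auto” model alias are illustrative.

```python
from openai import OpenAI

# Assumption: the router exposes an OpenAI-compatible API, as most model
# routers do. The base URL and model identifier below are illustrative.
client = OpenAI(
    base_url="https://router.example.com/v1",
    api_key="YOUR_ROUTER_KEY",
)

response = client.chat.completions.create(
    # A router typically accepts either a concrete model name or a policy
    # alias; "auto" here stands in for "pick per my configured priorities".
    model="auto",
    messages=[{"role": "user", "content": "Summarize this incident report..."}],
)
print(response.choices[0].message.content)
```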
vFunction’s MCP server enables developers to query architectural issues in real time and use their preferred assistants to remediate them with GenAI
vFunction, the pioneer of AI-driven architectural observability and modernization, is bringing its architectural context to any GenAI assistant, including native integrations with Amazon Q Developer and GitHub Copilot, to guide developers through automated architectural modernization and GenAI-powered service transformation. vFunction enriches GenAI with deep architectural knowledge, awareness of semantic structures such as context, components, and logical domains, enabling code assistants to address system-wide architectural challenges rather than make isolated code modifications. By bringing architectural intelligence into developers’ workflows, vFunction accelerates application modernization, helping organizations move beyond lift-and-shift to fully maximize their cloud investments. “With these new advancements, teams can surface and resolve architectural debt, and transform their apps to cloud-native, with unprecedented speed through autonomous modernization,” said Amir Rapson, CTO and co-founder of vFunction. “From eliminating circular dependencies to refactoring ‘god classes’, developers can now simplify refactoring and modernization, accelerate delivery, and optimize architecture for the cloud.” One way vFunction is addressing GenAI-based refactoring is with its new MCP server, which connects vFunction’s architectural observability engine with modern developer environments. It enables developers to query architectural issues, generate GenAI prompts, and kick off remediation, all from the command line. With optimized support for Amazon Q Developer and GitHub Copilot, developers can use their preferred assistants to resolve architectural issues using prompts enriched with real-time architectural data. This closes the divide between architects and developers, making the architectural vision executable within native workflows.
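vFunction’s MCP tool names aren’t listed in the announcement, but the general shape of talking to an MCP server from code looks like the sketch below, which uses the official MCP Python SDK with a placeholder server command and a hypothetical list_architectural_issues tool.

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Hypothetical sketch: the server command and tool name are placeholders,
# not vFunction's documented interface.
async def main():
    params = StdioServerParameters(command="vfunction-mcp-server")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()  # discover what the server exposes
            print([t.name for t in tools.tools])
            # Ask for architectural findings to feed into a coding assistant.
            result = await session.call_tool(
                "list_architectural_issues", arguments={"severity": "high"}
            )
            print(result.content)

asyncio.run(main())
```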
RegASK’s agentic AI architecture pairs a domain-specific vertical LLM with specialized AI agents that perform distinct tasks, are coordinated by a ‘project manager’ agent, and have their outputs reviewed by an evaluator agent, delivering personalized insights for day-to-day compliance operations
RegASK, a provider of AI-driven regulatory intelligence for Consumer Goods and Life Sciences, has launched the industry’s first agentic AI architecture that pairs RegASK’s vertical large language model (V-LLM) with specialized AI agents to deliver personalized insights and streamline how teams find, understand, and act on regulatory information. Each agent performs a distinct task, such as document retrieval, translation, summarization, or assessment generation. A dedicated ‘project manager’ agent coordinates how tasks are assigned and performed across the system, enabling collaborative execution of multi-step workflows, and an evaluator agent reviews outputs before they’re delivered to users, helping ensure accuracy and build trust in the results. Together, the enhanced agent network and embedded V-LLM power deeper automation, more tailored insights, and the ability to manage a wider range of day-to-day compliance operations. The launch also brings:

- A more powerful, embedded vertical language model: RegASK’s domain-specific LLM is now fully integrated into the platform and enhanced with additional structured attributes. The model gives agents deeper context to generate faster, more precise summaries, assessments, and search results, delivering insights that are directly aligned to users’ regulatory priorities.
- A redesigned user interface with streamlined regulatory change tracking: RegASK’s redesigned user experience significantly improves how regulatory teams identify and respond to critical updates. The new alerts module delivers customizable alert views, streamlined navigation, and faster access to essential regulatory details, enabling professionals to efficiently manage compliance workflows, mitigate risk proactively, and keep their organizations ahead in highly regulated environments.
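RegASK hasn’t published implementation details, but the coordinator-plus-evaluator pattern it describes can be summarized in a few lines. The pure-Python sketch below stubs out every agent; only the control flow, a ‘project manager’ dispatching specialists and an evaluator gating the result, reflects the announcement.

```python
from typing import Callable

# Pure-Python sketch of the pattern described above; every agent body is a stub.
Agent = Callable[[str], str]

specialists: dict[str, Agent] = {
    "retrieve":  lambda doc: f"[retrieved text of {doc}]",
    "translate": lambda text: f"[translated] {text}",
    "summarize": lambda text: f"[summary] {text}",
}

def evaluator(output: str) -> bool:
    # In a real system this agent would check grounding, citations, etc.
    return output.startswith("[summary]")

def project_manager(doc: str) -> str:
    """Coordinates the specialists and gates the result through the evaluator."""
    result = doc
    for step in ("retrieve", "translate", "summarize"):  # multi-step workflow
        result = specialists[step](result)
    if not evaluator(result):
        raise ValueError("evaluator rejected the output; rerun or escalate")
    return result

print(project_manager("EU-regulation-2024-17.pdf"))
```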
Blok’s AI agents aim to eliminate friction points in software testing by simulating the behavior of human users and identifying what those users like and dislike, using a combination of behavioral science and product data
A startup called Blok Intelligence Inc. has raised $7.5 million to transform the software testing process with AI agents that simulate the behavior of human users. Blok’s agents are grounded in a combination of behavioral science and product data so they can simulate how different types of people use software. That way, developers can identify the most useful features and uncover and eliminate any friction points in their applications. The company aims to condense the software testing process, which often takes weeks, into a matter of hours. According to the startup, the capabilities its AI agents provide are needed more than ever: the surging popularity of “vibe coding” has led to a flood of new digital products, but many of these new applications aren’t giving people what they want. Blok addresses this with AI agents that behave like humans. It says they’re curious, imperfect and full of nuance, just like people are. By grounding them in the “messy realities” of human decision making, they’re better able to identify what humans will like and dislike about new software products. Co-founder and Chief Executive Tom Charman said he thinks static, one-size-fits-all software products will soon become obsolete, replaced by tools that are more adaptive and responsive to each user’s needs. But developers need help to understand what those needs are.
ZeroEntropy is a RAG-based AI search tool strictly for developers that grabs data even across messy internal documents, surfacing the most relevant information first
Startup ZeroEntropy joins a growing wave of infrastructure companies hoping to use retrieval-augmented generation (RAG) to power search for the next generation of AI agents. ZeroEntropy offers an API that manages ingestion, indexing, re-ranking, and evaluation. In other words, unlike an enterprise search product for employees such as Glean, ZeroEntropy is strictly a developer tool that quickly grabs data, even across messy internal documents. CEO Ghita Houir Alami likens her startup to a “Supabase for search,” referring to the popular open-source database that automates much of database management. At its core is a proprietary re-ranker called ze-rank-1, which the company claims currently outperforms similar models from Cohere and Salesforce on both public and private retrieval benchmarks. It makes sure that when an AI system looks for answers in a knowledge base, it grabs the most relevant information first. “Right now, most teams are either stitching together existing tools from the market or dumping their entire knowledge base into an LLM’s context window. The first approach is time-consuming to build and maintain,” Houir Alami said. “The second approach can cause compounding errors. We’re building a developer-first search infrastructure — think of it like a Supabase for search — designed to make deploying accurate, fast retrieval systems easy and efficient.”
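The piece doesn’t show ZeroEntropy’s API, but the role a re-ranker like ze-rank-1 plays is easy to sketch: a cheap first pass narrows the corpus, then a more expensive model reorders the survivors. The toy scoring functions below stand in for both stages.

```python
# Generic two-stage retrieval sketch; the scoring functions are stubs standing
# in for a first-pass retriever and a learned re-ranker such as ze-rank-1.

def first_pass_score(query: str, doc: str) -> float:
    # Toy lexical overlap; real systems use BM25 or embedding similarity.
    return float(len(set(query.split()) & set(doc.split())))

def rerank_score(query: str, doc: str) -> float:
    # Placeholder: a real re-ranker is a learned model scoring (query, doc) pairs.
    return first_pass_score(query, doc)

def search(query: str, corpus: list[str], k: int = 50, top_n: int = 5) -> list[str]:
    # Stage 1: cheap retrieval casts a wide net over the whole corpus.
    candidates = sorted(corpus, key=lambda d: first_pass_score(query, d), reverse=True)[:k]
    # Stage 2: the expensive re-ranker reorders only the candidates, so the
    # most relevant passages surface first without scoring every document.
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:top_n]
```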
DigitalOcean’s managed AI platform offers one simple UI with integrations for storage, functions, and database, enabling customers to build AI agents that can reduce costs or streamline user experiences, without requiring deep AI expertise on their team
DigitalOcean Holdings announced the general availability of its DigitalOcean GradientAI™ Platform, a managed AI platform that enables developers to combine their data with foundation models from Anthropic, Meta, Mistral and OpenAI to add customized GenAI agents to their applications. The DigitalOcean GradientAI Platform is a fully managed service: customers do not need to manage infrastructure and can deploy generative AI capabilities to their applications in minutes. All tools and data are available through one simple UI, with integrations for storage, functions, and database, all powered by DigitalOcean’s GPU cloud. This empowers customers to build AI agents that can reduce costs or streamline user experiences, without requiring deep AI expertise on their team. The DigitalOcean GradientAI Platform is built with simplicity in mind to get GenAI-backed experiences into customer applications quickly. By leveraging retrieval-augmented generation (RAG), customers can quickly and easily create GenAI agents for use within their applications. These agents offer powerful capabilities that can be enhanced through function routing to integrate with third-party APIs, and agent routing to connect with other GenAI agents within the platform. Additionally, with Serverless LLM Inference, customers can integrate models from multiple providers via one API, with usage-based billing and no infrastructure to manage.
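DigitalOcean doesn’t detail the API in this announcement; the sketch below assumes the serverless inference endpoint speaks the OpenAI-compatible chat API, a common pattern for multi-provider gateways. The base URL, key name, and model slug are assumptions.

```python
from openai import OpenAI

# Assumption: the platform's serverless inference endpoint is OpenAI-compatible;
# the base URL and model slug below are illustrative.
client = OpenAI(
    base_url="https://inference.do-ai.run/v1",
    api_key="YOUR_DO_MODEL_ACCESS_KEY",
)

response = client.chat.completions.create(
    model="llama3.3-70b-instruct",  # one of several providers behind one API
    messages=[{"role": "user", "content": "Draft a refund policy FAQ entry."}],
)
print(response.choices[0].message.content)
```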