AI startup Reflection AI has developed an autonomous agent known as Asimov, trained to understand how software is created by ingesting not only code but the entirety of a business’s data to piece together why an application or system does what it does. Co-founder and Chief Executive Misha Laskin said that Asimov reads everything from emails to Slack messages, project notes to documentation, in addition to the code, to learn how and why an app was created. He believes this is the simplest and most natural way for AI agents to become masters at coding. Asimov is actually a collection of multiple smaller AI agents deployed inside customers’ cloud environments so that the data remains within their control. Asimov’s agents cooperate with one another to understand the underlying code of whatever piece of software they’ve been assigned to, so they can answer any questions that human users might have about it. Several smaller agents retrieve the necessary data and feed a larger “reasoning” agent that collects their findings and generates coherent answers to users’ questions.
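Reflection AI hasn’t published Asimov’s internals, but the retrieval-plus-reasoning pattern described above can be sketched in plain Python. Everything here (class names, the keyword-overlap retrieval, the stubbed synthesis step) is a hypothetical illustration, not Asimov’s actual design:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    source: str   # e.g. "code", "slack", "docs"
    excerpt: str

class RetrievalAgent:
    """Hypothetical small agent that searches a single data source."""
    def __init__(self, source: str, corpus: list[str]):
        self.source = source
        self.corpus = corpus

    def retrieve(self, question: str) -> list[Finding]:
        # Toy relevance test: any keyword overlap with the question.
        words = set(question.lower().split())
        return [Finding(self.source, doc) for doc in self.corpus
                if words & set(doc.lower().split())]

class ReasoningAgent:
    """Hypothetical larger agent that fuses the retrievers' findings."""
    def answer(self, question: str, retrievers: list[RetrievalAgent]) -> str:
        findings = [f for r in retrievers for f in r.retrieve(question)]
        # A real system would hand these to an LLM; here we just cite them.
        cited = "; ".join(f"[{f.source}] {f.excerpt}" for f in findings)
        return f"Q: {question}\nEvidence: {cited or 'none found'}"

retrievers = [
    RetrievalAgent("code", ["def retry(): wraps the flaky payment API"]),
    RetrievalAgent("slack", ["we added retry because the payment API times out"]),
]
print(ReasoningAgent().answer("why does payment retry exist?", retrievers))
```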
Anthropic’s analytics dashboard for its Claude Code coding agent provides detailed breakdowns of activity by user and cost, including lines of code accepted, suggestion accept rates, and total spend over time
Anthropic is rolling out a comprehensive analytics dashboard for its Claude Code AI programming assistant. The new dashboard will provide engineering managers with detailed metrics on how their teams use Claude Code, including lines of code accepted, suggestion accept rates, total user activity over time, total spend over time, average daily spend per user, and average daily lines of code accepted per user. The dashboard will track commits and pull requests and provide detailed breakdowns of activity by user and cost — data that engineering leaders say is crucial for understanding how AI is changing development workflows. The feature includes role-based access controls, allowing organizations to configure who can view usage data. The system focuses on metadata rather than actual code content, addressing potential privacy concerns about employee surveillance. The platform has seen active user base growth of 300% and run-rate revenue expansion of more than 5.5 times, according to company data. Unlike some competitors that focus primarily on code completion, Claude Code offers what Anthropic calls “agentic” capabilities — the ability to understand entire codebases, make coordinated changes across multiple files, and work directly within existing development workflows.
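Anthropic hasn’t published the dashboard’s underlying schema. As a rough illustration of the kind of metadata-only aggregation described here, this hypothetical Python sketch computes per-user accept rates and spend from invented event records:

```python
from collections import defaultdict

# Hypothetical metadata events; Anthropic's actual schema is not public.
events = [
    {"user": "ana", "suggested_lines": 40, "accepted_lines": 30, "spend_usd": 1.20},
    {"user": "ana", "suggested_lines": 10, "accepted_lines": 0,  "spend_usd": 0.35},
    {"user": "raj", "suggested_lines": 25, "accepted_lines": 20, "spend_usd": 0.90},
]

totals = defaultdict(lambda: {"suggested": 0, "accepted": 0, "spend": 0.0})
for e in events:
    t = totals[e["user"]]
    t["suggested"] += e["suggested_lines"]
    t["accepted"] += e["accepted_lines"]
    t["spend"] += e["spend_usd"]

for user, t in sorted(totals.items()):
    rate = t["accepted"] / t["suggested"]  # suggestion accept rate
    print(f"{user}: accepted {t['accepted']} lines "
          f"({rate:.0%} accept rate), ${t['spend']:.2f} spend")
```

Note that nothing above touches code content itself, which mirrors the metadata-only framing Anthropic uses to address surveillance concerns.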
Confident Security offers an end-to-end encryption tool that wraps around foundational models, guaranteeing that prompts and metadata can’t be stored, seen, or used for AI training
Startup Confident Security aims to be “the Signal for AI.” The company’s product, CONFSEC, is an end-to-end encryption tool that wraps around foundational models, guaranteeing that prompts and metadata can’t be stored, seen, or used for AI training, even by the model provider or any third party. The company wants to serve as an intermediary between AI vendors and their customers, such as hyperscalers, governments, and enterprises. CONFSEC is modeled after Apple’s Private Cloud Compute (PCC) architecture, which, the company says, “is 10x better than anything out there in terms of guaranteeing that Apple cannot see your data” when it runs certain AI tasks securely in the cloud. Like Apple’s PCC, Confident Security’s system works by first anonymizing data by encrypting and routing it through services like Cloudflare or Fastly, so servers never see the original source or content. Next, it uses advanced encryption that only allows decryption under strict conditions. Finally, the software running the AI inference is publicly logged and open to review, so experts can verify its guarantees. CONFSEC is also well-suited for new AI browsers hitting the market, like Perplexity’s Comet, to give customers guarantees that their sensitive data isn’t stored on a server where the company or bad actors could access it, and that their work-related prompts aren’t used to “train AI to do your job.”
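Confident Security hasn’t released CONFSEC’s code, but the three-step flow above can be caricatured in a few lines of Python using the `cryptography` package’s Fernet cipher. The key handling is deliberately simplified (real deployments gate key release on verified attestation rather than sharing a key up front), and every name below is hypothetical:

```python
# pip install cryptography
from cryptography.fernet import Fernet

# Key held by the inference side; real CONFSEC would release it to clients
# only after the server's software measurement passes attestation checks.
inference_key = Fernet.generate_key()

def client_encrypt(prompt: str) -> bytes:
    # Client encrypts end-to-end so intermediaries see only ciphertext.
    return Fernet(inference_key).encrypt(prompt.encode())

def relay_forward(ciphertext: bytes, client_ip: str) -> bytes:
    # The relay (Cloudflare/Fastly in the description above) strips the
    # client's identity and never holds a decryption key.
    del client_ip  # identity metadata is dropped, not logged
    return ciphertext

def enclave_infer(ciphertext: bytes) -> str:
    prompt = Fernet(inference_key).decrypt(ciphertext).decode()
    return f"(model output for: {prompt!r})"

token = client_encrypt("summarize my medical record")
print(enclave_infer(relay_forward(token, client_ip="203.0.113.7")))
```

The separation of duties is the point: the relay sees who is asking but not what, while the inference node sees what is asked but not by whom.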
Analog Devices’ AI tool automates the end-to-end machine learning pipeline for edge AI, performing model search and optimization with state-of-the-art algorithms and verifying model size against the device’s RAM to enable successful deployment
Analog Devices Inc. (ADI) has introduced AutoML for Embedded, an AI tool that automates the end-to-end machine learning pipeline for edge AI. The tool, co-developed with Antmicro, is now available as part of the Kenning framework, integrated into CodeFusion Studio. The Kenning framework is a hardware-agnostic, open-source platform for optimizing, benchmarking, and deploying AI models on edge devices. AutoML for Embedded allows developers without data science expertise to build high-quality, efficient models that deliver robust performance. The tool automates model search and optimization using state-of-the-art algorithms, leveraging SMAC to explore model architectures and training parameters efficiently. It also verifies model size against the device’s RAM to enable successful deployment. Candidate models can be optimized, evaluated, and benchmarked using Kenning’s standard flows, with detailed reports on size, speed, and accuracy to guide deployment decisions. Antmicro VP of Business Development Michael Gielda said that AutoML in Kenning reduces the complexity of building optimized edge AI models, allowing customers to take full control of their products. AutoML for Embedded is a Visual Studio Code plugin built on the Kenning library that supports:
- ADI MAX78002 AI accelerator MCUs and MAX32690 devices — deploy models directly to industry-leading edge AI hardware
- Simulation and RTOS workflows — leverage Renode-based simulation and Zephyr RTOS for rapid prototyping and testing
- General-purpose, open-source tools — flexible model optimization without platform lock-in
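Kenning’s actual AutoML API isn’t reproduced here; the sketch below only illustrates the core constraint the tool enforces, rejecting candidate architectures whose estimated size exceeds the target device’s RAM, with a toy random search standing in for SMAC’s adaptive one (all numbers and names are assumptions):

```python
import random

# Hypothetical stand-in for the AutoML flow described above.
DEVICE_RAM_BYTES = 1_000_000  # assumed usable RAM on a small MCU
BYTES_PER_PARAM = 1           # int8-quantized weights

def estimate_size(params: int) -> int:
    return params * BYTES_PER_PARAM

def train_and_score(width: int, depth: int) -> float:
    # Placeholder for real training; bigger models score better here.
    return 1 - 1 / (width * depth)

best = None
random.seed(0)
for _ in range(50):  # SMAC would search this space adaptively, not randomly
    width, depth = random.choice([8, 16, 32, 64]), random.randint(1, 6)
    params = width * width * depth * 100
    if estimate_size(params) > DEVICE_RAM_BYTES:
        continue  # model would not fit on the target device: skip it
    score = train_and_score(width, depth)
    if best is None or score > best[0]:
        best = (score, width, depth, params)

score, width, depth, params = best
print(f"best fit: width={width} depth={depth} "
      f"({params} params, {estimate_size(params)} bytes, score={score:.3f})")
```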
ChatGPT’s new ‘router’ function automatically selects the best OpenAI model to respond to the user’s input on the fly, switching between reasoning, non-reasoning, and tool-using models depending on the input’s content
Reports emerged over the last few days on X from AI influencers, including OpenAI’s own researcher “Roon” (@tszzl on X, speculated to be technical team member Tarun Gogineni), of a new “router” function that will automatically select the best OpenAI model to respond to the user’s input on the fly, depending on the specific input’s content. Similarly, Yuchen Jin, co-founder and CTO of AI inference cloud provider Hyperbolic Labs, wrote in an X post: “Heard GPT-5 is imminent, from a little bird. It’s not one model, but multiple models. It has a router that switches between reasoning, non-reasoning, and tool-using models. That’s why Sam said they’d ‘fix model naming’: prompts will just auto-route to the right model. GPT-6 is in training.” While a presumably far more advanced GPT-5 model would be huge news if and when released, the router may make life much easier and more intelligent for the average ChatGPT subscriber. It would also follow on the heels of third-party products such as the web-based Token Monster chatbot, which automatically selects and combines responses from multiple third-party LLMs to respond to user queries. Hopefully, any OpenAI router will seamlessly direct users to the right model for their needs, when they need it.
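OpenAI hasn’t said how such a router would work, but the basic idea, classifying a prompt and dispatching it to a model tier, is easy to sketch. The heuristics and model names below are invented placeholders:

```python
# Hypothetical router sketch; OpenAI has not published its routing logic.
TOOL_HINTS = ("browse", "search the web", "run code", "fetch")
REASONING_HINTS = ("prove", "step by step", "debug", "why does")

def route(prompt: str) -> str:
    p = prompt.lower()
    if any(h in p for h in TOOL_HINTS):
        return "tool-using-model"        # needs browsing / code execution
    if any(h in p for h in REASONING_HINTS) or len(p.split()) > 150:
        return "reasoning-model"         # hard or long: spend more compute
    return "fast-general-model"          # cheap default for simple queries

for prompt in ["What's the capital of France?",
               "Prove that sqrt(2) is irrational, step by step.",
               "Search the web for today's chip export news."]:
    print(f"{route(prompt):>18} <- {prompt}")
```

A production router would likely use a learned classifier rather than keyword rules, but the dispatch structure would be similar.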
Sovos’s AI platform delivers automation through every stage of compliance for e-invoicing, taxation, and regulatory reporting, letting users navigate complexity through natural language, visual interfaces, self-service analytics, and biometric security
Sovos announced the launch of Sovi™ AI, a first-of-its-kind suite of embedded AI and machine learning capabilities purpose-built for tax compliance. The name Sovi symbolizes smart power in action, reflecting Sovos’s embedded AI engine that drives intelligent automation across the Sovos Tax Compliance Cloud platform. Sovi delivers insight, automation, and reliability throughout every stage of compliance for e-invoicing, taxation, and regulatory reporting. Sovi AI will integrate across analytics, automation, and regulatory workflows, enabling technical and non-technical teams to navigate complexity through natural language, visual interfaces, and intuitive guidance. Sovi AI capabilities are already operational across Sovos solutions, including advanced biometrics for face and liveness detection, image recognition, and secure authentication built into Sovos Trust solutions. The roadmap includes ambitious expansions such as AI compliance checks, Ask Sovi embedded assistants, automated mapping tools for goods and services classification, and intelligent document agents for AP process automation. Sovi AI enables organizations to achieve: Enhanced Efficiency: self-service analytics eliminate IT dependencies for finance and tax teams; Improved Accuracy: biometric security and AI validations reduce errors, fraud, and compliance mismatches; Greater Clarity: conversational AI and insightful dashboards uncover hidden issues and opportunities; Unlimited Scalability: future-proof compliance capabilities regardless of country, volume, or complexity.
Agent2.AI’s AI orchestration platform can understand user intent, break down the request into smaller, manageable steps, delegate each task to focused atomic agents, and deliver real, usable outputs such as reports, spreadsheets, and presentations
Agent2.AI announced the upcoming launch of Super Agent, a breakthrough AI orchestration platform designed to coordinate intelligent work across multiple agents, APIs, and even real human collaborators. Unlike traditional AI tools that focus on generating content or answering questions, Super Agent acts as an orchestration layer — a system that understands user intent, delegates work to the right components, and delivers real, usable outputs such as reports, spreadsheets, and presentations. “We’re not building just another AI agent,” said Chuci Qin, CEO of Agent2.AI. Users prompt Super Agent with a request, and the system automatically breaks it into smaller, manageable steps, each handled by a focused atomic agent. Each agent is built to do one specific job, such as finding information, organizing research, or creating slides. These atomic agents form a growing ecosystem inside Agent2.AI, each focused, reliable, and composable. Super Agent can also call on external tools and agents through standard protocols such as MCP or A2A, allowing the system to dynamically connect with open-source frameworks, third-party APIs, or no-code automations as needed. In some cases, tasks may require not just software but real-world execution, such as placing an order, contacting a vendor, or managing a physical deliverable. When that’s the case, Super Agent can seamlessly coordinate with vetted freelancers or agency partners. These human contributors are not fallback options, but core participants in a flexible, multi-agent system.
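Agent2.AI hasn’t published Super Agent’s planner, but the decompose-and-delegate loop described above can be sketched in Python. The agent registry, the fixed plan, and all names here are hypothetical stand-ins for LLM-driven planning and MCP/A2A dispatch:

```python
# Hypothetical sketch of an orchestration layer: plan, then delegate each
# step to one focused "atomic agent" and collect the outputs.
ATOMIC_AGENTS = {
    "research": lambda task: f"notes on {task!r}",
    "organize": lambda task: f"outline built from {task!r}",
    "slides":   lambda task: f"deck rendered for {task!r}",
}

def plan(request: str) -> list[tuple[str, str]]:
    # A real system would ask an LLM to decompose the request; this is fixed.
    return [("research", request), ("organize", request), ("slides", request)]

def run(request: str) -> list[str]:
    outputs = []
    for agent_name, task in plan(request):
        agent = ATOMIC_AGENTS[agent_name]   # delegate to one focused agent
        outputs.append(agent(task))
    return outputs

for artifact in run("Q3 market analysis for e-bikes"):
    print(artifact)
```

In the real product, entries in the registry could equally be external MCP/A2A endpoints or even human contributors, which is what makes the orchestration-layer framing apt.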
A new open-source method utilizes the MCP architecture to evaluate agent performance across a variety of available LLMs, gathering real-time information on how agents interact with tools, generating synthetic data, and creating a database to benchmark them
Researchers from Salesforce discovered another way to utilize MCP technology, this time to aid in evaluating AI agents themselves. The researchers unveiled MCPEval, a new method and open-source toolkit built on the architecture of the MCP system that tests agent performance when using tools. They noted that current evaluation methods for agents are limited in that they “often relied on static, pre-defined tasks, thus failing to capture the interactive real-world agentic workflows.” MCPEval differentiates itself by being a fully automated process, which the researchers claimed allows for rapid evaluation of new MCP tools and servers. It gathers information on how agents interact with tools within an MCP server, generates synthetic data, and creates a database to benchmark agents. Users can choose which MCP servers, and which tools within those servers, to test the agent’s performance on. MCPEval’s framework follows a task generation, verification, and model evaluation design. Because it leverages multiple large language models (LLMs), users can work with models they are more familiar with, and agents can be evaluated through a variety of LLMs available on the market. Enterprises can access MCPEval through an open-source toolkit released by Salesforce. Through a dashboard, users configure the server by selecting a model, which then automatically generates tasks for the agent to follow within the chosen MCP server. Once the user verifies the tasks, MCPEval takes them and determines the tool calls needed as ground truth; these tasks serve as the basis for the test. Users choose which model they prefer to run the evaluation, and MCPEval can generate a report on how well the agent and the test model functioned in accessing and using these tools. What makes MCPEval stand out from other agent evaluators is that it brings testing into the same environment in which the agent will be working: agents are evaluated on how well they access tools within the MCP server to which they will likely be deployed.
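The released toolkit defines its own schemas and metrics; as a minimal sketch of the core idea, comparing an agent’s actual tool calls against ground-truth calls derived from a verified task, consider this hypothetical Python example:

```python
# Hypothetical sketch of MCPEval's central comparison. The real toolkit's
# task format, server integration, and scoring differ; names are invented.
ground_truth = {
    "book a table": [("search_restaurants", {"city": "SF"}),
                     ("create_reservation", {"party": 2})],
}

def fake_agent(task: str) -> list[tuple[str, dict]]:
    # Stand-in for an LLM agent making tool calls on a live MCP server.
    return [("search_restaurants", {"city": "SF"}),
            ("create_reservation", {"party": 4})]  # wrong argument value

def score(task: str) -> float:
    expected, actual = ground_truth[task], fake_agent(task)
    hits = sum(1 for call in actual if call in expected)
    return hits / len(expected)   # fraction of ground-truth calls matched

print(f"tool-call accuracy: {score('book a table'):.0%}")   # prints 50%
```

Scoring against tool calls rather than final text answers is what lets the evaluation run in the same MCP environment the agent will actually be deployed to.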
Alibaba’s Qwen3-Coder launches and it ‘might be the best coding model yet’: designed to handle complex, multi-step coding workflows, it can create full-fledged, functional applications in seconds or minutes
Chinese e-commerce giant Alibaba’s “Qwen Team” has released Qwen3-Coder-480B-A35B-Instruct, a new open-source LLM focused on assisting with software development. It is designed to handle complex, multi-step coding workflows and can create full-fledged, functional applications in seconds or minutes. Qwen3-Coder is available now under the open-source Apache 2.0 license, meaning any enterprise is free to download, modify, deploy, and use it in commercial applications for employees or end customers without charge. It has also proven highly performant on third-party benchmarks and in anecdotal usage among AI power users for “vibe coding.” Qwen3-Coder is a Mixture-of-Experts (MoE) model with 480 billion total parameters, 35 billion active per query, and 8 active experts out of 160. It supports 256K-token context lengths natively, with extrapolation up to 1 million tokens using YaRN (Yet another RoPE extrapolatioN), a technique that extends a language model’s context length beyond its original training limit by modifying the Rotary Positional Embeddings (RoPE) used during attention computation. This capacity enables the model to understand and manipulate entire repositories or lengthy documents in a single pass. Designed as a causal language model, it features 62 layers, 96 attention heads for queries, and 8 for key-value pairs. It is optimized for token-efficient, instruction-following tasks and omits support for <think> blocks by default, streamlining its outputs. Qwen3-Coder has achieved leading performance among open models on several agentic evaluation suites. On SWE-bench Verified it scores 67.0% (standard) and 69.6% (500-turn), versus 54.6% for GPT-4.1, 49.0% for Gemini 2.5 Pro Preview, and 70.4% for Claude Sonnet-4. The model also scores competitively across tasks such as agentic browser use, multi-language programming, and tool use. For enterprises, Qwen3-Coder offers an open, highly capable alternative to closed-source proprietary models. With strong results in coding execution and long-context reasoning, it is especially relevant for:
- Codebase-level understanding: ideal for AI systems that must comprehend large repositories, technical documentation, or architectural patterns
- Automated pull request workflows: its ability to plan and adapt across turns makes it suitable for auto-generating or reviewing pull requests
- Tool integration and orchestration: through its native tool-calling APIs and function interface, the model can be embedded in internal tooling and CI/CD systems, making it especially viable for agentic workflows and products, i.e., those where the user triggers one or more tasks for the AI model to carry out autonomously, checking in only when finished or when questions arise (see the sketch after this list)
- Data residency and cost control: as an open model, enterprises can deploy Qwen3-Coder on their own infrastructure, whether cloud-native or on-prem, avoiding vendor lock-in and managing compute usage more directly
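As a quick illustration of that tool-calling deployment path, here is a minimal sketch that assumes Qwen3-Coder is served behind an OpenAI-compatible endpoint (e.g. a self-hosted vLLM server); the base URL, model identifier, and `run_tests` tool schema are illustrative assumptions:

```python
# pip install openai -- calling a self-hosted Qwen3-Coder through an
# OpenAI-compatible server. Endpoint and tool names are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",   # hypothetical CI hook exposed to the model
        "description": "Run the repo's test suite and return failures.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",
    messages=[{"role": "user",
               "content": "Fix the failing date parser and rerun the tests."}],
    tools=tools,            # the model may reply with a run_tests tool call
)
msg = resp.choices[0].message
print(msg.tool_calls or msg.content)
```

Because the interface is the standard chat-completions protocol, the same code works against on-prem deployments, which is the data-residency point made above.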
Hailo Technologies’ AI accelerator device runs hybrid AI pipelines that blend LLMs, vision-language models (VLMs), and other multi-modal AI with traditional convolutional neural networks (CNNs) directly on-device, eliminating the need for cloud-based inference
Israeli chipmaker Hailo Technologies has released the Hailo-10H, which it describes as the first discrete AI accelerator designed for generative AI workloads at the edge. The device runs large language models (LLMs), vision-language models (VLMs), and other multi-modal AI directly on-device, eliminating the need for cloud-based inference. According to the company, the Hailo-10H offers unmatched power efficiency and low latency, achieving first-token generation in under one second and sustaining 10 tokens per second on 2-billion-parameter LLMs. It can also generate images with Stable Diffusion 2.1 in under five seconds, a significant leap for offline generative workloads. The chip is built around Hailo’s second-generation neural core architecture, providing 40 tera-operations per second (TOPS) of INT4 performance and 20 TOPS of INT8 at a typical power draw of 2.5 W. It is fully compatible with TensorFlow, PyTorch, ONNX, and Keras, and is supported by Hailo’s mature software stack. The device is designed to work in hybrid AI pipelines that blend LLMs or VLMs with traditional convolutional neural networks (CNNs), conserving power and ensuring real-time responsiveness for mission-critical applications like video analytics.
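Hailo’s HailoRT SDK has its own APIs, which aren’t shown here; this hypothetical Python sketch only illustrates the hybrid-pipeline pattern the company describes, with a cheap always-on CNN gating calls to a costly VLM:

```python
# Hypothetical hybrid pipeline: a lightweight CNN screens every frame, and
# the expensive multi-modal model runs only on hits. Function bodies are
# placeholders, not Hailo SDK calls.
def cnn_detect(frame: dict) -> bool:
    # Lightweight detector: flags frames that contain a person.
    return "person" in frame["objects"]

def vlm_describe(frame: dict) -> str:
    # Expensive multi-modal model, invoked only when the CNN fires.
    return f"frame {frame['id']}: person near {frame['objects'][-1]}"

frames = [
    {"id": 1, "objects": ["car"]},
    {"id": 2, "objects": ["person", "loading dock"]},
    {"id": 3, "objects": []},
]
for frame in frames:
    if cnn_detect(frame):               # runs on every frame (cheap)
        print(vlm_describe(frame))      # runs rarely (saves power, latency)
```

Gating the generative model this way is what lets a 2.5 W accelerator keep real-time responsiveness in always-on workloads like video analytics.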