Anthropic updated Claude with a feature called Integrations that will enable the chatbot to access data from third-party cloud services. The company rolled out the capability alongside an enhanced version of Research, a tool it introduced last month. The latter feature enables Claude to prepare detailed reports about user-specified topics. Research can now perform the task more thoroughly than before. The new Integrations capability will enable Claude to incorporate data from software-as-a-service applications into its prompt responses. If customers wish to connect Claude to an application for which a prepackaged integration isn’t available, they can build their own. Anthropic estimates that the process takes as little as 30 minutes. According to the company, developers can further speed up the workflow by using a set of tools that Cloudflare introduced in March to ease such projects. Claude’s new connectors are powered by MCP, a data transfer technology that Anthropic open-sourced. It provides software building blocks that reduce the amount of work involved in connecting an LLM to external applications. OpenAI, Anthropic’s top competitor, rolled out MCP support to its Agents SDK last month. Anthropic added MCP to Claude immediately after open-sourcing the technology last year. Until now, however, the chatbot only supported connections to applications installed on the user’s computer, which limited the feature’s usefulness.
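For teams building a custom connector, the open-sourced MCP SDKs handle most of the protocol plumbing. The snippet below is a minimal sketch assuming the MCP Python SDK (the "mcp" package) and its FastMCP helper; the connector name, tool, and data are hypothetical placeholders rather than anything from Anthropic’s announcement.

```python
# Minimal sketch of a custom MCP server, assuming the open-source MCP Python SDK.
# The connector name, tool, and returned data are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ticket-tracker")  # hypothetical connector name

@mcp.tool()
def open_tickets(assignee: str) -> list[str]:
    """Return open ticket titles for an assignee (stubbed data for illustration)."""
    return [f"Ticket {i} assigned to {assignee}" for i in range(1, 4)]

if __name__ == "__main__":
    mcp.run()  # serve the tool over an MCP transport so Claude can discover and call it
```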
Claude’s web search API allows the AI assistant to conduct multiple progressive searches, using earlier results to inform subsequent queries, complete with source citations
Anthropic has introduced a web search capability for its Claude AI assistant, intensifying competition in the rapidly evolving AI search market where tech giants are racing to redefine how users find information online. The company announced that developers can now enable Claude to access current web information through its API, allowing the AI assistant to conduct multiple progressive searches to compile comprehensive answers complete with source citations. Anthropic’s technical approach represents a significant advance in how AI systems can be deployed as information-gathering tools. The system employs a sophisticated decision-making layer that determines when external information would improve response quality, generating targeted search queries rather than simply passing user questions verbatim to a search backend. This “agentic” capability — allowing Claude to conduct multiple progressive searches using earlier results to inform subsequent queries — enables a more thorough research process than traditional search. The implementation essentially mimics how a human researcher might explore a topic, starting with general queries and progressively refining them based on initial findings. Anthropic’s web search API represents more than just another feature in the AI toolkit — it signals the evolution of internet information access toward a more integrated, conversation-based model. The new capability arrives amid signs that traditional search is losing ground to AI-powered alternatives. With Safari searches declining for the first time ever, we’re witnessing early indicators of a mass consumer behavior shift. Traditional search engines optimized for advertising revenue are increasingly being bypassed in favor of conversation-based interactions that prioritize information quality over commercial interests.
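For developers, enabling the capability is a matter of attaching the web search tool to a Messages API call. The snippet below is a hedged sketch using the anthropic Python SDK; the tool-type string and model ID are assumptions that may differ from the current documentation, and the prompt is illustrative.

```python
# Hedged sketch of Claude's web search tool via the Messages API (anthropic Python SDK).
# The tool-type string and model ID are assumptions; check current docs before use.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=1024,
    tools=[{
        "type": "web_search_20250305",  # assumed server-side tool identifier
        "name": "web_search",
        "max_uses": 5,                  # cap the number of progressive searches
    }],
    messages=[{"role": "user", "content": "What changed in AI search this week? Cite sources."}],
)

# The response mixes text blocks and search-result blocks carrying source citations.
for block in response.content:
    print(block.type)
```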
OpenAI releases GPT-4.1 models, which are faster than GPT-4o and excel at coding and instruction following, but come with a different set of safety evaluations
OpenAI is releasing its GPT-4.1 and GPT-4.1 mini AI models in ChatGPT. The GPT-4.1 models should help software engineers who are using ChatGPT to write or debug code, OpenAI spokesperson Shaokyi Amdo told TechCrunch. According to OpenAI, GPT-4.1 excels at coding and instruction following compared to GPT-4o, while being faster than the company’s o-series of reasoning models. The company says it’s now rolling out GPT-4.1 to ChatGPT Plus, Pro, and Team subscribers. Meanwhile, OpenAI is releasing GPT-4.1 mini for free and paying users of ChatGPT. As a result of this update, OpenAI is removing GPT-4o mini from ChatGPT for all users. “GPT-4.1 doesn’t introduce new modalities or ways of interacting with the model, and doesn’t surpass o3 in intelligence,” said OpenAI’s Head of Safety Systems Johannes Heidecke in a post. “This means that the safety considerations here, while substantial, are different from frontier models.” OpenAI is also releasing more information about GPT-4.1 and all its AI models: the company has committed to publishing the results of its internal AI model safety evaluations more frequently as part of an effort to increase transparency. Those results will live in OpenAI’s new Safety Evaluations Hub, which it launched on Wednesday.
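For developers, the same models are exposed by name through OpenAI’s API. The snippet below is a minimal sketch assuming the openai Python SDK and the "gpt-4.1" model identifier; the prompt is purely illustrative.

```python
# Minimal sketch of calling GPT-4.1 for a coding task with the openai Python SDK.
# The "gpt-4.1" model identifier is assumed to be available to the account.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Write a Python function that deduplicates a list while preserving order."},
    ],
)
print(completion.choices[0].message.content)
```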
LMArena is a neutral benchmarking platform that enables users to compare large language models through head-to-head matchups
LMArena, the company behind AI testing service Chatbot Arena, has raised $100 million in initial funding, marking one of the largest seed rounds in the AI sector to date. LMArena operates as a neutral benchmarking platform that enables users to compare large language models through head-to-head matchups. It works by allowing users to submit prompts and evaluate anonymous responses from different models, selecting the best reply. The result is a crowdsourced comparison method and unbiased rankings that reflect actual, real-world user preferences. By not favoring any specific company or model, the platform has attracted participation from nearly every major company and lab that is developing large language models, giving it industry-wide relevance and legitimacy. The platform has become the primary, and arguably one of the best, ways for both researchers and commercial AI developers to compare models. Major AI companies, including OpenAI, Google LLC and Anthropic PBC, submit their models to LMArena to showcase performance and gather community feedback. LMArena’s ability to generate detailed performance comparisons without the need for direct integration into third-party systems makes it highly scalable.
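The head-to-head format maps naturally onto pairwise-preference rating schemes. The sketch below is a deliberately simplified Elo-style update, shown only to illustrate how crowdsourced votes can be aggregated into a leaderboard; it is not LMArena’s actual methodology, which relies on more robust statistical models.

```python
# Simplified Elo-style illustration of turning pairwise votes into a leaderboard.
# This is not LMArena's methodology; it only sketches the general idea.
from collections import defaultdict

K = 32  # update step size

def expected(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo assumption."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

ratings = defaultdict(lambda: 1000.0)

# Each vote: (model_a, model_b, winner), where winner is "a" or "b".
votes = [("model-x", "model-y", "a"), ("model-y", "model-z", "b"), ("model-x", "model-z", "a")]

for a, b, winner in votes:
    e_a = expected(ratings[a], ratings[b])
    score_a = 1.0 if winner == "a" else 0.0
    ratings[a] += K * (score_a - e_a)
    ratings[b] += K * ((1.0 - score_a) - (1.0 - e_a))

for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rating:.1f}")
```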
Salesforce’s new benchmark for tackling ‘jagged intelligence’ in CRM scenarios shows leading agents succeed less than 65% of the time at function-calling for the use cases of three key personas: service agents, analysts, and managers
To tackle “jagged intelligence”, one of AI’s most persistent challenges for business applications (the gap between an AI system’s raw intelligence and its ability to consistently perform in unpredictable enterprise environments), Salesforce revealed several new benchmarks, models, and frameworks designed to make future AI agents more intelligent, trusted, and versatile for enterprise use. Among them is the SIMPLE dataset, a public benchmark featuring 225 straightforward reasoning questions designed to measure how jagged an AI system’s capabilities really are. Perhaps the most significant innovation is CRMArena, a novel benchmarking framework designed to simulate realistic customer relationship management scenarios. It enables comprehensive testing of AI agents in professional contexts, addressing the gap between academic benchmarks and real-world business requirements. The framework evaluates agent performance across three key personas: service agents, analysts, and managers. Early testing revealed that even with guided prompting, leading agents succeed less than 65% of the time at function-calling for these personas’ use cases. Among the technical innovations announced, Salesforce highlighted SFR-Embedding, a new model for deeper contextual understanding that leads the Massive Text Embedding Benchmark (MTEB) across 56 datasets. A specialized version, SFR-Embedding-Code, was also introduced for developers, enabling high-quality code search and streamlining development. Salesforce also announced xLAM V2 (Large Action Model), a family of models specifically designed to predict actions rather than just generate text. These models start at just 1 billion parameters, a fraction of the size of many leading language models. To address enterprise concerns about AI safety and reliability, Salesforce introduced SFR-Guard, a family of models trained on both publicly available data and CRM-specialized internal data. These models strengthen the company’s Trust Layer, which provides guardrails for AI agent behavior. The company also launched ContextualJudgeBench, a novel benchmark for evaluating LLM-based judge models in context, testing over 2,000 challenging response pairs for accuracy, conciseness, faithfulness, and appropriate refusal to answer. Salesforce unveiled TACO, a multimodal action model family designed to tackle complex, multi-step problems through chains of thought-and-action (CoTA). This approach enables AI to interpret and respond to intricate queries involving multiple media types, with Salesforce claiming up to 20% improvement on the challenging MMVet benchmark.
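As a rough illustration of what a function-calling success rate measures in a benchmark like CRMArena, the hypothetical harness below scores an agent’s emitted tool call against a gold reference; the task format, function names, and data are invented for illustration and are not drawn from Salesforce’s benchmark.

```python
# Hypothetical sketch of scoring function-calling accuracy: a task passes only if the
# agent chose the expected function and supplied the expected arguments.
# Task format, function names, and data are invented; they are not CRMArena's.

def call_matches(predicted: dict, expected: dict) -> bool:
    """Exact-match scoring on function name and arguments."""
    return (predicted.get("name") == expected["name"]
            and predicted.get("arguments") == expected["arguments"])

tasks = [
    {"persona": "service agent",
     "expected": {"name": "lookup_case", "arguments": {"case_id": "C-1042"}},
     "predicted": {"name": "lookup_case", "arguments": {"case_id": "C-1042"}}},
    {"persona": "analyst",
     "expected": {"name": "run_report", "arguments": {"metric": "churn", "quarter": "Q1"}},
     "predicted": {"name": "run_report", "arguments": {"metric": "churn"}}},  # missing argument
]

passed = sum(call_matches(t["predicted"], t["expected"]) for t in tasks)
print(f"Function-calling success rate: {passed / len(tasks):.0%}")
```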
Neo4j’s serverless solution enables users of all skill levels to access graph analytics without the need for custom queries, ETL pipelines, or specialized graph expertise and can be used seamlessly with any data source
Neo4j has launched Neo4j Aura Graph Analytics, a new serverless offering that for the first time can be used seamlessly with any data source, and with Zero ETL (extract, transform, load). The solution delivers the power of graph analytics to users of all skill levels, unlocking deeper intelligence and achieving 2X greater insight precision and quality over traditional analytics. The new Neo4j offering makes graph analytics capabilities accessible to everyone and eliminates adoption barriers by removing the need for custom queries, ETL pipelines, or specialized graph expertise – so that business decision-makers, data scientists, and other users can focus on outcomes, not overhead. Neo4j Aura Graph Analytics requires no infrastructure setup and no prior experience with graph technology or the Cypher query language. Users seamlessly deploy and scale graph analytics workloads end-to-end, enabling them to collect, organize, analyze, and visualize data. The offering includes the industry’s largest selection of 65+ ready-to-use graph algorithms and is optimized for high-performance applications and parallel workflows. Users pay only for the processing power and storage they consume. Additional benefits and capabilities below are based on customer-reported outcomes that reflect real-world performance gains: 1) Up to 80% model accuracy, leading to 2X greater efficacy of insights that go beyond the limits of traditional analytics. 2) Insights achieved twice as fast as open-source alternatives, with parallelized in-memory processing of graph algorithms. 3) 75% less code and Zero ETL. 4) No administration overhead and lower total cost of ownership.
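The announcement does not include code, but for a sense of what running a graph algorithm programmatically looks like today, here is a hedged sketch using Neo4j’s existing Graph Data Science Python client (the graphdatascience package); the connection details, node label, and relationship type are placeholders, and the new serverless Aura offering’s exact entry points may differ.

```python
# Hedged sketch: project a graph and run PageRank with Neo4j's Graph Data Science
# Python client (pip install graphdatascience). URI, credentials, the node label, and
# the relationship type are placeholders; Aura Graph Analytics entry points may differ.
from graphdatascience import GraphDataScience

gds = GraphDataScience(
    "neo4j+s://<your-instance>.databases.neo4j.io",
    auth=("neo4j", "<password>"),
)

# Project an in-memory graph of Person nodes connected by KNOWS relationships.
G, _ = gds.graph.project("people", "Person", "KNOWS")

# Run PageRank and stream results back as a pandas DataFrame.
scores = gds.pageRank.stream(G)
print(scores.sort_values("score", ascending=False).head())

G.drop()     # release the in-memory projection
gds.close()
```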
Capgemini’s mainframe modernization offering automates legacy code analysis and extraction of business rules using a set of generative AI agents
Capgemini has launched a new offering that enables organizations to unlock greater value from their legacy systems at unprecedented speed and accuracy. The new approach, powered by generative and agentic AI, allows organizations to gain cost savings, agility, and a significant improvement in data quality. It converts legacy mainframe applications into modern, agile, and cloud-friendly formats that can run more efficiently either on or outside of a mainframe. Capgemini’s automated mainframe application refactoring uses tools and techniques to automatically convert legacy mainframe applications, such as those written in COBOL, into modern architectures. The approach is supported by rigorous automated testing for faster, higher-quality transformations and reduced risk for businesses. Capgemini’s experience in delivering large and complex mainframe modernization programs, market leadership in AI, deep domain knowledge, and broad understanding of complex industry regulations have already delivered tangible results for blue-chip clients.
Anthropic’s new AI models can use tools in parallel, extract and save key facts from local files, operate in two modes (near-instant responses and extended thinking), and maintain full context to sustain focus on longer projects
Anthropic has introduced the next generation of its artificial intelligence (AI) models, Claude Opus 4 and Claude Sonnet 4. “These models advance our customers’ AI strategies across the board: Opus 4 pushes boundaries in coding, research, writing and scientific discovery, while Sonnet 4 brings frontier performance to everyday use cases as an instant upgrade from Sonnet 3.7,” the company said. The company said Claude Opus 4 is its most powerful model yet and “the world’s best coding model,” adding that it delivers sustained performance on complex, long-running tasks and agent workflows. Claude Sonnet 4 balances performance and efficiency. It provides a significant upgrade to its predecessor, Claude Sonnet 3.7, and offers superior coding and reasoning while responding more precisely to user instructions. Both models can use web search and other tools during extended thinking, use tools in parallel, and extract and save key facts from local files, per the announcement. In addition, both models offer two modes: near-instant responses and extended thinking. These models are a large step toward the virtual collaborator — maintaining full context, sustaining focus on longer projects, and driving transformational impact.
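The two modes are selected per request. The snippet below is a hedged sketch of toggling extended thinking through the Messages API, assuming the anthropic Python SDK and the claude-opus-4-20250514 model snapshot; the thinking budget and prompt are illustrative.

```python
# Hedged sketch of Claude Opus 4's two modes via the Messages API (anthropic Python SDK).
# The model snapshot and thinking budget are assumptions; omit the "thinking" field
# for the near-instant mode.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},  # extended thinking mode
    messages=[{"role": "user", "content": "Plan a multi-step refactor of a large Python codebase."}],
)

# Thinking blocks precede the final text block in the response content.
for block in response.content:
    print(block.type)
```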
Postman’s agent framework enables developers to build AI agents by discovering the right APIs and LLMs, evaluating them across providers and testing them, and keeping them running reliably
In this exclusive episode of DEMO, Keith Shaw looks at Postman, the world’s leading API collaboration platform. Postman is designed for developers and enterprises to build intelligent AI agents, simplifying the agent-building process, reducing platform sprawl, and unlocking the full potential of APIs and large language models. One key benefit of Postman is its suite of tools for discovering the right APIs and LLMs to use in agents, allowing users to test functionality, integrate, and build through the Flows experience, all in one platform. Postman leverages internal APIs and connects to hundreds of thousands of public APIs, enabling agents to access tools like Slack, Notion, UPS, and more. The agent framework involves building agents, discovering APIs and models, evaluating and testing them, and keeping them running reliably. The demo’s core workspace features a made-up company called ShelfWise, which stores all the APIs used by the company. Postman supports multiple protocols like HTTP, GraphQL, and gRPC, and has introduced a new request type: LLMs. With the rise of AI, Postman offers model options from providers like OpenAI, Google, and Anthropic. Postman also allows users to evaluate multiple models across providers using a collection runner, which can be run manually or integrated into a CI/CD pipeline. It also provides visualization tools to help teams make smarter decisions. Postman AI Agent Builder is available on postman.com, where users can find collections, examples, and Flows to fork and use right away.
ChatGPT’s deep research tool gets a GitHub connector allowing developers to ask questions about a codebase and engineering documents
OpenAI announced what it’s calling the first “connector” for ChatGPT deep research, the company’s tool that searches across the web and other sources to compile thorough research reports on a topic. Now, ChatGPT deep research can link to GitHub (in beta), allowing developers to ask questions about a codebase and engineering documents. The connector will be available for ChatGPT Plus, Pro, and Team users over the next few days, with Enterprise and Edu support coming soon. The GitHub connector for ChatGPT deep research arrives as AI companies look to make their AI-powered chatbots more useful by building ways to link them to outside platforms and services. Anthropic, for example, recently debuted Integrations, which gives apps a pipeline into its AI chatbot Claude. In addition to answering questions about codebases, the new ChatGPT deep research GitHub connector lets ChatGPT users break down product specs into technical tasks and dependencies, summarize code structure and patterns, and understand how to implement new APIs using real code examples. The company also launched fine-tuning options for developers looking to customize its newer models for particular applications. Devs can now fine-tune OpenAI’s o4-mini “reasoning” model via a technique OpenAI calls reinforcement fine-tuning, which uses task-specific grading to improve the model’s performance. Fine-tuning has also rolled out for the company’s GPT-4.1 nano model.
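As a sketch of what the newly opened fine-tuning looks like in practice, the snippet below creates a supervised fine-tuning job for GPT-4.1 nano with the openai Python SDK; the training file is a placeholder and the dated model snapshot is an assumption. Reinforcement fine-tuning of o4-mini uses the same jobs endpoint with an additional grading configuration, which is omitted here.

```python
# Hedged sketch of launching a fine-tuning job for GPT-4.1 nano (openai Python SDK).
# The training file path is a placeholder; the dated model snapshot is assumed.
from openai import OpenAI

client = OpenAI()

# Upload a JSONL file of chat-formatted training examples.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

# Create the fine-tuning job against the GPT-4.1 nano snapshot.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4.1-nano-2025-04-14",
)
print(job.id, job.status)
```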