• Menu
  • Skip to right header navigation
  • Skip to main content
  • Skip to primary sidebar

DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.

  • Home
  • Pricing
  • Features
    • Overview Of Features
    • Search
    • Favorites
  • Share!
  • Log In
  • Home
  • Pricing
  • Features
    • Overview Of Features
    • Search
    • Favorites
  • Share!
  • Log In

A new open-source method utilizes the MCP architecture to evaluate agent performance through a variety of available LLMs by gathering real-time information on how agents interact with tools, generating synthetic data and creating a database to benchmark them

July 24, 2025 //  by Finnovate

Researchers from Salesforce discovered another way to utilize MCP technology, this time to aid in evaluating AI agents themselves. The researchers unveiled MCPEval, a new method and open-source toolkit built on the architecture of the MCP system that tests agent performance when using tools. They noted current evaluation methods for agents are limited in that these “often relied on static, pre-defined tasks, thus failing to capture the interactive real-world agentic workflows.” MCPEval differentiates itself by being a fully automated process, which the researchers claimed allows for rapid evaluation of new MCP tools and servers. It both gathers information on how agents interact with tools within an MCP server, generates synthetic data and creates a database to benchmark agents. Users can choose which MCP servers and tools within those servers to test the agent’s performance on.  MCPEval’s framework takes on a task generation, verification and model evaluation design. Leveraging multiple large language models (LLMs) so users can choose to work with models they are more familiar with, agents can be evaluated through a variety of available LLMs in the market.  Enterprises can access MCPEval through an open-source toolkit released by Salesforce. Through a dashboard, users configure the server by selecting a model, which then automatically generates tasks for the agent to follow within the chosen MCP server.  Once the user verifies the tasks, MCPEval then takes the tasks and determines the tool calls needed as ground truth. These tasks will be used as the basis for the test. Users choose which model they prefer to run the evaluation. MCPEval can generate a report on how well the agent and the test model functioned in accessing and using these tools. What makes MCPEval stand out from other agent evaluators is that it brings the testing to the same environment in which the agent will be working. Agents are evaluated on how well they access tools within the MCP server to which they will likely be deployed.

Read Article

Category: AI & Machine Economy, Innovation Topics

Previous Post: « Embedded payments are seeing rising adoption in the parking sector through AI-recognition tech that lets customers just drive in and scan a QR code to enter their credit card information the first time they park, with automatic vehicle identification and charges applied on subsequent trips

Copyright © 2025 Finnovate Research · All Rights Reserved · Privacy Policy
Finnovate Research · Knyvett House · Watermans Business Park · The Causeway Staines · TW18 3BA · United Kingdom · About · Contact Us · Tel: +44-20-3070-0188

We use cookies to provide the best website experience for you. If you continue to use this site we will assume that you are happy with it.