
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


Amazon’s new benchmark to evaluate AI coding agents’ ability to navigate and understand complex codebases and GitHub issues

April 25, 2025 //  by Finnovate

Amazon has introduced SWE-PolyBench, which it describes as the first industry benchmark to evaluate AI coding agents’ ability to navigate and understand complex codebases. It follows SWE-Bench, which measures system performance on GitHub issues, has spurred the development of capable coding agents, and has become the de facto standard for coding agent benchmarking. SWE-PolyBench contains more than 2,000 curated issues in four languages, plus a stratified subset of 500 issues for rapid experimentation. The benchmark aims to advance AI performance in real-world scenarios.

Key features of SWE-PolyBench at a glance:

  • Multi-Language Support: Java (165 tasks), JavaScript (1,017 tasks), TypeScript (729 tasks), and Python (199 tasks).
  • Extensive Dataset: 2,110 instances from 21 repositories, ranging from web frameworks to code editors and ML tools; comparable in scale to the full SWE-Bench, but drawn from more repositories.
  • Task Variety: Bug fixes, feature requests, and code refactoring.
  • Faster Experimentation: SWE-PolyBench500, a stratified subset for efficient experimentation.
  • Leaderboard: A leaderboard with a rich set of metrics for transparent benchmarking.
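To make the idea of a stratified subset concrete, here is a minimal Python sketch of stratified sampling by language. It uses only the per-language task counts quoted above. The article does not say which variables SWE-PolyBench500 is stratified on or how the 500 issues are split, so the equal per-language quota, the placeholder task IDs, and the helper names `make_placeholder_tasks` and `stratified_sample` are all assumptions made for illustration, not the benchmark's actual procedure.

```python
import random
from collections import defaultdict

# Per-language task counts as reported in the article.
TASK_COUNTS = {"Java": 165, "JavaScript": 1017, "TypeScript": 729, "Python": 199}


def make_placeholder_tasks(counts):
    """Build synthetic (task_id, language) pairs; no real benchmark data is used."""
    return [(f"{lang}-{i}", lang) for lang, n in counts.items() for i in range(n)]


def stratified_sample(tasks, quotas, seed=0):
    """Draw quotas[lang] tasks uniformly at random, without replacement, per language."""
    rng = random.Random(seed)
    by_lang = defaultdict(list)
    for task_id, lang in tasks:
        by_lang[lang].append(task_id)

    sample = []
    for lang, quota in quotas.items():
        if quota > len(by_lang[lang]):
            raise ValueError(f"quota {quota} exceeds the {len(by_lang[lang])} {lang} tasks")
        sample.extend((task_id, lang) for task_id in rng.sample(by_lang[lang], quota))
    return sample


tasks = make_placeholder_tasks(TASK_COUNTS)

# Hypothetical quota: an equal split of 500 across the four languages (assumption).
quotas = {lang: 500 // len(TASK_COUNTS) for lang in TASK_COUNTS}
subset = stratified_sample(tasks, quotas)

print(len(tasks), len(subset))  # 2110 500
```

Sampling within each language separately keeps the smaller groups (Java and Python here) from being crowded out by JavaScript and TypeScript, which a plain random draw of 500 from all 2,110 tasks would tend to do.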


Category: Members, Essential Guidance


