
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


The K Prize, a multi-round AI coding challenge that tests models against flagged GitHub issues to assess how well they handle real-world programming problems, sees a top score of just 7.5% versus 75% on the industry benchmark SWE-Bench

July 24, 2025 //  by Finnovate

Nonprofit Laude Institute has announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner will receive $50,000. The winning score sets a new bar for AI-powered software engineers: correct answers to just 7.5% of the questions on the test. Because the K Prize runs offline with limited compute, it favors smaller and open models, which levels the playing field. Konwinski has pledged $1 million to the first open-source model that scores higher than 90% on the test.

Like SWE-Bench, the K Prize tests models against flagged issues from GitHub to gauge how well they deal with real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against benchmark-specific training. For round one, models were due by March 12; the organizers then built the test using only GitHub issues flagged after that date.

The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier “Verified” test and 34% on its harder “Full” test. Konwinski is still not sure whether the disparity is due to contamination on SWE-Bench or simply the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer that question soon.


 

Category: Additional Reading


Copyright © 2025 Finnovate Research · All Rights Reserved · Privacy Policy
Finnovate Research · Knyvett House · Watermans Business Park · The Causeway Staines · TW18 3BA · United Kingdom · About · Contact Us · Tel: +44-20-3070-0188
