
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


K Prize, a multi-round AI coding challenge that tests models against flagged GitHub issues to assess how well they handle real-world programming problems, sees a top score of just 7.5%, versus a 75% top score on the industry benchmark SWE-Bench

July 24, 2025 // by Finnovate

Nonprofit Laude Institute has announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner will receive $50,000. The winning score set a new bar for AI-powered software engineers, with correct answers to just 7.5% of the questions on the test. Because the K Prize runs offline with limited compute, it favors smaller and open models, which levels the playing field. Konwinski has pledged $1 million to the first open source model that can score higher than 90% on the test.

Like SWE-Bench, the K Prize tests models against flagged issues from GitHub to gauge how well they can deal with real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For round one, models were due by March 12. The K Prize organizers then built the test using only GitHub issues flagged after that date.

The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier “Verified” test and 34% on its harder “Full” test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.
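To make the timed entry system concrete, here is a minimal sketch of how a test set could be restricted to issues flagged after the submission deadline. It is an illustration only, not the K Prize organizers' actual tooling: the `Issue` record, the `contamination_free` helper, the sample repositories, and the assumption that the March 12 deadline falls in 2025 are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for a flagged GitHub issue (not a real API type)."""
    repo: str
    number: int
    flagged_on: date


def contamination_free(issues, submission_deadline):
    """Keep only issues flagged strictly after the model submission deadline."""
    return [issue for issue in issues if issue.flagged_on > submission_deadline]


# Assumption: the round-one deadline of March 12 falls in 2025.
DEADLINE = date(2025, 3, 12)

# Made-up sample data for illustration.
issues = [
    Issue("example/repo-a", 101, date(2025, 2, 28)),  # before the deadline: excluded
    Issue("example/repo-b", 202, date(2025, 3, 12)),  # on the deadline: excluded
    Issue("example/repo-c", 303, date(2025, 4, 5)),   # after the deadline: kept
]

test_set = contamination_free(issues, DEADLINE)
print([(issue.repo, issue.number) for issue in test_set])
```

Since every submitted model is frozen before any test issue exists, it could not have seen those issues during training, which is the property the "contamination-free" label refers to.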


Category: Additional Reading


Copyright © 2025 Finnovate Research · All Rights Reserved · Privacy Policy
Finnovate Research · Knyvett House · Watermans Business Park · The Causeway Staines · TW18 3BA · United Kingdom · About · Contact Us · Tel: +44-20-3070-0188
