K Prize, a multi-round AI coding challenge that tests models against flagged GitHub issues to assess how well they handle real-world programming problems, sees a top score of just 7.5%, versus a 75% top score on SWE-Bench’s “Verified” test • DigiBanker

Nonprofit Laude Institute has announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner will receive $50,000. The winning score set a new bar for AI-powered software engineers: correct answers to just 7.5% of the questions on the test.

The K Prize runs offline with limited compute, so it favors smaller and open models, which levels the playing field. Konwinski has pledged $1 million to the first open source model that can score higher than 90% on the test.

Like SWE-Bench, the K Prize tests models against flagged issues from GitHub to gauge how well they can deal with real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For round one, models were due by March 12. The K Prize organizers then built the test using only GitHub issues flagged after that date.

The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier “Verified” test and 34% on its harder “Full” test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.
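
The timed entry idea can be illustrated with a simple date filter: only issues flagged after the submission deadline are eligible for the test, so no submitted model could have trained on them. The sketch below is not the organizers' actual pipeline. The `Issue` type, the `issues_after_deadline` helper, the sample issues, and the year in the deadline are all assumptions made for illustration (the announcement gives only "March 12").

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Issue:
    """Hypothetical record for a flagged GitHub issue."""
    repo: str
    number: int
    flagged_on: date


def issues_after_deadline(issues, deadline):
    """Keep only issues flagged strictly after the submission deadline."""
    return [issue for issue in issues if issue.flagged_on > deadline]


# Assumed deadline: the year is a placeholder, the source states only March 12.
deadline = date(2025, 3, 12)

# Made-up sample issues, for illustration only.
candidates = [
    Issue("example/repo", 1, date(2025, 3, 10)),  # before the deadline: excluded
    Issue("example/repo", 2, date(2025, 3, 12)),  # on the deadline: excluded
    Issue("example/repo", 3, date(2025, 3, 20)),  # after the deadline: eligible
]

eligible = issues_after_deadline(candidates, deadline)
print([issue.number for issue in eligible])
```

Using a strict comparison means an issue flagged on the deadline day itself is excluded, which is the conservative choice when the goal is to rule out any overlap with training data.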
