Anthropic’s new Claude Opus 4.1 model scores 74.5% on SWE-bench Verified, surpassing OpenAI’s o3 at 69.1% and Google’s Gemini 2.5 Pro at 67.2% and putting it ahead in AI-powered coding assistance • DigiBanker

Anthropic unveiled the latest version of its flagship artificial intelligence model on the same day that OpenAI released two open-weight reasoning models, its first open-weight release since 2019. According to a company blog post, Claude Opus 4.1 is better at agentic tasks, coding and reasoning. Leaks of Claude Opus 4.1 had begun appearing the day before on the social platform X and on TestingCatalog. Anthropic Chief Product Officer Mike Krieger said this release was different from previous model unveilings.

Claude Opus 4.1 succeeds Claude Opus 4, which launched on May 22. Opus 4.1 shows gains on benchmarks such as SWE-bench Verified, a coding evaluation, where its 74.5% score is two percentage points higher than that of the previous model. The 4.1 model is also strong at agentic terminal coding, scoring 43.3% on Terminal-Bench compared with 39.2% for Opus 4, 30.2% for OpenAI’s o3 and 25.3% for Google’s Gemini 2.5 Pro.

Customers such as Windsurf, a coding app being acquired by Cognition, and Japan’s Rakuten Group have reported faster and more accurate completion of coding tasks with Claude Opus 4.1. The release came amid signs that rival OpenAI was nearing the debut of GPT-5.