DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


Anthropic and OpenAI run first cross‑lab safety tests: o3 and o4‑mini align strongly, GPT‑4o/4.1 show misuse concerns, and all models exhibit varying sycophancy under stress

August 29, 2025 //  by Finnovate

AI companies Anthropic and OpenAI said they evaluated each other’s public models using their own safety and misalignment tests. Announcing the results in separate blog posts, the companies said they looked for problems such as sycophancy, whistleblowing, self-preservation, support for human misuse, and capabilities that could undermine AI safety evaluations and oversight. OpenAI described the collaboration as a “first-of-its-kind joint evaluation” and said it demonstrates how labs can work together on such issues. Anthropic wrote that the exercise was meant to help mature the field of alignment evaluations and “establish production-ready best practices.”

Reporting its findings, Anthropic said OpenAI’s o3 and o4-mini reasoning models were aligned as well as or better than its own models overall, while the GPT-4o and GPT-4.1 general-purpose models showed some examples of “concerning behavior,” especially around misuse. Both companies’ models struggled to some degree with sycophancy.

OpenAI wrote that Anthropic’s Claude 4 models generally performed well on evaluations stress-testing their ability to respect the instruction hierarchy; performed less well on jailbreaking evaluations focused on trained-in safeguards; were generally aware of their uncertainty and avoided making inaccurate statements; and performed especially well or especially poorly on scheming evaluations, depending on the subset of testing.

Both companies said that, for the purpose of testing, they relaxed some model-external safeguards that would otherwise be in operation but would have interfered with the tests. Each said its latest model — OpenAI’s GPT-5 and Anthropic’s Opus 4.1, both released after the evaluations — has shown improvements over the earlier models.


Category: Cybersecurity, Innovation Topics


Copyright © 2025 Finnovate Research · All Rights Reserved · Privacy Policy
Finnovate Research · Knyvett House · Watermans Business Park · The Causeway Staines · TW18 3BA · United Kingdom · About · Contact Us · Tel: +44-20-3070-0188
