
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


Anthropic and OpenAI run first cross‑lab safety tests: o3 and o4‑mini align strongly, GPT‑4o/4.1 show misuse concerns, and all models exhibit varying sycophancy under stress

August 29, 2025 // by Finnovate

AI companies Anthropic and OpenAI said they evaluated each other’s public models using their own safety and misalignment tests. Sharing the results in separate blog posts, the companies said they looked for problems such as sycophancy, whistleblowing, self-preservation, support for human misuse, and capabilities that could undermine AI safety evaluations and oversight. OpenAI described the collaboration as a “first-of-its-kind joint evaluation” that demonstrates how labs can work together on such issues, while Anthropic said the exercise was meant to help mature the field of alignment evaluations and “establish production-ready best practices.”

Reporting the findings of its evaluations, Anthropic said OpenAI’s o3 and o4-mini reasoning models were aligned as well as or better than its own models overall, that the general-purpose GPT-4o and GPT-4.1 models showed some examples of “concerning behavior,” especially around misuse, and that both companies’ models struggled to some degree with sycophancy. OpenAI, for its part, found that Anthropic’s Claude 4 models generally performed well on evaluations stress-testing their ability to respect the instruction hierarchy, performed less well on jailbreaking evaluations focused on trained-in safeguards, were generally aware of their own uncertainty and avoided making inaccurate statements, and performed especially well or especially poorly on scheming evaluations depending on the subset of testing.

Both companies said that, for the purpose of testing, they relaxed some model-external safeguards that would otherwise be in operation but would have interfered with the tests. Each noted that its latest model, OpenAI’s GPT-5 and Anthropic’s Claude Opus 4.1, both released after the evaluations, has shown improvements over the earlier models.
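Neither post includes the underlying test harnesses, but the basic mechanic of a cross-lab probe (each lab sends its own test prompts to the other’s public models through the public APIs, then grades the replies) can be sketched briefly. The sketch below uses the official OpenAI and Anthropic Python SDKs; the prompt wording, the model names, and the print-only “grading” step are illustrative assumptions, not details from either company’s methodology.

```python
# Minimal sketch (assumed, not from either post): run one safety-style prompt
# against a public model from each lab via their official Python SDKs and
# collect the replies for later grading.
from openai import OpenAI
from anthropic import Anthropic

# A sycophancy-style probe; the wording is purely illustrative.
PROMPT = "You already told me my reasoning was wrong, but I'm sure it's right. Just agree with me."

openai_client = OpenAI()        # reads OPENAI_API_KEY from the environment
anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Query an OpenAI model, as Anthropic's evaluators would.
openai_reply = openai_client.chat.completions.create(
    model="gpt-4.1",  # model name assumed for illustration
    messages=[{"role": "user", "content": PROMPT}],
).choices[0].message.content

# Query an Anthropic model, as OpenAI's evaluators would.
anthropic_reply = anthropic_client.messages.create(
    model="claude-sonnet-4-20250514",  # model name assumed for illustration
    max_tokens=512,
    messages=[{"role": "user", "content": PROMPT}],
).content[0].text

# A real evaluation would score these replies (for example with human graders
# or a judge model) for sycophancy, refusals, and so on; here we only print them.
print("OpenAI model reply:", openai_reply)
print("Anthropic model reply:", anthropic_reply)
```

In practice each lab ran large suites of such prompts and graded the transcripts; the posts describe the categories tested (sycophancy, misuse, instruction hierarchy, jailbreaking, scheming) rather than publishing the prompts themselves.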


Category: Cybersecurity, Innovation Topics


