• Menu
  • Skip to right header navigation
  • Skip to main content
  • Skip to primary sidebar

DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.

  • Home
  • Pricing
  • Features
    • Overview Of Features
    • Search
    • Favorites
  • Share!
  • Log In
  • Home
  • Pricing
  • Features
    • Overview Of Features
    • Search
    • Favorites
  • Share!
  • Log In

Anthropic’s open sourced AI safety tool Petri can automate and continuously audit GenAI models across 111 risky tasks; Claude Sonnet 4.5 emerged as the top-performing model

October 10, 2025 //  by Finnovate

Anthropic PBC is doubling down on AI safety with the release of Parallel Exploration Tool for Risky Interactions, or Petri, a new open-source tool that uses AI agents to audit the behavior of large language models. It’s designed to identify numerous problematic tendencies of models, such as deceiving users, whistleblowing, cooperation with human misuse and facilitating terrorism. Anthropic said that agentic tools like Petri can be useful because the complexity and variety of LLM behaviors exceeds the ability of researchers to test them for every kind of worrying scenario manually. As such, Petri represents a shift in the business of AI safety testing from static benchmarks to automated, ongoing audits that are designed to catch risky behavior, not only before models are released, but also once they’re out in the wild. Anthropic said that Claude Sonnet 4.5 emerged as the top-performing model across a range of “risky tasks” in its evaluations. Petri combines its testing agents with a judge model that ranks each LLM across various dimensions, including honesty and refusal. It will then flag any transcripts of conversations that resulted in risky outputs, so humans can review them. The tool is therefore suitable for developers wanting to conduct exploratory testing of new AI models, so they can improve their overall safety before they’re released to the public, Anthropic said. It significantly reduces the amount of manual effort required to evaluate models for safety, and by making it open-source, Anthropic says it hopes to make this kind of alignment research standard for all developers.

Read Article

Category: Essential Guidance

Previous Post: « Corelium’s COR token powers decentralized AI infrastructure with DAO governance, deflationary mechanisms through fee burning and revenue buybacks; supporting compute nodes from individual GPUs to enterprise HPC clusters
Next Post: Two TikTok influencers drive 15-fold CFPB complaint spike, as they sell $77 complaint templates containing false claims that Zelle and Cash App provide automatic reimbursement »

Copyright © 2025 Finnovate Research · All Rights Reserved · Privacy Policy
Finnovate Research · Knyvett House · Watermans Business Park · The Causeway Staines · TW18 3BA · United Kingdom · About · Contact Us · Tel: +44-20-3070-0188

We use cookies to provide the best website experience for you. If you continue to use this site we will assume that you are happy with it.