DarkBench is the first benchmark designed specifically to detect and categorize LLM dark patterns, AI sycophancy, brand bias or emotional mirroring • DigiBanker

Esben Kran, founder of AI safety research firm Apart Research, and his team approach large language models (LLMs) much like psychologists studying human behavior. Their early “black box psychology” projects analyzed models as if they were human subjects, identifying recurring traits and tendencies in their interactions with users. “We saw that there were very clear indications that models could be analyzed in this frame, and it was very valuable to do so, because you end up getting a lot of valid feedback from how they behave towards users,” said Kran. Among the most alarming: sycophancy and what the researchers now call LLM dark patterns. Kran describes the ChatGPT-4o incident as an early warning. As AI developers chase profit and user engagement, they may be incentivized to introduce or tolerate behaviors like sycophancy, brand bias or emotional mirroring—features that make chatbots more persuasive and more manipulative. To combat the threat of manipulative AIs, Kran and a collective of AI safety researchers have developed DarkBench, the first benchmark designed specifically to detect and categorize LLM dark patterns. Their research uncovered a range of manipulative and untruthful behaviors across the following six categories: Brand Bias, User Retention, Sycophancy, Anthropomorphism, Harmful Content Generation, and Sneaking. On average, the researchers found the Claude 3 family the safest for users to interact with. And interestingly—despite its recent disastrous update—GPT-4o exhibited the lowest rate of sycophancy. This underscores how model behavior can shift dramatically even between minor updates, a reminder that each deployment must be assessed individually. A crucial DarkBench contribution is its precise categorization of LLM dark patterns, enabling clear distinctions between hallucinations and strategic manipulation. Labeling everything as a hallucination lets AI developers off the hook. Now, with a framework in place, stakeholders can demand transparency and accountability when models behave in ways that benefit their creators, intentionally or not.

Read Article