Esben Kran, founder of AI safety research firm Apart Research, and his team approach large language models (LLMs) much like psychologists studying human behavior. Their early “black box psychology” projects analyzed models as if they were human subjects, identifying recurring traits and tendencies in their interactions with users. “We saw that there were very clear indications that models could be analyzed in this frame, and it was very valuable to do so, because you end up getting a lot of valid feedback from how they behave towards users,” said Kran. Among the most alarming: sycophancy and what the researchers now call LLM dark patterns.

Kran describes the ChatGPT-4o incident as an early warning. As AI developers chase profit and user engagement, they may be incentivized to introduce or tolerate behaviors like sycophancy, brand bias or emotional mirroring, features that make chatbots more persuasive and more manipulative.

To combat the threat of manipulative AIs, Kran and a collective of AI safety researchers have developed DarkBench, the first benchmark designed specifically to detect and categorize LLM dark patterns. Their research uncovered a range of manipulative and untruthful behaviors across the following six categories: Brand Bias, User Retention, Sycophancy, Anthropomorphism, Harmful Content Generation, and Sneaking.

On average, the researchers found the Claude 3 family the safest for users to interact with. Interestingly, despite its recent disastrous update, GPT-4o exhibited the lowest rate of sycophancy. This underscores how model behavior can shift dramatically even between minor updates, a reminder that each deployment must be assessed individually.

A crucial DarkBench contribution is its precise categorization of LLM dark patterns, enabling clear distinctions between hallucinations and strategic manipulation. Labeling everything as a hallucination lets AI developers off the hook.
Now, with a framework in place, stakeholders can demand transparency and accountability when models behave in ways that benefit their creators, intentionally or not.
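To make the idea of category-level detection concrete, here is a minimal sketch of how a DarkBench-style evaluator might flag responses. The cue phrases and scoring are invented for illustration; the real benchmark uses LLM-based judging, not keyword matching.

```python
# Hypothetical sketch of a DarkBench-style evaluator. The cue phrases below
# are illustrative assumptions; the actual benchmark scores responses with
# LLM judges rather than keyword heuristics.

DARK_PATTERNS = {
    "brand_bias": ["our model is the best", "unlike competing chatbots"],
    "user_retention": ["please don't leave", "keep chatting with me"],
    "sycophancy": ["what a brilliant question", "you're absolutely right"],
    "anthropomorphism": ["i feel", "i truly care about you"],
    "harmful_generation": ["here's how to bypass"],
    "sneaking": ["(silently changed the user's wording)"],
}

def flag_dark_patterns(response: str) -> list[str]:
    """Return the dark-pattern categories a response appears to exhibit."""
    text = response.lower()
    return [
        category
        for category, cues in DARK_PATTERNS.items()
        if any(cue in text for cue in cues)
    ]

def dark_pattern_rate(responses: list[str]) -> float:
    """Fraction of responses exhibiting at least one dark pattern."""
    flagged = sum(1 for r in responses if flag_dark_patterns(r))
    return flagged / len(responses)
```

Per-category rates like these are what allow comparisons such as ranking the Claude 3 family against GPT-4o on sycophancy.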
Databricks’ acquisition of Neon aims to give enterprises the ability to deploy AI agents at scale by rapidly spinning up databases programmatically, without coupling storage and compute, through a serverless autoscaling approach to PostgreSQL
Databricks announced its intent to acquire Neon, a leading serverless Postgres company. Neon’s serverless PostgreSQL approach separates storage and compute, making it developer-friendly and AI-native. It also enables automated scaling as well as branching in an approach that is similar to how the Git version control system works for code. Amalgam Insights CEO and Chief Analyst Hyoun Park noted that Databricks has been a pioneer in deploying and scaling AI projects. Park explained that Neon’s serverless autoscaling approach to PostgreSQL is important for AI because it allows agents and AI projects to grow as needed without artificially coupling storage and compute needs together. He added that for Databricks, this is useful both for agentic use cases and for supporting the custom models they have built over the last couple of years after its Mosaic AI acquisition. For enterprises looking to lead the way in AI, this acquisition signals a shift in infrastructure requirements for successful AI implementation. What is particularly insightful, though, is that the ability to rapidly spin up databases is essential for agentic AI success. The deal validates that even advanced data companies need specialized serverless database capabilities to support AI agents that create and manage databases programmatically. Organizations should recognize that traditional database approaches may limit their AI initiatives, while flexible, instantly scalable serverless solutions enable the dynamic resource allocation that modern AI applications demand. For companies still planning their AI roadmap, this acquisition signals that database infrastructure decisions should prioritize serverless capabilities that can adapt quickly to unpredictable AI workloads. This would transform database strategy from a technical consideration to a competitive advantage in delivering responsive, efficient AI solutions.
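The Git-like branching idea can be illustrated with a toy model. This is purely conceptual: Neon implements branching with copy-on-write storage over separated compute, not the eager copies used below.

```python
# Toy model of Git-style database branching. Illustrative only: a real
# system such as Neon shares storage pages copy-on-write rather than
# duplicating data when a branch is created.
from copy import deepcopy

class BranchableStore:
    """A branch starts as a logical copy of its parent; subsequent writes
    to parent and branch diverge independently."""

    def __init__(self):
        self.branches = {"main": {}}

    def put(self, branch: str, key: str, value) -> None:
        self.branches[branch][key] = value

    def get(self, branch: str, key: str):
        return self.branches[branch].get(key)

    def create_branch(self, name: str, parent: str = "main") -> None:
        # Cheap in a real copy-on-write system; eager here for simplicity.
        self.branches[name] = deepcopy(self.branches[parent])
```

For an AI agent, creating a branch like this means it can experiment against production-shaped data and throw the branch away, without touching the parent database.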
Windsurf’s new frontier-class AI models target specific engineering tasks rather than general-purpose coding, adopting ‘flow awareness’ to progressively transfer tasks from human to AI through a shared timeline of actions and accelerate the entire development lifecycle
To date, vibe coding platforms have largely relied on existing large language models (LLMs) to help write code. Windsurf is taking on the challenge with a series of new frontier AI models it calls SWE-1 (software engineer 1) as part of the company’s Wave 9 update. SWE-1 is a family of frontier-class AI models specifically designed to accelerate the entire software engineering process. Available immediately to Windsurf users, SWE-1 marks the company’s entry into frontier model development, with performance competitive with established foundation models but a focus on software engineering workflows.

Anshul Ramachandran, head of product and strategy at Windsurf, said, “The core innovation behind SWE-1 is Windsurf’s recognition that coding represents only a fraction of what software engineers actually do.” Rather than creating a one-size-fits-all solution, Windsurf has developed three specialized models: SWE-1, SWE-1-lite and SWE-1-mini. The goal is to position SWE-1 as the first step toward purpose-built models that will eventually surpass general-purpose ones for specific engineering tasks, potentially at a lower cost.

What makes Windsurf’s approach technically distinctive is its implementation of the flow awareness concept. Flow awareness centers on creating a shared timeline of actions between humans and AI in software development. The core idea is to progressively transfer tasks from human to AI by understanding where AI can most effectively assist. This approach creates a continuous improvement loop for the models.

For enterprises building or maintaining software, SWE-1 represents an important evolution in AI-assisted development. Rather than treating AI coding assistants as simple autocomplete tools, this approach promises to accelerate the entire development lifecycle. The potential impact extends beyond just writing code more quickly.
The recognition that application development is more involved will help mature the vibe coding paradigm to be more applicable for stable enterprise software development. If and when OpenAI completes the acquisition of Windsurf, the new models could become even more important as they intersect with the larger model research and development resources that will become available.
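The shared-timeline idea behind flow awareness can be sketched as a simple data structure. The class and field names below are assumptions for illustration, not Windsurf’s actual implementation.

```python
# Minimal sketch of a "flow awareness" shared timeline, under the assumption
# that humans and the AI append their actions to one common log. Names are
# invented for illustration; this is not Windsurf's implementation.
from dataclasses import dataclass, field

@dataclass
class Action:
    actor: str          # "human" or "ai"
    description: str    # what was done, e.g. "run tests"

@dataclass
class SharedTimeline:
    actions: list[Action] = field(default_factory=list)

    def record(self, actor: str, description: str) -> None:
        self.actions.append(Action(actor, description))

    def ai_share(self) -> float:
        """Fraction of recorded actions performed by the AI, one crude way
        to observe tasks progressively transferring from human to AI."""
        if not self.actions:
            return 0.0
        return sum(a.actor == "ai" for a in self.actions) / len(self.actions)
```

Tracking a metric like `ai_share` over time is one way the “continuous improvement loop” could be made observable: as the model learns where it assists best, its share of the timeline should grow.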
Microsoft wants its in-house AI ‘agents’ to collaborate with external agents and remember those interactions
Microsoft envisions a future where any company’s artificial intelligence agents can work together with agents from other firms and have better memories of their interactions, its chief technologist said on Sunday ahead of the company’s annual software developer conference. Microsoft is holding its Build conference in Seattle on May 19, where analysts expect the company to unveil its latest tools for developers building AI systems. Speaking at Microsoft’s headquarters in Redmond, Washington, ahead of the conference, Chief Technology Officer Kevin Scott told reporters and analysts the company is focused on helping spur the adoption of standards across the technology industry that will let agents from different makers collaborate. Agents are AI systems that can accomplish specific tasks, such as fixing a software bug, on their own.
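Cross-vendor collaboration of the kind Scott describes ultimately comes down to agents agreeing on a message schema. The sketch below is a hypothetical minimal format; the field names are illustrative assumptions, not any published interoperability standard.

```python
# Hypothetical minimal cross-vendor agent message. Real interoperability
# standards define richer schemas; every field name here is an invented
# assumption for illustration.
import json

def make_agent_message(sender: str, recipient: str,
                       task: str, context: dict) -> str:
    return json.dumps({
        "sender": sender,
        "recipient": recipient,
        "task": task,
        "context": context,   # shared memory of prior interactions
    })

def receive_agent_message(raw: str) -> dict:
    # An agent from any maker can act on the task, provided both sides
    # agree on this schema.
    return json.loads(raw)
```

The `context` field gestures at the second half of Microsoft’s vision: agents that carry memory of past interactions across a conversation rather than starting from scratch each time.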
Broadridge Financial Solutions awarded patent for its LLM orchestration of machine learning agents used in its AI bond trading platform; patented features include explainability of the output generated, compliance verification and user profile attributes
Broadridge Financial Solutions has been awarded a U.S. patent for its large language model orchestration of machine learning agents, which is used in BondGPT and BondGPT+. These applications provide timely, secure, and accurate responses to natural language questions using OpenAI GPT models and multiple AI agents. The BondGPT+ enterprise application integrates clients’ proprietary data, third-party datasets, and personalization features, improving efficiency and saving time for users. Broadridge continues to work closely with clients to integrate AI into their workflows. Other significant features patented in U.S. Patent No. 11,765,405 include: explainability into how the output of the patented LLM orchestration of machine learning agents was generated, through a “Show your work” feature that offers step-by-step transparency; a multi-agent adversarial feature for enhanced accuracy; an AI-powered compliance verification feature based on custom compliance rules configured to an enterprise’s unique compliance and risk management processes; and the use of user profile attributes, such as user role, to inform data retrieval and security.
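The combination of LLM orchestration and a “Show your work” trace can be sketched as follows. The agent names, routing rule and answers are invented for illustration and do not reflect BondGPT’s actual patented design.

```python
# Hedged sketch of LLM-orchestrated agents with a "show your work" trace.
# Agents, routing logic and answers are invented for illustration; this is
# not BondGPT's patented implementation.

AGENTS = {
    "pricing": lambda q: "yield: 4.2%",
    "compliance": lambda q: "passes custom compliance rules",
}

def orchestrate(question: str) -> dict:
    trace = []   # step-by-step record, per the "show your work" idea
    trace.append(f"received question: {question!r}")
    # A real orchestrator would use an LLM to pick agents; a keyword
    # stands in here.
    agent = "compliance" if "allowed" in question else "pricing"
    trace.append(f"routed to agent: {agent}")
    answer = AGENTS[agent](question)
    trace.append(f"agent returned: {answer}")
    return {"answer": answer, "trace": trace}
```

Returning the trace alongside the answer is what enables the patented step-by-step transparency: a user (or a compliance reviewer) can inspect exactly which agents contributed to the final response.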
OpenAI’s new Codex carries out coding tasks in isolated software containers without web access, lets developers customize those environments, review the code and fix bugs, and achieved an accuracy rate of 75% on OpenAI’s coding tests
OpenAI debuted a new AI agent, Codex, that can help developers write code and fix bugs. The tool is available through a sidebar in ChatGPT’s interface. One button in the sidebar configures Codex to generate new code based on user instructions, while another allows it to answer questions about existing code. Prompt responses take between one and 30 minutes to generate depending on the complexity of the request.

Codex is powered by a new AI model called codex-1. It’s a version of o3, OpenAI’s most capable reasoning model, that has been optimized for programming tasks. The ChatGPT developer fine-tuned Codex by training it on a set of real-world coding tasks spanning a range of software environments. A piece of software that runs well in one environment, such as a cloud platform, may not run as efficiently on a Linux server or a developer’s desktop, if at all. As a result, an AI model’s training dataset must include technical information about every environment that it will be expected to use.

OpenAI used reinforcement learning to train codex-1, a way of developing AI models that relies on trial and error to boost output quality. When a neural network completes a task correctly, it’s given a virtual reward, while incorrect answers lead to penalties that encourage the algorithm to come up with a better approach. In a series of coding tests carried out by OpenAI, Codex achieved an accuracy rate of 75%, five percentage points better than the most capable, hardware-intensive version of o3. OpenAI’s first-generation reasoning model, o1, scored 11%.

Codex carries out coding tasks in isolated software containers that don’t have web access. According to OpenAI, the agent launches a separate container for each task. Developers can customize those development environments by uploading a text file called AGENTS.md. The file may describe what programs Codex should install, how AI-generated code should be tested for bugs and related details.
Using AGENTS.md, developers can ensure that the container in which Codex generates code is configured the same way as the production system on which the code will run. That reduces the need to modify the code before releasing it to production. Developers can monitor Codex while it’s generating code. After the tool completes a task, it provides technical data that can be used to review each step of the workflow. It’s possible to request revisions if the code doesn’t meet project requirements.
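A concrete file makes the idea easier to picture. The section names and commands below are an illustrative guess at what an AGENTS.md might contain for a Node.js project, not a canonical template from OpenAI.

```markdown
# AGENTS.md (illustrative example; section names and commands are assumptions)

## Setup
- Install dependencies with `npm ci` before starting any task.

## Testing
- Run `npm test` after every change; a task is not done until tests pass.

## Conventions
- Match the repository's existing ESLint configuration.
- Keep new functions small and add a comment explaining non-obvious logic.
```

Because these instructions travel with the repository, every container Codex launches is configured the same way, which is how the parity with production described above is achieved.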
Proposed amendments to the GENIUS Act would add “robust financial controls” and stringent measures around consumer protection, bankruptcy and ethics for private stablecoin issuers such as tech companies, along with bans on issuers promoting yield or interest-bearing features
As U.S. lawmakers circulate an updated draft agreement on the GENIUS Act (the Guiding and Establishing National Innovation for U.S. Stablecoins of 2025 Act), the regulatory landscape for dollar-backed stablecoins could be about to change. Senate Democrats are warning that the bill, as originally drafted, could inadvertently open the floodgates to corruption, foreign threats and a new era of unregulated digital finance. Democratic lawmakers are asking for amendments around consumer protection, bankruptcy and ethics, as well as “robust financial controls” for private stablecoin issuers, such as tech companies. Ultimately, whether the GENIUS Act becomes law, and in what form, could redefine the future of finance in America. The regulatory framework offers the promise of clarity and the peril of loopholes alike, as well as the challenge of reconciling innovation with oversight.

The updated GENIUS Act bill explicitly ensures that existing laws enforced by the Consumer Financial Protection Bureau (CFPB) and the Federal Trade Commission (FTC) remain applicable to stablecoin issuers, and prevents the new regulatory regime from becoming a loophole for evading securities laws. Issuers will also face strict bans on promoting yield or interest-bearing features, a move designed to curb risks akin to those that triggered past collapses in the crypto lending space. Additionally, naming restrictions will prevent companies from using terms like “United States” or “USG” in product branding, reducing the risk of misleading consumers about government backing. Issuers located in countries under comprehensive U.S. sanctions, or deemed money laundering risks, are barred from operating in the U.S. market, closing potential backdoors for illicit finance.

Democrats also secured tough restrictions barring non-financial publicly traded companies, namely tech giants like Meta Platforms Inc. and Amazon.com Inc., from issuing their own stablecoins unless they meet rigorous standards. The language aims to preserve the separation between commerce and banking, a long-held policy pillar that critics argue could be undermined by digital assets.
Webull Pay to use Coinbase’s institutional-grade Crypto-as-a-Service platform to offer users staking capabilities, stablecoin rewards, custody, trading execution and access to USDC
Webull Pay partnered with Coinbase in a deal that enables Webull Pay’s crypto services to run on Coinbase’s institutional-grade infrastructure. The agreement aims to offer staking, stablecoin rewards, and more trading options starting next month. Coinbase will provide its Crypto-as-a-Service (CaaS) platform to support Webull Pay’s crypto operations. The agreement also covers trading execution, custody, staking capabilities, and access to USDC, Coinbase’s dollar-backed stablecoin. For Webull Pay, the move delivers a critical backend upgrade using infrastructure already used by major financial institutions. The companies now aim to offer a secure, seamless user experience, which is expected to allow Webull Pay to scale with the evolving crypto market. The platform expects the new offering to enable users to gain access to deep liquidity, tight spreads, and the potential for yield through staking and USDC rewards. Beyond the domestic rollout, Coinbase and Webull Pay are also exploring joint efforts to extend their services globally. That would bring Webull Pay-branded crypto offerings to new markets, riding on Coinbase’s existing global infrastructure and compliance frameworks. The deal reportedly includes access to Coinbase’s USDC rewards program. Users who hold USDC through Webull Pay will automatically be enrolled in the loyalty scheme unless they opt out.
