OpenAI launches ChatGPT Pulse, which proactively writes morning briefs that get you up to speed on your day
OpenAI is launching a new feature inside ChatGPT called Pulse, which generates personalized reports for users while they sleep. Pulse offers users five to 10 briefs that get them up to speed on their day and is aimed at encouraging users to check ChatGPT first thing in the morning — much like they would check social media or a news app. Pulse is part of a broader shift in OpenAI’s consumer products, which are increasingly designed to work for users asynchronously rather than simply responding to questions. Features like ChatGPT Agent or Codex aim to make ChatGPT feel more like an assistant than a chatbot. With Pulse, OpenAI seemingly wants ChatGPT to be more proactive. OpenAI will roll out Pulse for subscribers to its $200-a-month Pro plan, for whom it will appear as a new tab in the ChatGPT app. The company says it would like to launch Pulse to all ChatGPT users in the future, with Plus subscribers getting access next, but it first needs to make the product more efficient. Pulse’s reports can be roundups of news articles on a specific topic — like updates on a specific sports team — as well as more personalized briefs based on a user’s context. Each report is displayed as a “card” featuring AI-generated images and text. Users can click on each one to get the full report and can then query ChatGPT about its contents. Pulse will proactively generate some reports, but users can also ask Pulse for new automated reports or offer feedback on existing ones. A core part of Pulse is that it stops after generating a few reports and shows a message: “Great, that’s it for today.” That’s an intentional design choice to set the service apart from engagement-optimized social media apps. If users have ChatGPT’s memory features turned on, Pulse will also pull in context from previous chats to improve their reports.
Cloudflare launches payments stablecoin intended for autonomous software agents, developers, and online creators, enabling automated payments for services and content across borders
Cloudflare launched a U.S. dollar-backed stablecoin to support transactions on the AI-driven Internet. The token, NET Dollar, is reportedly intended for autonomous software agents, developers, and online creators, enabling automated payments for services and content across borders. “The Internet’s next business model will be powered by pay-per-use, fractional payments, and microtransactions, tools that shift incentives toward original, creative content that actually adds value,” commented Matthew Prince, co-founder and CEO of Cloudflare. “By using our global network, we are going to help modernize the financial rails needed to move money at the speed of the Internet, helping to create a more open and valuable Internet for everyone,” he explained. The new offering is reportedly built for what the company described as the “agentic web,” where AI agents perform tasks such as booking travel, ordering goods, or managing schedules. The stablecoin enables instant and reliable payments across currencies and geographies, allowing both personal and business agents to execute transactions automatically. Personal agents could pay for items the moment they become available, while business agents could settle supplier payments as soon as deliveries are confirmed. Simon Taylor, Founder of Fintech Brainfood, said: “Cloudflare helps host websites, prevent bot attacks, and now they’re launching NET Dollar, a USD-backed stablecoin built for autonomous commerce.” NET Dollar is designed to compensate creators for original content and to help developers monetize APIs and applications. Cloudflare is also developing open standards, including the Agent Payments Protocol and x402, to simplify sending and receiving online payments. The company emphasized that NET Dollar is designed to be interoperable with other payment systems.
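To make the pay-per-use idea concrete, here is a minimal sketch of the kind of HTTP 402 flow a protocol like x402 is built around: an agent requests a resource, the server replies with a price, the agent settles a stablecoin micro-payment and retries. The header names, wallet client, and receipt format below are illustrative assumptions, not Cloudflare’s actual specification.

```python
# Minimal sketch of an HTTP 402 "pay-per-use" flow of the kind x402 enables.
# Header names, the wallet client, and the settlement call are hypothetical
# stand-ins; Cloudflare's actual protocol and NET Dollar APIs may differ.
import requests


class StablecoinWallet:
    """Hypothetical wallet holding a USD-backed stablecoin balance."""

    def pay(self, recipient: str, amount: str) -> str:
        # Settle the micro-payment and return a proof/receipt identifier.
        return "receipt-abc123"


def fetch_paid_resource(url: str, wallet: StablecoinWallet) -> bytes:
    resp = requests.get(url)
    if resp.status_code == 402:  # Payment Required
        # The server advertises its price and payout address (assumed headers).
        price = resp.headers.get("X-Payment-Amount", "0.001")
        payee = resp.headers.get("X-Payment-Address", "")
        receipt = wallet.pay(payee, price)
        # Retry the request with proof of payment attached.
        resp = requests.get(url, headers={"X-Payment-Receipt": receipt})
    resp.raise_for_status()
    return resp.content


# Example usage with a placeholder URL.
content = fetch_paid_resource("https://example.com/api/article", StablecoinWallet())
```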
AI agents may not be so useful after all, new study finds; many agents are creating more rework for 58% of employees
Agentic AI has become the hot topic of 2025. Yet even though a large percentage of employees use AI agents, there appear to be a few significant hiccups, according to project management platform Asana. “Workers already hand 27% of their workload to agents, rising to 34% in a year and 43% within three years, signaling the biggest shift in how work gets done since the arrival of the PC.” However, “62% of workers say agents are unreliable; 59% report they confidently share wrong information, and 57% say they ignore feedback. Instead of reducing work, many agents are creating more rework for 58% of employees,” Asana notes. To make matters worse, when agents make mistakes, “a third of workers (33%) say no one is responsible, while others scatter blame between IT, end users, or the agent’s creator. With no clear ownership, companies risk accumulating massive ‘AI debt’ and eroding trust.” Finally, approximately 82% of employees agree that “proper training is essential to use agents effectively, yet fewer than four in 10 companies provide it, leaving workers eager but unprepared to delegate beyond basic admin tasks.” So are AI agents really useful? Judging by the way they’re currently being implemented across workplaces, the short answer is no. But with a digital transformation strategy and change management (which includes training), the real ROI of agentic AI will start to show.
Databricks partners with OpenAI in a $100 million deal to deploy GPT-5 agents directly on customers’ enterprise data via SQL without data movement
Databricks Inc. and OpenAI have formed a multiyear, $100 million partnership, making OpenAI’s latest models, including GPT-5, natively available to the more than 20,000 Databricks customers worldwide. Under the agreement, OpenAI’s models will be tightly integrated with Databricks’ AI development environment, called Agent Bricks. That gives organizations a single platform to develop, evaluate and scale AI agents — systems that can perform tasks autonomously with little or no human supervision — without the complexity of moving data or managing separate tools. Databricks customers will be able to run LLMs on their existing enterprise data, accessible via SQL or API, and deploy them securely at scale with built-in governance and observability controls. By keeping data within existing governance frameworks, businesses can deploy AI models while adhering to compliance and performance standards. The partnership also promises high-capacity processing power dedicated to running OpenAI’s models across customer workloads. Agent Bricks will play a central role in the joint offering. It allows organizations to measure model accuracy with task-specific evaluation methods, fine-tune LLMs for domain-specific tasks and automate workflows across a variety of use cases. With GPT-5 integrated, the companies said businesses can expect faster development cycles and more reliable AI outputs. Another key component of the partnership is Databricks’ Unity Catalog, which is used for data and AI model governance. It can help track data lineage, control access and enforce compliance while scaling AI deployments across departments and geographies, Databricks said. Observability features also help teams monitor performance, accuracy and security.
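As a rough illustration of what “LLMs on existing enterprise data via SQL” can look like in practice, the sketch below calls Databricks’ ai_query SQL function from PySpark. The endpoint name gpt-5-endpoint and the Unity Catalog table are hypothetical placeholders; exact names and model availability depend on how the OpenAI integration is exposed.

```python
# Minimal sketch of calling a served LLM over governed data from Databricks SQL.
# The endpoint name "gpt-5-endpoint" and the table are hypothetical; the exact
# names and availability depend on how the OpenAI integration is exposed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

summaries = spark.sql("""
    SELECT
      ticket_id,
      ai_query(
        'gpt-5-endpoint',                      -- model serving endpoint
        CONCAT('Summarize this support ticket in one sentence: ', body)
      ) AS summary
    FROM main.support.tickets                  -- governed Unity Catalog table
    LIMIT 100
""")

summaries.show(truncate=False)
```

Because the query runs where the data already lives, the model output inherits the table’s existing access controls and lineage tracking rather than requiring a separate export pipeline.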
Clarifai develops a new agentic AI acceleration engine using advanced hardware optimizations to double AI inference speed on existing GPU infrastructure at 40% lower cost
AI platform Clarifai announced a new reasoning engine that it claims will make running AI models twice as fast and 40% less expensive. Designed to be adaptable to a variety of models and cloud hosts, the system employs a range of optimizations to get more inference throughput out of the same hardware. “It’s a variety of different types of optimizations, all the way down to CUDA kernels to advanced speculative decoding techniques,” said CEO Matthew Zeiler. “You can get more out of the same cards, basically.” The results were verified in a series of benchmark tests by the third-party firm Artificial Analysis, which recorded industry-best results for both throughput and latency. The engine focuses specifically on inference, the computational work of running an AI model that has already been trained. That load has grown particularly intense with the rise of agentic and reasoning models, which take multiple steps in response to a single command.
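For readers unfamiliar with speculative decoding, one of the techniques Zeiler mentions, here is a toy sketch of the greedy variant: a cheap draft model proposes a few tokens, and the expensive target model verifies them, so several tokens can be accepted per expensive step. This is a generic illustration with stand-in model functions, not Clarifai’s implementation.

```python
# Toy sketch of greedy speculative decoding. A small "draft" model proposes k
# tokens cheaply; the large "target" model checks them and keeps the longest
# agreeing prefix. In production the target scores all drafted positions in one
# batched forward pass, which is where the speedup comes from.
from typing import Callable, List


def speculative_decode(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],   # cheap model: next-token guess
    target_next: Callable[[List[int]], int],  # expensive model: next-token choice
    max_new_tokens: int = 32,
    k: int = 4,                               # tokens drafted per verification step
) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft k candidate tokens with the cheap model.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: accept drafted tokens while the target model agrees.
        accepted = 0
        for i, t in enumerate(draft):
            if target_next(tokens + draft[:i]) == t:
                accepted += 1
            else:
                break
        tokens += draft[:accepted]
        # 3) On a mismatch, take one token from the target model instead.
        if accepted < k:
            tokens.append(target_next(tokens))
    return tokens
```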
GPT-5 triples GPT-4o’s win rate from 13.7% to 40.6% across professional tasks, while Claude Opus 4.1 reaches 49% against industry experts
OpenAI released a new benchmark that tests how its AI models perform compared to human professionals across a wide range of industries and jobs. The test, GDPval, is an early attempt at understanding how close OpenAI’s systems are to outperforming humans at economically valuable work — a key part of the company’s founding mission to develop artificial general intelligence, or AGI. GDPval is based on nine industries that contribute the most to America’s gross domestic product, including healthcare, finance, manufacturing, and government. The benchmark tests an AI model’s performance in 44 occupations across those industries, ranging from software engineers to nurses to journalists. For OpenAI’s first version of the test, GDPval-v0, OpenAI asked experienced professionals to compare AI-generated deliverables with those produced by other professionals and choose the better one. For example, one task asked investment bankers to map the competitive landscape of the last-mile delivery industry; graders then compared their reports against AI-generated versions. OpenAI then averages an AI model’s “win rate” against the human reports across all 44 occupations. For GPT-5-high, a souped-up version of GPT-5 that uses extra computational power, the company says the model was ranked as better than or on par with industry experts 40.6% of the time. It’s worth noting that most working professionals do a lot more than submit research reports to their boss, which is all that GDPval-v0 tests for. OpenAI acknowledges this and says it plans to create more robust tests in the future that account for more industries and interactive workflows. OpenAI also tested Anthropic’s Claude Opus 4.1 model, which was ranked as better than or on par with industry experts in 49% of tasks. OpenAI believes Claude scored so high because of its tendency to produce pleasing graphics rather than because of sheer performance.
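As a back-of-the-envelope illustration of how that headline number is aggregated, the sketch below averages per-occupation “better than or on par” rates from grader verdicts, as the methodology above describes. The verdict data is made up purely for illustration.

```python
# Toy aggregation of a GDPval-style score: per occupation, count how often
# graders ranked the model's deliverable as better than or on par with the human
# expert's, then average those per-occupation rates. Data below is invented.
from statistics import mean

# Grader verdicts per occupation: "win", "tie", or "loss" for the AI deliverable.
judgments = {
    "investment_banker": ["win", "loss", "tie", "loss"],
    "registered_nurse":  ["loss", "loss", "win", "loss"],
    "journalist":        ["tie", "win", "loss", "loss"],
}


def win_or_tie_rate(verdicts):
    return sum(v in ("win", "tie") for v in verdicts) / len(verdicts)


# Average per-occupation rates rather than pooling all comparisons,
# so every occupation is weighted equally.
gdpval_score = mean(win_or_tie_rate(v) for v in judgments.values())
print(f"{gdpval_score:.1%}")  # 41.7% with this toy data
```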
OpenAI introduces proactive AI agent performing night-time research synthesis from chat history, calendar, emails and connected apps to anticipate user needs autonomously
OpenAI’s newest ChatGPT update brings more proactive, agentic capabilities to the app, automating what was previously an on-demand offering and widening its audience. ChatGPT Pulse surfaces personalized searches and updates to users, along with information from connected apps, such as their calendar. Pulse, currently in preview and available to Pro users, will be available on mobile. “This is the first step toward a more useful ChatGPT that proactively brings you what you need, helping you make more progress so you can get back to your life. We’ll learn and improve from early use before rolling it out to Plus, with the goal of making it available to everyone,” OpenAI said. While Pulse currently targets individual users, the feature could eventually lead to more intelligent agents from OpenAI. Enterprises are still determining the best use cases for agents, and one of these is understanding how to leverage an agent that proactively performs tasks on behalf of users. Pulse conducts most of its work at night, performing asynchronous research on behalf of the user. “Each night, it synthesizes information from your memory, chat history and direct feedback to learn what’s most relevant to you, then delivers personalized focused updates the next day,” OpenAI said. Users can choose to connect apps like Gmail and Google Calendar, but OpenAI said the integrations “are off by default.” This allows ChatGPT to provide people with a rundown of their meetings the next day, draft a sample meeting agenda, or remind someone to buy a gift for a birthday. The company emphasized that users have control over what information Pulse gives them. A Pro user can tap “curate” in ChatGPT to guide Pulse on what they want to see and when. The idea is that Pulse can learn from this guidance to better anticipate users’ needs in the future. OpenAI added that “topics shown in Pulse also pass through safety checks to avoid showing harmful content that violates our policies.” Unless saved as a chat, each Pulse is available for that day only. This shift from a chat interface to a proactive, steerable AI assistant working alongside you is how AI will unlock more opportunities for more people.
MIT spinoff Liquid AI launches task-specific Nano models that match GPT-4o performance while running locally on phones and laptops; Liquid delivers 50x cost reduction and 100x energy savings compared to cloud-hosted frontier model deployments
Liquid AI, a startup pursuing alternatives to the popular “transformer”-based AI models that have come to define the generative AI era, is announcing not one, not two, but a whole family of six different types of AI models called Liquid Nanos that it says are better suited to the “reality of most AI deployments” in enterprises and organizations than the larger foundation models from rivals like OpenAI, Google, and Anthropic. Liquid Nanos are task-specific foundation models that range from 350 million to 2.6 billion parameters, targeted towards enterprise deployments — basically, you can set and forget these things on enterprise-grade field devices, from laptops to smartphones to even sensor arrays and small robots. Liquid Nanos deliver performance that rivals far larger models on specialized, agentic workflows such as multilingual data extraction, translation, retrieval-augmented (RAG) question answering, low-latency tool and function calling, math reasoning, and more. By shifting computation onto devices rather than relying on cloud infrastructure, Liquid Nanos aim to improve speed, reduce costs, enhance privacy, and enable applications in enterprise and research-grade environments where connectivity or energy use is constrained. The first set of models in the Liquid Nanos lineup is designed for specialized use cases:
- LFM2-Extract: multilingual models (350M and 1.2B parameters) optimized for extracting structured data from unstructured text, such as converting emails or reports into JSON or XML.
- LFM2-350M-ENJP-MT: a 350M-parameter model for bidirectional English-to-Japanese translation, trained on a broad range of text types.
- LFM2-1.2B-RAG: a 1.2B-parameter model tuned for retrieval-augmented generation (RAG) pipelines, enabling grounded question answering over large document sets.
- LFM2-1.2B-Tool: a model specialized for precise tool and function calling, designed to run with low latency on edge devices without relying on longer reasoning chains.
- LFM2-350M-Math: a reasoning-oriented model aimed at solving challenging math problems efficiently, with reinforcement learning techniques used to control verbosity.
- Luth-LFM2 series: community-developed fine-tunes by Sinoué Gad and Maxence Lasbordes, specializing in French while preserving English capabilities.
These models target specific tasks where small, fine-tuned architectures can match or even outperform generalist systems of more than 100 billion parameters.
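As a rough sketch of what running one of these models on-device might look like, the snippet below loads an extraction model with Hugging Face transformers and asks it to turn an email into JSON. The model identifier is an assumption based on the names above (check Liquid AI’s published model cards), and the prompt format may differ from what the released checkpoints expect.

```python
# Minimal sketch of running a small extraction model locally with Hugging Face
# transformers. The model identifier below is an assumption based on the product
# names above, and the prompt/output format may differ from the real checkpoints.
from transformers import pipeline

extractor = pipeline(
    "text-generation",
    model="LiquidAI/LFM2-1.2B-Extract",  # assumed identifier
    device_map="auto",                   # CPU, GPU, or Apple silicon as available
)

email = (
    "Hi team, the Osaka shipment (PO #4471) arrives Friday, October 3rd. "
    "Invoice total is 1.2M JPY, contact is Keiko Tanaka."
)
prompt = (
    "Extract purchase_order, arrival_date, total, currency and contact "
    f"from the following email as JSON:\n{email}"
)

result = extractor(prompt, max_new_tokens=128)
print(result[0]["generated_text"])
```

Because the model is small enough to run on a laptop or phone, the email never leaves the device, which is the privacy and cost argument Liquid AI is making.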
Factory unveils LLM-agnostic autonomous software development agents with enterprise context ingestion delivering 31x faster feature delivery and 96% migration time reduction
Agent-native software development startup Factory announced it has raised $50 million in new funding and has fully released Droids, advanced agents designed to accelerate the shift toward agent-native development. While other AI coding platforms tie developers to a single integrated development environment, model or interface, Droids meet engineers where they already work, making adoption easier and more flexible. Droids are LLM-agnostic and interface-agnostic, allowing developers to work from the terminal, IDE, Slack, Linear and browsers, or through custom scripts. That flexibility lets developers adopt agents without rewriting workflows or abandoning existing tools. The system ingests organizational context and engineering tool data, including version control, issue trackers and incident systems, and integrates with tools like GitHub, Jira, Slack, Datadog and Google Drive, building a “mental model” of the codebase so that agents onboard like seasoned engineers and make consistent, context-aware decisions. Droids also go beyond autocomplete to perform complex tasks such as feature development, refactoring, code review, documentation, incident response and codebase Q&A. Chief Executive Matan Grinberg added that “agents will not replace developers, but developers who are fluent with agents will rapidly outleverage and outpace developers who are not.” Factory already has an impressive lineup of customers, including Ernst & Young Global Ltd., Nvidia Corp., MongoDB Inc., Zapier Inc., Bayer AG and Clari Inc. Factory says those customers are seeing 31 times faster feature delivery, 96% shorter migration times, a 96% reduction in on-call resolution times, higher-quality code and more time for developers to focus on design and architecture.
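To illustrate the design idea rather than Factory’s actual API, here is a minimal sketch of what “LLM-agnostic and interface-agnostic” can look like in code: the agent depends only on narrow protocols, so the model provider and the surface it runs from (terminal, Slack, a CI script) are both swappable.

```python
# Illustrative sketch (not Factory's actual API) of an LLM-agnostic,
# interface-agnostic agent: it depends only on narrow protocols, so providers
# and surfaces can be swapped without changing the agent itself.
from typing import Protocol


class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...


class Surface(Protocol):
    def read_task(self) -> str: ...
    def report(self, message: str) -> None: ...


class CodingAgent:
    def __init__(self, llm: LLM, surface: Surface):
        self.llm = llm
        self.surface = surface

    def run(self) -> None:
        task = self.surface.read_task()
        # Organizational context (issue tracker, repo history) would be folded
        # into the prompt here; omitted to keep the sketch short.
        plan = self.llm.complete(f"Plan the code changes for this task:\n{task}")
        self.surface.report(plan)


# Any provider client exposing .complete() and any channel exposing
# .read_task()/.report() can be plugged in without touching CodingAgent.
```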
