Qlik announced the launch of Qlik Open Lakehouse, a fully managed Apache Iceberg solution built into Qlik Talend Cloud. Designed for enterprises under pressure to scale faster and spend less, Qlik Open Lakehouse delivers real-time ingestion, automated optimization, and multi-engine interoperability without vendor lock-in or operational overhead. The fully managed lakehouse architecture, powered by Apache Iceberg, delivers 2.5x–5x faster query performance and up to 50% lower infrastructure costs, while maintaining full compatibility with the most widely used analytics and machine learning engines. It combines real-time ingestion, intelligent optimization, and true ecosystem interoperability in a single platform: Real-time ingestion at enterprise scale; Intelligent Iceberg optimization, fully automated; Open by design, interoperable by default; Your compute, your cloud, your rules; One platform, end to end. As AI workloads demand faster access to broader, fresher datasets, open formats like Apache Iceberg are becoming the new foundation. Qlik Open Lakehouse responds to this shift by making it effortless to build and manage Iceberg-based architectures without custom code or pipeline babysitting. It also runs within the customer's own AWS environment, ensuring data privacy, cost control, and full operational visibility.
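To make the interoperability claim concrete, here is a minimal sketch of reading an Iceberg table with the open-source pyiceberg library; because Iceberg is an open table format, engines such as Spark, Trino, or Snowflake can query the same table and metadata. The catalog URI and table name are hypothetical, and the sketch illustrates the open format itself rather than Qlik Open Lakehouse's managed service.

```python
# pip install "pyiceberg[pyarrow]"
# Minimal sketch of engine-agnostic access to an Iceberg table.
# The REST catalog URI and table identifier below are hypothetical.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "demo",
    **{"type": "rest", "uri": "http://localhost:8181"},
)

table = catalog.load_table("analytics.orders")

# Scan the table into an Arrow table; other engines (Spark, Trino, etc.)
# can read the same data files and metadata without copies or exports.
arrow_table = table.scan().to_arrow()
print(arrow_table.num_rows)
```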
TELUS Digital's off-the-shelf STEM datasets, including coding and reasoning data, are curated by a diverse pool of experts to offer enterprises access to high-quality, AI-ready data that has been cleaned, labeled and formatted
A new TELUS Digital survey of 1,000 U.S. adults found that 87% of respondents (up from 75% in 2023) believe companies should be transparent about how they source data for GenAI models. Additionally, 65% believe that the exclusion of high-quality, verified content, such as information from trusted media sources (e.g., The New York Times, Reuters, Bloomberg), can lead to inaccurate and/or biased large language model (LLM) responses. “As AI systems become more specialized and embedded in high-stakes use cases, the quality of the datasets used to optimize outputs is emerging as a key differentiator for enterprises between average performance and having the potential to drive real-world impacts,” said Amith Nair, Global VP and General Manager, Data & AI Solutions, TELUS Digital. “We’re well past the era where general crowdsourced or internet data can meet today’s enterprises’ more complex and specialized use cases. This is reflected in the shift in our clients’ requests from ‘wisdom of the crowd’ datasets to ‘wisdom of the experts’. Experts and industry professionals are helping curate such datasets to ensure they are technically sound, contextually relevant and responsibly built. In high-stakes domains like healthcare or finance, even a single mislabelled data point can distort model behavior in ways that are difficult to detect and costly to correct.” In response to evolving industry dynamics, TELUS Digital Experience has launched 13 off-the-shelf STEM (science, technology, engineering and mathematics) datasets, including coding and reasoning data that is critical for LLM advancements. The datasets have been expertly curated by a diverse pool of contributors, including Ph.D. researchers, professors, graduate students and working professionals from around the world. This gives enterprises access to high-quality data that has been cleaned, labeled and formatted for immediate integration into AI training workflows.
Elastic’s new plugin to accelerate open-source vector search index build times and queries on Nvidia GPUs; integrates with Nvidia validated designs to enable on-premises AI agents
Elastic announced that Elasticsearch integrates with the new NVIDIA Enterprise AI Factory validated design to provide a recommended vector database for enterprises to build and deploy their own on-premises AI factories. Elastic will use NVIDIA cuVS to create a new Elasticsearch plugin that will accelerate vector search index build times and queries. NVIDIA Enterprise AI Factory validated designs enable Elastic customers to unlock faster, more relevant insights from their data. Elasticsearch is used throughout the industry for vector search and AI applications, with a thriving open source community. Elastic's investment in accelerating vector search on GPUs builds on its longstanding efforts to optimize vector database performance through hardware-accelerated CPU SIMD instructions, vector data compression innovations such as Better Binary Quantization, and faster filtered HNSW search. With Elasticsearch and the NVIDIA Enterprise AI Factory reference design, enterprises can unlock deeper insights and deliver more relevant, real-time information to AI agents and generative AI applications.
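The GPU-accelerated cuVS plugin is still forthcoming, but the vector search API it will accelerate is already part of Elasticsearch. The sketch below, using the official Python client, shows a dense_vector mapping and an approximate kNN query against the resulting HNSW index; the index name, dimensions, and toy vectors are hypothetical.

```python
# pip install elasticsearch
# Minimal sketch of dense-vector (kNN) search with the Elasticsearch Python client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Map an HNSW-indexed dense_vector field alongside the source text.
es.indices.create(
    index="docs",
    mappings={
        "properties": {
            "text": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 3,            # toy dimensionality for illustration
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)

es.index(
    index="docs",
    document={"text": "hello lakehouse", "embedding": [0.1, 0.9, 0.2]},
    refresh=True,
)

# Approximate kNN query against the vector field.
resp = es.search(
    index="docs",
    knn={"field": "embedding", "query_vector": [0.1, 0.8, 0.3], "k": 1, "num_candidates": 10},
)
print(resp["hits"]["hits"][0]["_source"]["text"])
```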
Starburst Data’s lakehouse model supports AI models by using data where it already lives without needing to copy it into a centralized repository and without requiring external data pipelines
Starburst Data is unveiling a suite of enhancements intended to make it easier for enterprises to develop and apply artificial intelligence models. Starburst's updates are focused on enabling what it calls an AI "lakeside," in which companies can use data where it already lives without needing to copy it into a centralized repository. Starburst defines a lakeside as a staging ground for AI, or an area adjacent to the data lakehouse where data is the most complete, cost-efficient and governed. The company's new Lakeside AI architecture combines AI-ready tools with an open data lakehouse model. It allows companies to experiment with, train and deploy AI systems while keeping sensitive or regulated data in place. Starburst AI Workflows accelerates AI application development by making it easier to transform unstructured data into vector embeddings, a machine learning technique that turns data into numerical representations capturing the meaning of and relationships between data points without requiring explicit keywords. Workflows manage prompts and models with SQL and enforce governance policies. Starburst said these capabilities are fully contained within its platform and require no external data pipelines. Data is stored in Apache Iceberg tables, with connectors available for a variety of third-party vector databases. In practice, this means users can build AI features that rely on unstructured or semi-structured sources like emails, documents and logs without having to move data or stitch together multiple tools. The Starburst AI Agent is a built-in interface that lets users query their data in natural language. It automatically scans for sensitive data such as names, email addresses and other personally identifiable information at the column level and tags it so access policies can be applied. That reduces the need for manual checks and helps organizations enforce privacy rules more consistently. A new Starburst data catalog replaces the aging Hive metastore and provides better support for the Iceberg table format that is rapidly becoming the standard for cloud data lakes. The new catalog supports both legacy Hive data and Iceberg tables. To improve performance across large-scale deployments, Starburst is also introducing a native ODBC driver that improves connection speed and reliability with business intelligence tools such as Salesforce Inc.'s Tableau and Microsoft Corp.'s Power BI.
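For readers unfamiliar with vector embeddings, the sketch below illustrates the general technique the paragraph describes: encoding text into vectors and comparing them by cosine similarity so that semantically related items match without keyword overlap. It uses the open-source sentence-transformers library with a hypothetical model choice and toy data, and is not an example of Starburst's SQL-based AI Workflows.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Hypothetical choice of embedding model for illustration.
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Invoice overdue for account 4411",
    "Server logs show repeated login failures",
]
query = "billing problem"

doc_vecs = model.encode(docs)      # numerical representations of each document
query_vec = model.encode(query)    # representation of the query

# Cosine similarity captures semantic closeness even without shared keywords;
# the first document should score higher for this query.
scores = util.cos_sim(query_vec, doc_vecs)
print(scores)
```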
Ataccama enhances data lineage with audit-ready snapshots, historical tracking, and cloud-native processing to strengthen data trust
Ataccama has released Ataccama ONE data trust platform v16.1. This new version introduces powerful data lineage and connectivity capabilities, including enhanced diagram export for audit and compliance use cases and improved lineage visualization tools. It also expands pushdown processing for cloud platforms, such as Azure Synapse and Google BigQuery. With these updates, Ataccama helps organizations more easily operationalize automated lineage, govern data across complex environments, and deliver trusted insights at scale. The Ataccama ONE data trust platform closes the data trust gap by giving organizations a comprehensive and portable view of how data moves, transforms, and impacts downstream systems. New capabilities make it easier to manage lineage across environments, including exporting diagrams for audits, preserving historical lineage states, and migrating metadata to support governance workflows and system changes. Teams can go beyond static data views to track sensitive information, audit its handling, and build confidence with point-in-time documentation. Expanded pushdown processing allows organizations to analyze data directly within cloud platforms like Azure Synapse and BigQuery, reducing movement, improving performance, and maintaining governance at scale. These updates enable teams to act faster, meet regulatory requirements, and confidently deliver trusted insights. New capabilities in v16.1: Automated lineage and audit snapshots; Enhanced visibility and collaboration; Cloud-native data processing; Support for big data workloads; Enhanced connectivity and flexibility.
Salesforce’s acquisition of Informatica to elevate its data management capabilities including catalog, integration, governance, quality, privacy and Master Data Management (MDM) for deploying agentic AI
Salesforce is making a big bid to become a much larger player in the enterprise space, announcing an $8B acquisition of Informatica. The move will bring together two large, established enterprise software providers with decades of real-world experience. By acquiring Informatica, Salesforce aims to enhance its trusted data foundation for deploying agentic AI. The combination will create a unified architecture that enables AI agents to operate safely, responsibly and at scale across enterprises by integrating Informatica's rich data catalog, integration, governance, quality, privacy and Master Data Management (MDM) capabilities with Salesforce's platform, including Data Cloud, Agentforce, Tableau, MuleSoft and Customer 360. According to Forrester Analyst Noel Yuhanna, Salesforce's acquisition of Informatica fills a gap in its data management capabilities. "The acquisition markedly elevates Salesforce's position across all critical dimensions of modern data management, including data integration, ingestion, pipelines, master data management (MDM), metadata management, transformation, preparation, quality and governance in the cloud," Yuhanna told VentureBeat. "These capabilities are no longer optional—they are foundational for building an AI-ready enterprise, especially as the industry accelerates toward agentic AI." To fully realize AI's promise, Yuhanna said that vendor solutions must tightly integrate data and AI as two sides of the same coin. In his view, this acquisition strengthens Salesforce's ability to do just that, laying the groundwork for next-generation data capabilities that can power intelligent, autonomous and personalized experiences at scale to support AI use cases. Yuhanna sees the acquisition as a major advancement for Salesforce customers. He noted that Salesforce customers will be able to seamlessly access and leverage all types of customer data, whether housed within Salesforce or external systems, all in real time. It represents a unified customer data fabric that can deliver actionable insights across every channel and touchpoint. "Critically, it accelerates Salesforce's ability to deploy agentic AI, enabling low-code, low-maintenance AI solutions that reduce complexity and dramatically shorten time to value," Yuhanna said. "With a fully integrated data management foundation, Salesforce customers can expect faster, more innovative, and more personalized customer experiences at scale." The opportunity is equally appealing for Informatica customers. In Yuhanna's view, this acquisition unlocks a faster path to agentic AI workloads, backed by the reach and power of the Salesforce ecosystem. As data management evolves, intelligent agents will automate core functions, turning traditionally time-consuming processes like data ingestion, integration, and pipeline orchestration into self-operating data workflows. Tasks that once took days or weeks will be executed with little to no human intervention. "With a unified data, AI, and analytics platform, Informatica customers will benefit from accelerated innovation, greater operational agility, and significantly enhanced returns on their data investments," he said.
Regula's privacy compliance tool allows document experts to blur or hide PII directly within forensic workflows
Regula, a global identity verification solution developer, has added personal data masking functionality to its Regula Forensic Studio (RFS) software. This feature allows document experts to protect personal data with a single click, meeting growing privacy demands without disrupting workflows. The Regula ecosystem, from real-time ID verification to in-depth forensic analysis, now supports robust privacy controls natively. The new capability allows document experts to blur or hide personally identifiable information (PII) directly within forensic workflows, ensuring sensitive data is handled responsibly while meeting global requirements. In addition to the personal data masking feature, the latest RFS release includes 40+ updates focused on speed, customization, and forensic precision: New analysis tools: Yellow dot analysis for tracing document origins and detecting unauthorized duplicates. Smarter imaging: Per-light-source gamma correction and full-spectrum HDR imaging (not just UV), improving clarity across all materials. Streamlined collaboration: Video screen capture and camera recording capabilities support team training and case reviews. Faster insights: Hyperspectral imaging is now 20% faster without compromising detail. Improved digital zoom: Expanded up to 16x for detailed inspections. Visual reporting: Ability to generate composite images under varied lighting, ideal for expert reports or courtroom presentations. Integrated workflows: Automated document searches in the Information Reference System (IRS) after MRZ reading to reduce manual steps. Flexible video modes: Three options for different examination tasks—real-time viewing without frame skipping, high-resolution capture, and an expanded A4 field-of-view mode. Wider OS compatibility: Now supports Rocky and Debian Linux distributions, expanding deployment options.
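Regula's one-click masking is built into RFS itself, but the underlying idea of obscuring a PII region in a document image can be illustrated with a few lines of generic image-processing code. The sketch below uses OpenCV with a hypothetical file name and bounding box, and is not Regula's API.

```python
# pip install opencv-python
import cv2

# Hypothetical scan of an identity document.
img = cv2.imread("id_document.png")

# Hypothetical bounding box (in pixels) of a PII field such as the document number.
x, y, w, h = 120, 340, 260, 40
roi = img[y:y + h, x:x + w]

# Blur the region heavily so the field is unreadable, then write it back in place.
img[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)

cv2.imwrite("id_document_masked.png", img)
```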
Monte Carlo's low-code observability solution lets users apply custom prompts and AI-powered checks to unstructured fields and monitor the quality metrics relevant to their unique use case
Monte Carlo has launched unstructured data monitoring, a new capability that enables organizations to ensure trust in their unstructured data assets across documents, chat logs, images, and more, all without needing to write a single line of SQL. With its latest release, Monte Carlo becomes the first data + AI observability platform to provide AI-powered support for monitoring both structured and unstructured data types. Monte Carlo users can now apply customizable, AI-powered checks to unstructured fields, allowing them to monitor the quality metrics that are relevant to their unique use case. Monte Carlo goes beyond standard quality metrics and allows customers to use custom prompts and classifications to make monitoring truly meaningful. Monte Carlo continues its strategic partnership with Snowflake, the AI Data Cloud company, to support Snowflake Cortex Agents, Snowflake's AI-powered agents that orchestrate across structured and unstructured data to provide more reliable AI-driven decisions. In addition, Monte Carlo is extending its partnership with Databricks to include observability for Databricks AI/BI, a compound AI system built into Databricks' platform that generates rich insights from across the data + AI lifecycle, including ETL pipelines, lineage, and other queries. By supporting Snowflake Cortex Agents and Databricks AI/BI, Monte Carlo helps data teams ensure their foundational data is reliable and trustworthy enough to support real-time business insights driven by AI.
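Monte Carlo has not published the internals of its prompt-based checks here, but the general pattern of a custom-prompt quality check on an unstructured field can be sketched as follows. The example uses the OpenAI Python client with an assumed model name and a hypothetical prompt; Monte Carlo's own implementation and configuration will differ.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; the model below is an assumption

# Hypothetical custom prompt expressing a use-case-specific quality rule.
PROMPT = (
    "You are a data quality check. Given a customer support chat transcript, "
    "answer PASS if it contains no unredacted email addresses or phone numbers, "
    "otherwise answer FAIL."
)

def check_transcript(transcript: str) -> str:
    """Run the custom-prompt check against one unstructured record."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content.strip()

print(check_transcript("Customer: my email is jane@example.com, please call me."))
```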
Snorkel AI's platform offers programmatic tooling to create AI-ready data for building fine-grained, domain-specific evaluation of models, going beyond the generic off-the-shelf "LLM-as-a-judge" approach
Snorkel AI has announced general availability of two new product offerings on the Snorkel AI Data Development Platform: 1) Snorkel Evaluate enables users to build specialized, fine-grained evaluation of models and agents. Powered by Snorkel AI's unique programmatic approach to curating AI-ready data, this new offering allows enterprises to scale their evaluation workflows to confidently deploy AI systems to production. Snorkel Evaluate includes programmatic tooling for benchmark dataset creation, the development of specialized evaluators, and error mode correction. These tools help users go beyond generic datasets and off-the-shelf "LLM-as-a-judge" approaches to efficiently build actionable, domain-specific evaluations. 2) Snorkel Expert Data-as-a-Service is a white-glove solution that delivers expert datasets for frontier AI system evaluation and tuning to enterprises. Leading LLM developers are already partnering with Snorkel AI to create datasets for advanced reasoning, agentic tool use, multi-turn user interaction, and domain-specific knowledge. The offering combines Snorkel's network of highly trained subject matter experts with its unique programmatic technology platform for data labeling and quality control, enabling efficient delivery of specialized datasets. Snorkel Expert Data-as-a-Service equips enterprises to mix in-house expertise and data with proprietary datasets developed using outsourced expertise.
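As a contrast with a generic "LLM-as-a-judge" score, a domain-specific evaluator typically encodes narrow, checkable criteria in code. The sketch below is a hypothetical plain-Python evaluator for a financial Q&A assistant and is not Snorkel Evaluate's API; it simply illustrates the kind of fine-grained, programmatic evaluation the offering targets.

```python
import re

def finance_citation_evaluator(response: str) -> dict:
    """Hypothetical domain-specific evaluator for a financial Q&A assistant.

    Scores a model response on two narrow, checkable criteria rather than a
    generic 'is this good?' judgment: does it cite an SEC filing type, and
    does it avoid giving individualized investment advice?
    """
    cites_filing = bool(re.search(r"\b(10-K|10-Q|8-K)\b", response))
    gives_advice = bool(re.search(r"\byou should (buy|sell)\b", response, re.IGNORECASE))
    return {
        "cites_filing": cites_filing,
        "avoids_advice": not gives_advice,
        "pass": cites_filing and not gives_advice,
    }

print(finance_citation_evaluator(
    "Per the company's 10-K, revenue grew 12% year over year."
))
```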
Snowflake’s acquisition of Crunchy Data to enable it to offer enterprise-grade, fully managed and automated PostgreSQL for powering agentic AI at scale
Snowflake Inc. said that it's buying a database startup called Crunchy Data Solutions Inc. in a $250 million deal that's expected to close imminently, bolstering its agentic AI capabilities. The startup has developed a cloud-based database platform that makes it simple for businesses and government agencies to use PostgreSQL without having to manage the underlying infrastructure. Executive Vice President of Product Christian Kleinerman and Crunchy Data founder and Chief Executive Paul Laurence explained that the upcoming Snowflake Postgres platform will "simplify how developers build, deploy and scale agents and apps." They were referring to AI agents, which are widely expected to become the next big thing after generative AI, taking actions on behalf of humans to automate complex work with minimal human supervision. When it launches as a technology preview in the coming weeks, Snowflake Postgres will be an enterprise-grade PostgreSQL offering that gives developers the full power and flexibility of the original, open-source Postgres database, together with the operational standards, governance and security of Snowflake's cloud data warehouse. According to Snowflake, it will help developers speed up the development of new AI agents and simplify the way they access data. "Access to a PostgreSQL database directly within Snowflake has the potential to be incredibly impactful for our team and our customers, as it would allow us to securely deploy our Snowflake Native App, LandingLens, into our customers' account," said Dan Maloney, CEO of Snowflake customer LandingAI Inc. "This integration is a key building block in making it simpler to build, deploy and run AI applications directly on the Snowflake platform." The advantage of having a PostgreSQL offering is that it is flexible enough to serve as the underlying database for AI agents that leverage data from their respective cloud platforms.