A new TELUS Digital survey of 1,000 U.S. adults found that 87% of respondents (up from 75% in 2023) believe companies should be transparent about how they source data for GenAI models. Additionally, 65% believe that excluding high-quality, verified content, such as information from trusted media sources (e.g., New York Times, Reuters, Bloomberg), can lead to inaccurate and/or biased large language model (LLM) responses.

“As AI systems become more specialized and embedded in high-stakes use cases, the quality of the datasets used to optimize outputs is emerging as a key differentiator for enterprises between average performance and having the potential to drive real-world impacts,” said Amith Nair, Global VP and General Manager, Data & AI Solutions, TELUS Digital. “We’re well past the era where general crowdsourced or internet data can meet today’s enterprises’ more complex and specialized use cases. This is reflected in the shift in our clients’ requests from ‘wisdom of the crowd’ datasets to ‘wisdom of the experts’. Experts and industry professionals are helping curate such datasets to ensure they are technically sound, contextually relevant and responsibly built. In high-stakes domains like healthcare or finance, even a single mislabelled data point can distort model behavior in ways that are difficult to detect and costly to correct.”

In response to evolving industry dynamics, TELUS Digital Experience has launched 13 off-the-shelf STEM (science, technology, engineering and mathematics) datasets, including coding and reasoning data that is critical for LLM advancements. The datasets have been expertly curated by a diverse pool of contributors, including Ph.D. researchers, professors, graduate students and working professionals from around the world. This gives enterprises access to high-quality data that has been cleaned, labeled and formatted for immediate integration into AI training workflows.