Salesforce is betting that rigorous testing in simulated business environments will solve one of enterprise artificial intelligence’s biggest problems: agents that work in demonstrations but fail in the messy reality of corporate operations. The cloud software giant unveiled three major AI research initiatives this week, including CRMArena-Pro, which it calls a “digital twin” of business operations where AI agents can be stress-tested before deployment. The announcement comes as enterprises grapple with widespread AI pilot failures and fresh security concerns following recent breaches that compromised hundreds of Salesforce customer instances. “Pilots don’t learn to fly in a storm; they train in flight simulators that push them to prepare for the most extreme challenges,” said Silvio Savarese, Salesforce’s chief scientist and head of AI research, during a press conference. “Similarly, AI agents benefit from simulation testing and training, preparing them to handle the unpredictability of daily business scenarios in advance of their deployment.” The research push reflects growing enterprise frustration with AI implementations. A recent MIT report found that 95% of generative AI pilots at companies fail to reach production, while Salesforce’s own studies show that large language models alone achieve only 35% success rates in complex business scenarios.
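The testing approach Savarese describes boils down to replaying an agent against scripted scenarios and scoring the outcomes. Below is a minimal, purely illustrative Python sketch of such a harness; the scenarios, the toy agent, and every name are invented for illustration and are not Salesforce’s CRMArena-Pro code.

```python
import random

# Toy scenarios: a messy request paired with the action a correct agent
# should take. A real benchmark like CRMArena-Pro uses full CRM state.
SCENARIOS = [
    {"request": "Customer says invoice INV-204 was charged twice",
     "expected": "refund_duplicate"},
    {"request": "Lead wants an upgrade but the account is past due",
     "expected": "escalate_billing"},
]

def toy_agent(request: str) -> str:
    """Stand-in for the agent under test; a real harness would call an LLM."""
    if "charged twice" in request:
        return "refund_duplicate"
    return "escalate_billing" if random.random() < 0.7 else "close_ticket"

def stress_test(agent, scenarios, trials: int = 100) -> float:
    """Replay every scenario `trials` times and report the success rate."""
    runs = [(s, agent(s["request"])) for s in scenarios for _ in range(trials)]
    return sum(action == s["expected"] for s, action in runs) / len(runs)

print(f"simulated success rate: {stress_test(toy_agent, SCENARIOS):.0%}")
```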
Google’s Gemini and xAI’s Grok are narrowing ChatGPT’s lead, with Gemini at No. 2 across platforms and Grok’s MAUs jumping ~40% in July to top 20 million
ChatGPT rivals like Google’s Gemini, xAI’s Grok, and, to a lesser extent, Meta AI are closing the gap with ChatGPT, OpenAI’s popular AI chatbot, according to a new report on the consumer AI landscape from venture firm Andreessen Horowitz. Fourteen companies have appeared on all five editions of the firm’s list of top AI products: ChatGPT, Perplexity, Poe, Character AI, Midjourney, Leonardo, Veed, Cutout, ElevenLabs, Photoroom, Gamma, QuillBot, Civitai, and Hugging Face. Five other companies have appeared on all but the first report, the firm notes: Claude, DeepAI, Janitor AI, Pixelcut, and Suno, representing general AI use, companionship, image editing, and music generation. For the first time, Google holds four spots on the list of the top generative AI consumer web products, with entries for Gemini, AI Studio, NotebookLM, and Google Labs. Of note, No. 2 app Gemini is closing the gap with No. 1 app ChatGPT on mobile devices, though with almost half as many monthly active users. Not surprisingly, Gemini sees stronger adoption on Android, which accounts for nearly 90% of its monthly active user base. On the web, Gemini also came in second place behind ChatGPT, with approximately 12% of ChatGPT’s visits. The company’s AI Studio, a developer-oriented sandbox for building with Gemini models, entered the top 10 list of AI web products in the 10th spot; NotebookLM was No. 13. Google Labs, a destination for Google’s AI experiments (e.g., Flow, Project Mariner, and Doppl), ranked No. 39. Grok ranked fourth on the web and No. 23 on mobile. That is quick growth, given that Grok went from having no stand-alone app at the end of 2024 (it was first launched on X) to upward of 20 million monthly active users today. Grok’s usage also climbed nearly 40% in July 2025, when Grok 4 was released. Meta’s general assistant ranked No. 46 on the web — the same as in March — but it didn’t make the list of top mobile AI apps. DeepSeek and Claude saw their growth flatten on mobile, with the former falling 22% off its peak. On the web, DeepSeek saw an even sharper drop-off, down more than 40% from its February 2025 peak, while Perplexity and Claude continued to grow. Vibe-coding startups Lovable and Replit both debuted on the main list this time, having not made the cut in a16z’s March edition.
Hyland debuts Enterprise Context Engine that unifies company information (ERP, HR, CRM) into a graph‑driven “living record” that feeds and informs AI workflows while preserving institutional knowledge
Enterprise content management firm Hyland Software Inc. launched two new components of its Content Innovation Cloud, which the company says present a unified, continuously updated view of an organization’s content, processes, people and applications to fuel a network of task-specific artificial intelligence agents. The Enterprise Context Engine pulls from systems such as enterprise resource planning, customer relationship management and human resources, and maps relationships to create what the company describes as a “living record of enterprise activity.” Hyland positions the Enterprise Context Engine as “a shared services platform layer” that sits beneath its products and agentic solutions. It leverages graph analytics technologies to connect artifacts in a way that informs workflows and supports new applications. The Enterprise Agent Mesh is a network of task-specific agents tuned for specific industries, including healthcare, banking, insurance, government and higher education. Hyland said the mesh uses the context layer to make decisions and take actions inside complex workflows, while preserving institutional knowledge and incorporating human feedback. The company will provide prebuilt meshes for its core verticals and a no-code platform customers can use to adapt or assemble their own. The Agent Mesh architecture enables organizations to use the Enterprise Context Engine to re-implement business processes as agent meshes. The platform is designed to work in conjunction with customers’ existing repositories and workflow engines, and uses the Model Context Protocol to connect to systems of record and other vendors’ agents.
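To make the graph idea concrete, here is a minimal sketch (not Hyland’s API; every identifier is invented) of how records from ERP, CRM and HR systems could become nodes and edges that an agent traverses for context:

```python
from collections import defaultdict

# Adjacency list standing in for a graph database: node -> [(relation, node)].
edges = defaultdict(list)

def link(src: str, rel: str, dst: str) -> None:
    edges[src].append((rel, dst))

# Artifacts pulled from separate systems of record (illustrative IDs).
link("crm:account/acme", "has_invoice", "erp:invoice/1042")
link("erp:invoice/1042", "disputed_in", "crm:case/77")
link("crm:case/77", "assigned_to", "hr:employee/jdoe")

def context_for(node: str, depth: int = 3) -> list:
    """Walk outward from a record to assemble the 'living record' an agent sees."""
    if depth == 0:
        return []
    facts = []
    for rel, dst in edges[node]:
        facts.append((node, rel, dst))
        facts.extend(context_for(dst, depth - 1))
    return facts

for fact in context_for("crm:account/acme"):
    print(fact)
```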
xAI releases agentic coding model with a 256k context window, function calling, and 160 tokens/second (GPT-5: 50 TPS, Gemini 2.5: 92 TPS) at $0.20 input and $1.50 output per million tokens
Elon Musk’s xAI released grok-code-fast-1, a dedicated agentic coding artificial intelligence model that is extremely fast and designed to strike a “compelling balance between performance and cost.” In a market quickly becoming cluttered with coding-capable models, xAI said it built the model from the ground up on a brand-new architecture fit for the task. Grok-code-fast-1 has mastered common tools like grep, the terminal, and file editing, and so should feel right at home in your favorite IDE. The model supports function calling, structured outputs and reasoning, with a 256,000-token context window. That window lets the model hold the equivalent of hundreds of pages of text or code at once, allowing it to efficiently review large portions of a codebase while working. As for speed, xAI’s own benchmarks put the new model at around 160 tokens per second; in the same benchmarks, OpenAI’s GPT-5 averages around 50.1 tokens per second, Gemini 2.5 Pro around 92.4 and Claude 4 Sonnet 78.7. The company said that on the full subset of SWE-Bench-Verified, a human-validated evaluation of an AI model’s ability to solve real-world software engineering problems, grok-code-fast-1 scored 70.8% using xAI’s internal harness; in comparison, GPT-5 scored 74.9% (with thinking) and Claude Sonnet 4 achieved 72.7%. Given that xAI will be competing against many other coding-capable models on the market, the company said it intends to deliver updates and improvements on the order of days rather than weeks.
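Since xAI’s API follows the OpenAI chat-completions schema, calling the model should look roughly like the sketch below; the model name comes from the announcement, while the key handling and prompt are boilerplate assumptions:

```python
import os
from openai import OpenAI

# Only the base URL and model name differ from a standard OpenAI call.
# Pricing per the announcement: $0.20 per million input tokens,
# $1.50 per million output tokens.
client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],  # assumes a key from the xAI console
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-code-fast-1",
    messages=[
        {"role": "system", "content": "You are an agentic coding assistant."},
        {"role": "user", "content": "Write a script that greps a repo for TODO comments."},
    ],
)
print(response.choices[0].message.content)
```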
Web3 gaming gets a boost as DAR Open Network rolls out quest system and cross-game rewards to encourage players to try new titles
DAR Open Network is launching the DAR Quest System, a Web3 quest-and-reward framework that goes live on September 1, 2025. The system aims to connect games and players across the Dalarnia Open Network, offering incentives for playing and a chance at monthly token rewards. Players complete quests across games and earn rewards such as Moon Coins, Quest Points, and exclusive in-game items. Quest Points can be spent to enter Play-2-Airdrop competitions, where payouts depend on performance. DAR is putting 100,000 D tokens into the prize pool for the inaugural four-week season starting September 1. The system aims to encourage players to try new titles across the Dalarnia Open Network rather than sticking to a single game. Seasonal quests will also be available exclusively to DAR Citizenship holders, giving committed players another reason to stay. DAR frames the Quest System as part of an AI-powered, chain-agnostic infrastructure that allows assets and incentives to move across games. The upside for game studios is exposure, while players benefit from a structured path to earning Moon Coins.
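As a rough illustration of the mechanics described above (complete quests, earn Quest Points and Moon Coins, spend Quest Points on Play-2-Airdrop entries), here is a toy Python ledger; none of the names or numbers are DAR’s actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Player:
    name: str
    quest_points: int = 0
    moon_coins: int = 0
    entries: list = field(default_factory=list)

def complete_quest(player: Player, game: str, points: int, coins: int) -> None:
    """Credit rewards for a finished quest; cross-game play just varies `game`."""
    player.quest_points += points
    player.moon_coins += coins

def enter_airdrop(player: Player, competition: str, cost: int) -> bool:
    """Spend Quest Points to enter a Play-2-Airdrop competition."""
    if player.quest_points < cost:
        return False
    player.quest_points -= cost
    player.entries.append(competition)
    return True

p = Player("alice")
complete_quest(p, "Mines of Dalarnia", points=50, coins=10)
print(enter_airdrop(p, "season-1", cost=40), p)
```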
Microsoft’s new open-source AI model for podcasts and other audio generates four distinct voices for up to 90 minutes; includes research-only licensing and watermarking safeguards
Microsoft has released VibeVoice, a new open-source AI model that lets users create podcasts and other audio — a counter to Google’s popular NotebookLM. Microsoft’s text-to-speech model can generate four voices and up to 90 minutes of podcast-quality speech; NotebookLM can do two voices. The two also differ in scope: VibeVoice reads and performs a supplied script, while NotebookLM ingests documents, turns them into two-person podcasts, and lets users query them for summaries, according to AI platform Hugging Face. That means VibeVoice doesn’t try to understand the text but rather performs it audibly, ostensibly to replace a recording studio. VibeVoice runs on 1.5 billion parameters, relatively small for a model capable of sustaining dialogue across multiple speakers. It is built on Alibaba’s open-source Qwen2.5, a large language model that helps orchestrate natural turn-taking and contextually aware speech patterns during dialogues. Microsoft claims this lets VibeVoice produce fluid conversations among four voices while maintaining each voice’s distinct characteristics, even in longer conversations. Potential research applications of VibeVoice include the following:
- Prototyping podcasts and training content: Creators could generate mock podcasts, panel discussions or training modules with multiple AI voices. Instead of hiring four voice actors to test dialogue flow, users can create a synthetic version in minutes using text.
- Accessibility and education: Educational material, textbooks or research papers could be turned into long-form audio with distinct narrators. This could help people who learn better by listening, or make dense material more engaging.
- Game and media development: Game developers or storytellers could use VibeVoice to prototype dialogue between characters. Because it handles four speakers, you can stage a full in-game conversation without recording sessions.
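For a sense of the input format, a hedged sketch: VibeVoice-style scripts label each line with a speaker, and the model renders each speaker with a distinct voice. The “Speaker N:” convention mirrors Microsoft’s published examples, but the generation call below is a deliberate placeholder, since real inference goes through Microsoft’s VibeVoice repository:

```python
# Build a four-speaker script in the "Speaker N:" format VibeVoice expects.
script = "\n".join([
    "Speaker 1: Welcome back to the show. Today: synthetic podcasts.",
    "Speaker 2: Which is a little meta, since we're both synthetic.",
    "Speaker 3: Ninety minutes of this? I hope the turn-taking holds up.",
    "Speaker 4: That's the claim: four voices, long-form, each staying distinct.",
])

def generate_podcast(script: str) -> bytes:
    """Deliberate placeholder: real inference loads microsoft/VibeVoice-1.5B
    from Hugging Face and runs the pipeline from Microsoft's VibeVoice repo."""
    raise NotImplementedError("see Microsoft's VibeVoice repository")

print(script)  # the text a recording studio would otherwise have to perform
```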
Gaia unveils on‑device “AI sovereignty” phone built on Samsung Galaxy S25 Edge hardware, running inference locally as a full network node with token rewards; South Korea and Hong Kong first
Gaia has launched the Gaia AI Phone, which it bills as the world’s first smartphone designed for complete AI sovereignty. Built on Galaxy S25 Edge hardware and launching initially in South Korea and Hong Kong, the device runs AI inference—the process of applying trained AI models to generate responses and perform tasks—locally through the company’s proprietary Gaia AI Platform. Each phone operates as a full Gaia network node, contributing computational resources to the decentralized AI infrastructure while earning token rewards for providing inference capacity to other network participants. The Gaia AI Phone launches with an ecosystem of exclusive partner integrations, including EdenLayer’s AI gaming platform, Roam’s global eSIM data package and Umy’s crypto-native travel booking discounts. Phone owners also receive token airdrops from partner projects, representing immediate participation in the decentralized AI economy. The company describes the underlying technology as a significant engineering achievement: compressing language model capabilities that previously required data center infrastructure so they run efficiently on smartphone hardware with full network participation. The device addresses growing concerns about AI centralization and data sovereignty in the blockchain community. Where traditional AI systems extract value from user data for centralized entities, Gaia says its model distributes both processing power and economic benefits among network participants, creating what it calls a truly decentralized alternative to centralized AI monopolies. The Gaia AI Phone represents a controlled technology demonstration designed to validate decentralized AI architecture on consumer hardware.
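Gaia nodes expose an OpenAI-compatible inference endpoint, so querying a phone acting as a node should look like an ordinary chat-completion call. A hedged sketch, with a placeholder node address and model name:

```python
from openai import OpenAI

# Placeholder node address; a real phone-node would advertise its own URL.
client = OpenAI(
    base_url="https://your-node-id.gaia.domains/v1",
    api_key="gaia",  # auth requirements vary by node; check the node's docs
)

reply = client.chat.completions.create(
    model="llama",  # whatever model the node is serving
    messages=[{"role": "user", "content": "What does on-device AI sovereignty mean?"}],
)
print(reply.choices[0].message.content)
```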
Warp offers tighter oversight of command‑line coding agent workflows with a reviewable feedback loop; its new diff‑tracking UI shows every agent change in real time and supports line‑level prompts and manual edits
The AI coding tool Warp has a plan for making coding agents more comprehensible — and it looks an awful lot like pair programming. The company is releasing Warp Code, a new set of features designed to give users more oversight over command-line-based coding agents, with more extensive diff tracking and a clearer view of what the coding agent is doing. With the new features, founder Zach Lloyd wants to “make a much tighter feedback loop for this agentic style of coding.” In practical terms, that means you can see exactly what the agent is doing and ask questions along the way. “As the agent is writing code, you’ll be able to see every little diff that the agent is making,” Lloyd says, “and you’ll have an easy way of commenting on those diffs and adjusting the agent as it goes along.” The general interface will be familiar to Warp users: a space at the bottom for giving direct instructions to the agent, along with a window for seeing the agent’s responses and a side window where you can see the changes the agent makes step by step. You can change the code by hand if you want to, similar to code-based tools like Cursor, but you can also highlight specific lines to add as context for a request or a question. Perhaps most impressive, Warp will automatically troubleshoot any errors that come up when the code compiles.
Apple plans a planner‑search‑summarizer Siri for Apple Intelligence, keeping personal context on‑device while calling Google Gemini via Private Cloud Compute for web answers
A new rumor suggests Apple will introduce a web search feature backed by Apple Foundation Models that can call out to Google Gemini to enhance Siri’s ability to gather and summarize information. The new Siri will have three core components: a planner, a search operator, and a summarizer. Apple’s Foundation Models will act as the planner and search operator, since those components handle on-device personal data, while fetching data from the web and collating it may fall to the Google model. There is still a lot to learn about Apple’s approach to AI going forward, as it had to scrap its previous approach entirely. The new Siri powered by Apple Intelligence LLMs is expected to launch in early 2026 with iOS 26.4. That isn’t to say third parties won’t be involved, especially since Apple’s search deal with Google will continue. Google is the default search engine in Safari (the user can change that), but it’s also the search engine used by Siri. Some queries rely on something called Siri intelligence, an older term that predates Apple Intelligence and refers to algorithms derived from device and web data. It seems on-device Apple Foundation Models will be responsible for parsing app intent systems and personal data. These on-device systems will power contextual actions, system-wide suggestions, and more. However, when Siri and its new LLM backend detect that a user query requires additional resources, it will call out to an AI agent equipped to deal with the topic. Initially, that agent will be Google Gemini, via a model running on Private Cloud Compute servers controlled by Apple. The so-called “world knowledge” will come from it, then be summarized and presented to the user. This system is different from Apple’s ChatGPT partnership: when Apple Intelligence passes a query to ChatGPT, it runs on OpenAI servers, though a contract forces OpenAI to discard queries and data.
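The rumored three-part pipeline is easy to picture as code. The sketch below is purely schematic; every function name and heuristic is invented, and it only mirrors the planner/search/summarizer split the report describes:

```python
def plan(query: str) -> dict:
    """On-device foundation model: decide whether personal data suffices
    or the query needs web-scale 'world knowledge'."""
    needs_web = not query.startswith("my ")  # toy heuristic
    return {"query": query, "needs_web": needs_web}

def search(step: dict) -> str:
    """Search operator: fetch from the web or from on-device data."""
    if step["needs_web"]:
        # Stand-in for the call out to a server-side model
        # (reportedly Gemini inside Private Cloud Compute).
        return f"web results for {step['query']!r}"
    return f"on-device results for {step['query']!r}"

def summarize(results: str) -> str:
    """On-device model condenses whatever came back."""
    return f"Summary: {results}"

print(summarize(search(plan("best hikes near Cupertino"))))
```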
Stardog’s “hallucination‑free” AI data assistant ties multi‑agent parsing and validation to a unified knowledge graph database, extending retrieval beyond documents to structured databases for safe, compliant insights
Knowledge graph database startup Stardog Union is launching a new, “hallucination-free” version of its enterprise-grade chatbot Voicebox, aimed at high-stakes industries. Voicebox can be thought of as an “enterprise answer engine” linked to an organization’s internal data, allowing it to respond to knowledge workers’ questions with extreme accuracy in real time. Stardog said it’s targeted at organizations in the most heavily regulated industries, such as financial services, healthcare and defense, enabling them to ask complex questions and receive answers that are based on their own data and fully traceable. The chatbot leverages Stardog’s pioneering knowledge graph platform, a flexible and reusable data layer that can access information across disparate systems, including siloed databases and applications, uniting data from the entire organization to power data analytics and other big data initiatives. Stardog says Voicebox is essentially an additional AI layer built atop its knowledge graph, powered by multiple agents that collaborate behind the scenes on tasks such as data discovery, integration, modeling and mapping. The idea is to give knowledge workers a user-controlled, self-service analytics experience that’s entirely accessible via natural language commands, enabling everyone in an organization to run their own analytics and dig up better business insights, the company said. The knowledge graph uses what Stardog calls a “Safety RAG architecture,” which it claims is safer than traditional retrieval-augmented generation because it’s designed to ensure no hallucinations slip through the cracks. Stardog says Safety RAG is also more expressive, because the knowledge graph builds a more complete data environment, expanding its reach from unstructured data alone to traditional databases and other structured data types. According to Stardog, Voicebox will never generate false responses; instead, it will simply admit when it doesn’t know how to answer a question. When this happens, it will ask users to provide examples of “competency questions” so it can direct its agents to perform the necessary integrations, modeling and mapping to try to answer the original question.
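The abstain-rather-than-guess behavior can be illustrated in a few lines of Python; this is a sketch of the general pattern, not Stardog’s Safety RAG implementation:

```python
# Answer only from facts present in the graph; otherwise admit ignorance
# and ask for competency questions, as the article describes.
FACTS = {
    ("acme", "headquartered_in"): "Zurich",
    ("acme", "regulated_by"): "FINMA",
}

def voicebox_style_answer(subject: str, predicate: str) -> str:
    fact = FACTS.get((subject, predicate))
    if fact is None:
        return ("I don't know. Could you share example competency questions "
                "so the agents can map the data needed to answer this?")
    return f"{subject} {predicate.replace('_', ' ')} {fact}"

print(voicebox_style_answer("acme", "headquartered_in"))  # grounded answer
print(voicebox_style_answer("acme", "ceo"))               # graceful abstention
```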
