
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


Mistral’s AI voice model can transcribe up to 30 minutes of audio, or process up to 40 minutes for audio understanding, and offers summarization without switching to a separate mode, at less than half the price of comparable APIs

July 18, 2025 //  by Finnovate

Mistral has released Voxtral, an open-source voice model that could rival paid voice AI offerings such as those from ElevenLabs and Hume AI. The company said the model bridges the gap between proprietary speech recognition models and open alternatives that are more accessible but more error-prone. According to Mistral, Voxtral “offers state-of-the-art accuracy and native semantic understanding in the open, at less than half the price of comparable APIs.”

With a 32K-token context, Voxtral can transcribe up to 30 minutes of audio, or process up to 40 minutes for audio understanding. It can answer questions about the audio content and generate summaries without switching to a separate mode, and users can trigger functions and API calls through spoken instructions. The model is based on Mistral Small 3.1, supports multiple languages, and detects the spoken language automatically.

Mistral also added enterprise features to Voxtral. These include private deployment, so that organizations can integrate the model into their own ecosystems, as well as domain-specific fine-tuning, advanced context, and priority access to engineering resources for customers who need help integrating Voxtral into their workflows.

Mistral stated that Voxtral outperformed existing voice models, including OpenAI’s Whisper, Gemini 2.5 Flash and Scribe from ElevenLabs. Voxtral produced fewer word errors than Whisper, which is currently considered the best automatic speech recognition model available.
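The article does not say how the word errors were counted. Speech recognition systems are commonly compared by word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. The Python sketch below illustrates that metric under this assumption; the function name and the sample sentences are hypothetical, and it is not Mistral's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: assumes lowercase, whitespace-split words and
    no further text normalization (punctuation, numerals, etc.).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference and transcript, one substituted word out of six.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A lower value means fewer word errors; a transcript identical to the reference scores 0.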


Category: Channels, Innovation Topics


Copyright © 2025 Finnovate Research · All Rights Reserved · Privacy Policy