DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.

Mistral’s AI voice model can transcribe up to 30 minutes of audio or handle 40 minutes of audio understanding, and offers summarization without switching to a separate mode at less than half the price of comparable APIs

July 18, 2025 //  by Finnovate

Mistral released an open-source voice model, Voxtral, that could rival paid voice AI such as those from ElevenLabs and Hume AI, and which the company said bridges the gap between proprietary speech recognition models and the more open, yet error-prone, alternatives. The company said Voxtral “offers state-of-the-art accuracy and native semantic understanding in the open, at less than half the price of comparable APIs.”

With a 32K token context, Voxtral can transcribe up to 30 minutes of audio or handle up to 40 minutes of audio understanding. It also offers summarization and can answer questions based on the audio content without switching to a separate mode, and users can trigger functions and API calls based on spoken instructions. The model is based on Mistral Small 3.1, supports multiple languages and can automatically detect the language being spoken.

Mistral added enterprise features to Voxtral, including private deployment so that organizations can integrate the model into their own ecosystems, as well as domain-specific fine-tuning, advanced context support, and priority access to engineering resources for customers who need help integrating Voxtral into their workflows. Mistral stated that Voxtral outperformed existing voice models, including OpenAI’s Whisper, Gemini 2.5 Flash and Scribe from ElevenLabs, producing fewer word errors than Whisper, which is currently considered the best automatic speech recognition model available.
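As a rough illustration of the workflow described above (transcribe an audio file, then ask the same model for a summary without switching to a separate mode), the sketch below calls a hosted Voxtral endpoint over HTTP. The base URL, endpoint paths, model name (voxtral-small-latest), environment variable, and response fields are assumptions made for illustration, not details taken from the article or Mistral’s documentation.

```python
# Minimal sketch, assuming an OpenAI-compatible hosted Voxtral API.
# Endpoint paths, model name, and response shapes below are assumptions.
import os
import requests

API_KEY = os.environ["MISTRAL_API_KEY"]            # assumed env var
BASE_URL = "https://api.mistral.ai/v1"             # assumed base URL
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def transcribe(path: str) -> str:
    """Send an audio file (up to ~30 minutes, per the article) for transcription."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{BASE_URL}/audio/transcriptions",       # assumed endpoint
            headers=HEADERS,
            files={"file": f},
            data={"model": "voxtral-small-latest"},   # assumed model name
            timeout=300,
        )
    resp.raise_for_status()
    return resp.json()["text"]                        # assumed response field

def summarize(transcript: str) -> str:
    """Ask the same model to summarize the transcript, no separate mode required."""
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        headers={**HEADERS, "Content-Type": "application/json"},
        json={
            "model": "voxtral-small-latest",
            "messages": [
                {"role": "user",
                 "content": f"Summarize this call transcript:\n\n{transcript}"},
            ],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    text = transcribe("earnings_call.wav")            # hypothetical input file
    print(summarize(text))
```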

Category: Channels, Innovation Topics

