• Menu
  • Skip to right header navigation
  • Skip to main content
  • Skip to primary sidebar

DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.

  • Home
  • Pricing
  • Features
    • Overview Of Features
    • Search
    • Favorites
  • Share!
  • Log In
  • Home
  • Pricing
  • Features
    • Overview Of Features
    • Search
    • Favorites
  • Share!
  • Log In

Nvidia has launched Parakeet-TDT-0.6B-v2, an automatic speech recognition (ASR) model that can transcribe 60 minutes of audio in 1 second with an average “Word Error Rate” of just 6.05%

May 7, 2025 //  by Finnovate

Nvidia has launched Parakeet-TDT-0.6B-v2, an automatic speech recognition (ASR) model that can, “transcribe 60 minutes of audio in 1 second [mind blown emoji].” This version two is so powerful, it currently tops the Hugging Face Open ASR Leaderboard with an average “Word Error Rate” (times the model incorrectly transcribes a spoken word) of just 6.05% (out of 100). To put that in perspective, it nears proprietary transcription models such as OpenAI’s GPT-4o-transcribe (with a WER of 2.46% in English) and ElevenLabs Scribe (3.3%). The model boasts 600 million parameters and leverages a combination of the FastConformer encoder and TDT decoder architectures. It can transcribe an hour of audio in just one second, provided it’s running on Nvidia’s GPU-accelerated hardware. The performance benchmark is measured at an RTFx (Real-Time Factor) of 3386.02 with a batch size of 128, placing it at the top of current ASR benchmarks maintained by Hugging Face. Parakeet-TDT-0.6B-v2 is aimed at developers, researchers, and industry teams building applications such as transcription services, voice assistants, subtitle generators, and conversational AI platforms. The model supports punctuation, capitalization, and detailed word-level timestamping, offering a full transcription package for a wide range of speech-to-text needs. Developers can deploy the model using Nvidia’s NeMo toolkit. The setup process is compatible with Python and PyTorch, and the model can be used directly or fine-tuned for domain-specific tasks. The open-source license (CC-BY-4.0) also allows for commercial use, making it appealing to startups and enterprises alike. Parakeet-TDT-0.6B-v2 is optimized for Nvidia GPU environments, supporting hardware such as the A100, H100, T4, and V100 boards. While high-end GPUs maximize performance, the model can still be loaded on systems with as little as 2GB of RAM, allowing for broader deployment scenarios.

Read Article

Category: Essential Guidance

Previous Post: « Success of Pix and UPI is paving way for a three-stage framework for state-led fast payment systems that involves weighting pre-requisites, implementation and scaling and establishing engagement mechanisms and regulatory adjustments

Copyright © 2025 Finnovate Research · All Rights Reserved · Privacy Policy
Finnovate Research · Knyvett House · Watermans Business Park · The Causeway Staines · TW18 3BA · United Kingdom · About · Contact Us · Tel: +44-20-3070-0188

We use cookies to provide the best website experience for you. If you continue to use this site we will assume that you are happy with it.OkayPrivacy policy