
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


Nvidia has launched Parakeet-TDT-0.6B-v2, an automatic speech recognition (ASR) model that can transcribe 60 minutes of audio in 1 second with an average “Word Error Rate” of just 6.05%

May 7, 2025 //  by Finnovate

Nvidia has launched Parakeet-TDT-0.6B-v2, an automatic speech recognition (ASR) model that can “transcribe 60 minutes of audio in 1 second.” This second version currently tops the Hugging Face Open ASR Leaderboard with an average word error rate (WER) of 6.05%. WER is the share of words the model gets wrong (substituted, dropped, or inserted) relative to the number of words actually spoken, so lower is better. To put that in perspective, it approaches proprietary transcription models such as OpenAI’s GPT-4o-transcribe (with a WER of 2.46% in English) and ElevenLabs Scribe (3.3%).

The model has 600 million parameters and combines a FastConformer encoder with a TDT decoder. The speed claim holds when the model runs on Nvidia’s GPU-accelerated hardware: the benchmark reports an RTFx (inverse real-time factor, the seconds of audio processed per second of compute) of 3386.02 at a batch size of 128. At that rate an hour of audio (3,600 seconds) takes roughly 1.06 seconds, which places the model at the top of the current ASR benchmarks maintained by Hugging Face.

Parakeet-TDT-0.6B-v2 is aimed at developers, researchers, and industry teams building applications such as transcription services, voice assistants, subtitle generators, and conversational AI platforms. The model supports punctuation, capitalization, and detailed word-level timestamps, offering a full transcription package for a wide range of speech-to-text needs.

Developers can deploy the model using Nvidia’s NeMo toolkit. The setup works with Python and PyTorch, and the model can be used directly or fine-tuned for domain-specific tasks. The permissive CC-BY-4.0 license also allows commercial use, making it appealing to startups and enterprises alike. Parakeet-TDT-0.6B-v2 is optimized for Nvidia GPU environments and supports hardware such as the A100, H100, T4, and V100. While high-end GPUs maximize performance, the model can still be loaded on systems with as little as 2GB of RAM, allowing for broader deployment scenarios.
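To make the two headline metrics concrete, here is a minimal Python sketch of how they are commonly computed. It is an illustration, not the leaderboard's actual evaluation code: the function names `word_error_rate` and `seconds_to_transcribe` are hypothetical, the sample sentences are made up, and the speed calculation assumes RTFx means seconds of audio processed per second of compute, which is how the one-hour-in-about-one-second claim follows from 3386.02.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper: real evaluations usually normalize text
    (case, punctuation) before scoring; this sketch does not.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def seconds_to_transcribe(audio_seconds: float, rtfx: float) -> float:
    """Compute time implied by an RTFx value (audio seconds per compute second)."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(word_error_rate("the quick brown fox jumps",
                          "the quick brown box jumps"))
    # One hour of audio at the RTFx reported in the article.
    print(round(seconds_to_transcribe(3600, 3386.02), 2))
```

A WER of 6.05% therefore corresponds to roughly six word-level errors per hundred words spoken, averaged over the leaderboard's test sets.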


Category: Members, AI & Machine Economy, Innovation Topics


Copyright © 2025 Finnovate Research · All Rights Reserved · Privacy Policy
