
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


Meta’s new Llama API to use Cerebras ultra-fast inference tech, letting developers build apps that chain multiple LLM calls while offering generation speeds up to 18x faster than traditional GPU-based solutions

May 2, 2025 //  by Finnovate

Meta announced a partnership with Cerebras Systems to power its new Llama API, offering developers inference speeds up to 18 times faster than traditional GPU-based solutions. The difference comes from Cerebras’ specialized AI chips: the Cerebras system delivers over 2,600 tokens per second for Llama 4 Scout, compared with approximately 130 tokens per second for ChatGPT and around 25 tokens per second for DeepSeek, according to benchmarks from Artificial Analysis.

That speed advantage enables entirely new categories of applications that were previously impractical, including real-time agents, low-latency conversational voice systems, interactive code generation, and instant multi-step reasoning. All of these require chaining multiple large language model calls, which can now be completed in seconds rather than minutes.

The Llama API represents a significant shift in Meta’s AI strategy, from being primarily a model provider to becoming a full-service AI infrastructure company. Offering an API service creates a revenue stream from Meta’s AI investments while preserving its commitment to open models. The API will include tools for fine-tuning and evaluation, starting with the Llama 3.3 8B model, allowing developers to generate data, train on it, and test the quality of their custom models. Meta emphasizes that it won’t use customer data to train its own models, and models built with the Llama API can be transferred to other hosts, a clear differentiation from some competitors’ more closed approaches.

Cerebras will power the new service through its network of data centers across North America, including facilities in Dallas, Oklahoma, Minnesota, Montreal, and California. By combining the popularity of its open-source models with dramatically faster inference, Meta is positioning itself as a formidable competitor in the commercial AI space. For Cerebras, the partnership is a major milestone and a validation of its specialized AI hardware approach.
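To make the “seconds rather than minutes” claim concrete, here is a minimal back-of-the-envelope sketch in Python. The token rates are the benchmark figures quoted above; the workload (a 10-step agent chain generating 500 tokens per step) is an illustrative assumption, not a figure from the article:

```python
# Back-of-the-envelope latency for an agent that chains several LLM calls.
# Token rates come from the Artificial Analysis benchmarks quoted above;
# the chain length and tokens-per-step are illustrative assumptions.

RATES_TOKENS_PER_SEC = {
    "Cerebras (Llama 4 Scout)": 2600,
    "ChatGPT (GPU-based)": 130,
    "DeepSeek (GPU-based)": 25,
}

CALLS_IN_CHAIN = 10    # e.g. a multi-step reasoning agent
TOKENS_PER_CALL = 500  # generated tokens per step (assumed)

for name, rate in RATES_TOKENS_PER_SEC.items():
    total_seconds = CALLS_IN_CHAIN * TOKENS_PER_CALL / rate
    print(f"{name:28s} ~{total_seconds:6.1f} s for the full chain")

# Expected output (roughly):
#   Cerebras (Llama 4 Scout)     ~   1.9 s for the full chain
#   ChatGPT (GPU-based)          ~  38.5 s for the full chain
#   DeepSeek (GPU-based)         ~ 200.0 s for the full chain
```

Under these assumptions the same 5,000-token chain takes about two seconds at Cerebras rates but over three minutes at DeepSeek rates, which is exactly the gap that makes chained, interactive agents practical.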
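For a sense of what such chaining looks like in code, here is a hypothetical sketch of two dependent calls to a chat-completion-style endpoint. The endpoint URL, model identifier, and response schema below are assumptions for illustration only; consult Meta’s actual Llama API documentation for the real interface:

```python
# Hypothetical sketch of chaining two LLM calls. The URL, model name, and
# response shape are illustrative assumptions, not Meta's documented API.
import requests

API_URL = "https://api.llama.example/v1/chat/completions"  # hypothetical endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

def ask(prompt: str) -> str:
    """Send one chat request and return the generated text (assumed schema)."""
    resp = requests.post(API_URL, headers=HEADERS, json={
        "model": "llama-4-scout",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Step 1 drafts an answer; step 2 feeds the draft back for refinement.
# At thousands of tokens per second, both steps finish quickly enough
# to sit inside an interactive loop, the pattern the article describes.
draft = ask("List three risks of real-time voice agents.")
final = ask(f"Tighten this list to one sentence each:\n{draft}")
print(final)
```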


Category: AI & Machine Economy, Innovation Topics


