
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


IBM’s open-sourced language model series introduces a hybrid Mamba-2 architecture with a mixture-of-experts design, cutting RAM requirements from 90GB to 15GB for comparable model performance.

October 7, 2025 // by Finnovate

IBM has open-sourced Granite 4, a language model series that combines elements of two different neural network architectures. The family includes four models at launch, ranging in size from 3 billion to 32 billion parameters, and IBM claims they can outperform comparably sized models while using less memory.

Granite-4.0-Micro, one of the smallest algorithms in the lineup, is based on the Transformer architecture that powers most large language models. The architecture’s flagship feature is its attention mechanism, which enables an LLM to review a snippet of text, identify the most important sentences, and prioritize them during the decision-making process.

The three other Granite 4 models combine an attention mechanism with processing components based on Mamba, a Transformer alternative whose main selling point is hardware efficiency: Mamba models require a fraction of the memory of comparable Transformers, which reduces inference costs. The Granite 4 series uses Mamba-2, a refinement that compresses one of the architecture’s core components into about 25 lines of code, enabling it to perform some tasks using less hardware than the original version of the architecture.

The most advanced Granite 4 model, Granite-4.0-H-Small, includes 32 billion parameters. It has a mixture-of-experts design that activates 9 billion parameters to answer prompts. IBM envisions developers using the model for tasks such as processing customer support requests. The two other Mamba-Transformer algorithms in the series, Granite-4.0-H-Tiny and Granite-4.0-H-Micro, feature 7 billion and 3 billion parameters, respectively. They’re designed for latency-sensitive use cases that prioritize speed over processing accuracy.
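To make the attention idea concrete, here is a minimal single-head scaled dot-product attention sketch in NumPy. The function name, shapes, and toy inputs are illustrative assumptions for exposition, not Granite’s actual implementation.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Minimal single-head attention: weigh every value by how strongly
    its key matches each query (illustrative sketch, not Granite code)."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)   # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v                               # weighted sum of values

# Toy usage: 4 tokens with 8-dimensional embeddings, used as self-attention.
x = np.random.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Because every token attends to every other token, the cached keys and values grow with sequence length, which is where the Transformer’s memory cost comes from.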
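The memory advantage of Mamba-style models comes from replacing that ever-growing key-value cache with a fixed-size recurrent state. The toy diagonal state-space recurrence below sketches the idea; the `ssm_scan` helper, the shapes, and the omission of Mamba’s input-dependent selectivity are simplifying assumptions, not Mamba-2’s actual kernel.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy diagonal state-space recurrence: h_t = A*h_{t-1} + B*x_t,
    y_t = C.h_t. Inference keeps only the fixed-size state h, so memory
    does not grow with sequence length (unlike a Transformer KV cache)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # one scalar input channel, for clarity
        h = A * h + B * x_t       # update the recurrent state in place
        ys.append(C @ h)          # read out a scalar per step
    return np.array(ys)

T = 16
x = np.random.randn(T)            # a toy length-16 input sequence
A = np.full(8, 0.9)               # per-dimension decay of the state
B = np.random.randn(8)
C = np.random.randn(8)
print(ssm_scan(x, A, B, C).shape) # (16,)
```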
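The mixture-of-experts figure (9 billion of 32 billion parameters active) reflects routing: a small router picks a few expert subnetworks per input, so most parameters sit idle on any given prompt. Here is a hedged top-k routing sketch; the expert count, dimensions, and `moe_forward` helper are hypothetical and not drawn from Granite’s design.

```python
import numpy as np

def moe_forward(x, experts, router_w, k=2):
    """Toy mixture-of-experts layer: the router scores all experts, but
    only the top-k run per input, so most parameters stay inactive
    (illustrative sketch, not Granite's router)."""
    logits = x @ router_w                       # one router score per expert
    top = np.argsort(logits)[-k:]               # indices of the k best experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                        # normalize gates over top-k
    return sum(g * experts[i](x) for g, i in zip(gates, top))

d = 8
experts = [(lambda W: (lambda x: x @ W))(np.random.randn(d, d))
           for _ in range(8)]                   # 8 toy linear "experts"
router_w = np.random.randn(d, 8)
x = np.random.randn(d)
print(moe_forward(x, experts, router_w).shape)  # (8,)
```

With k=2 of 8 experts active, only a quarter of the expert parameters participate in any single forward pass, which is the same proportional saving the article attributes to Granite-4.0-H-Small.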

Read Article

Category: Additional Reading

