
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


IBM’s open-sourced language model series introduces a hybrid Mamba-2 architecture with a mixture-of-experts design, cutting RAM requirements from 90GB to 15GB for comparable model performance.

October 7, 2025 // by Finnovate

IBM has open-sourced Granite 4, a language model series that combines elements of two different neural network architectures. The family includes four models at launch, ranging in size from 3 billion to 32 billion parameters, and IBM claims they can outperform comparably sized models while using less memory.

Granite-4.0-Micro, one of the smallest algorithms in the lineup, is based on the Transformer architecture that powers most large language models. The architecture’s flagship feature is its attention mechanism, which enables an LLM to review a snippet of text, identify the most important sentences, and prioritize them during the decision-making process.

The three other Granite 4 models combine an attention mechanism with processing components based on Mamba, a Transformer alternative whose main selling point is hardware efficiency: Mamba models require a fraction of the memory of comparable Transformers, which reduces inference costs. The Granite 4 series uses Mamba-2, a refinement that compresses one of the architecture’s core components into about 25 lines of code, enabling it to perform some tasks using less hardware than the original version of the architecture.

The most advanced Granite 4 model, Granite-4.0-H-Small, includes 32 billion parameters. It has a mixture-of-experts design that activates 9 billion parameters to answer prompts. IBM envisions developers using the model for tasks such as processing customer support requests. The two other Mamba-Transformer algorithms in the series, Granite-4.0-H-Tiny and Granite-4.0-H-Micro, feature 7 billion and 3 billion parameters, respectively. They’re designed for latency-sensitive use cases that prioritize speed over processing accuracy.
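To make the attention idea concrete, here is a minimal single-head scaled dot-product attention sketch in NumPy. The function name, shapes, and toy inputs are illustrative assumptions for exposition, not Granite’s actual implementation.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Minimal single-head attention: weigh every value by how strongly
    its key matches each query (illustrative sketch, not Granite code)."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)   # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v                               # weighted sum of values

# Toy usage: 4 tokens with 8-dimensional embeddings, used as self-attention.
x = np.random.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Because every token attends to every other token, the cached keys and values grow with sequence length, which is where the Transformer’s memory cost comes from.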
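The memory advantage of Mamba-style models comes from replacing that ever-growing key-value cache with a fixed-size recurrent state. The toy diagonal state-space recurrence below sketches the idea; the `ssm_scan` helper, the shapes, and the omission of Mamba’s input-dependent selectivity are simplifying assumptions, not Mamba-2’s actual kernel.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy diagonal state-space recurrence: h_t = A*h_{t-1} + B*x_t,
    y_t = C.h_t. Inference keeps only the fixed-size state h, so memory
    does not grow with sequence length (unlike a Transformer KV cache)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # one scalar input channel, for clarity
        h = A * h + B * x_t       # update the recurrent state in place
        ys.append(C @ h)          # read out a scalar per step
    return np.array(ys)

T = 16
x = np.random.randn(T)            # a toy length-16 input sequence
A = np.full(8, 0.9)               # per-dimension decay of the state
B = np.random.randn(8)
C = np.random.randn(8)
print(ssm_scan(x, A, B, C).shape) # (16,)
```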
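The mixture-of-experts figure (9 billion of 32 billion parameters active) reflects routing: a small router picks a few expert subnetworks per input, so most parameters sit idle on any given prompt. Here is a hedged top-k routing sketch; the expert count, dimensions, and `moe_forward` helper are hypothetical and not drawn from Granite’s design.

```python
import numpy as np

def moe_forward(x, experts, router_w, k=2):
    """Toy mixture-of-experts layer: the router scores all experts, but
    only the top-k run per input, so most parameters stay inactive
    (illustrative sketch, not Granite's router)."""
    logits = x @ router_w                       # one router score per expert
    top = np.argsort(logits)[-k:]               # indices of the k best experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                        # normalize gates over top-k
    return sum(g * experts[i](x) for g, i in zip(gates, top))

d = 8
experts = [(lambda W: (lambda x: x @ W))(np.random.randn(d, d))
           for _ in range(8)]                   # 8 toy linear "experts"
router_w = np.random.randn(d, 8)
x = np.random.randn(d)
print(moe_forward(x, experts, router_w).shape)  # (8,)
```

With k=2 of 8 experts active, only a quarter of the expert parameters participate in any single forward pass, which is the same proportional saving the article attributes to Granite-4.0-H-Small.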

Read Article

Category: Additional Reading

