Google’s open-source Gemma is already a small model designed to run on devices like smartphones, and Google continues to expand the Gemma family of models and optimize them for local use on phones and laptops. Its newest model, EmbeddingGemma, takes on embedding models already used by enterprises, touting a smaller parameter count than most alongside strong benchmark performance.

EmbeddingGemma is a 308 million parameter, open-source model optimized to run on-device on laptops, desktops and mobile devices. Min Choi, product manager, and Sahil Dua, lead research engineer at Google DeepMind, wrote in a blog post that EmbeddingGemma “offers customizable output dimensions” and will work with Google’s open-source Gemma 3n model.

“Designed specifically for on-device AI, its highly efficient 308 million parameter design enables you to build applications using techniques such as RAG and semantic search that run directly on your hardware,” Choi and Dua said. “It delivers private, high-quality embeddings that work anywhere, even without an internet connection.”

The model performed well on the Massive Text Embedding Benchmark (MTEB) multilingual v2, which measures the capabilities of embedding models, and it is the highest-ranked model under 500M parameters.

A significant use case for EmbeddingGemma is building mobile RAG pipelines and implementing semantic search. RAG relies on embedding models, which create numerical representations of data that models or agents can reference to answer queries. A mobile RAG pipeline moves information gathering and query answering onto local devices, so employees can ask questions or direct agents through their phones or other devices to find the information they need; a minimal retrieval sketch appears below.

Choi and Dua said that EmbeddingGemma is designed to produce high-quality embeddings. To do this, it employs a technique called Matryoshka Representation Learning (MRL), which gives the model flexibility by providing multiple embedding sizes within a single model.
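To make the retrieval step concrete, here is a minimal semantic search sketch in Python. It assumes the sentence-transformers library and uses "google/embeddinggemma-300m" as the checkpoint name; the exact model ID should be verified on Hugging Face before running.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed checkpoint name; confirm the official EmbeddingGemma
# model ID on Hugging Face before running.
model = SentenceTransformer("google/embeddinggemma-300m")

docs = [
    "The expense policy caps hotel stays at $250 per night.",
    "VPN access requires enrollment in two-factor authentication.",
    "Quarterly reviews are scheduled for the first week of April.",
]
query = "How much can I spend on a hotel?"

# Encode documents and the query into fixed-size vectors,
# L2-normalized so that a dot product equals cosine similarity.
doc_emb = model.encode(docs, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

scores = doc_emb @ query_emb          # cosine similarity per document
best = int(np.argmax(scores))
print(f"Best match ({scores[best]:.3f}): {docs[best]}")
```

In a full RAG pipeline, the top-scoring passages would then be handed to a local generative model such as Gemma 3n as context for answering the query.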
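Matryoshka-trained embeddings concentrate the most useful information in the leading dimensions, so a shorter prefix of the full vector can itself serve as a smaller embedding. A minimal sketch of that truncation, assuming the same checkpoint and that 256 is one of the supported sizes (check the model card):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")  # assumed model ID

pair = ["on-device retrieval", "local semantic search"]
full = model.encode(pair)             # full-size embeddings, e.g. 768 dims

# Keep only the leading dimensions, then re-normalize before comparing.
# 256 here is an assumption; consult the model card for supported sizes.
dim = 256
small = full[:, :dim]
small = small / np.linalg.norm(small, axis=1, keepdims=True)

print(f"{dim}-dim cosine similarity: {float(small[0] @ small[1]):.3f}")
```

Storing the shorter vectors shrinks the index and speeds up search on memory-constrained devices, at a modest cost in retrieval quality.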