DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.

Google’s EmbeddingGemma, a small 308M-parameter model, optimizes multilingual embeddings for phones and laptops, enabling offline, private semantic search and retrieval in enterprise apps

September 10, 2025 // by Finnovate

Google’s open-source Gemma is already a small model designed to run on devices like smartphones, and Google continues to expand the Gemma family and optimize it for local use on phones and laptops. Its newest model, EmbeddingGemma, takes on the embedding models enterprises already use, touting a smaller parameter count than most alongside strong benchmark performance.

EmbeddingGemma is a 308-million-parameter, open-source embedding model optimized for devices like laptops, desktops and mobile phones. Min Choi, product manager, and Sahil Dua, lead research engineer at Google DeepMind, wrote in a blog post that EmbeddingGemma “offers customizable output dimensions” and will work with Google’s open-source Gemma 3n model. “Designed specifically for on-device AI, its highly efficient 308 million parameter design enables you to build applications using techniques such as RAG and semantic search that run directly on your hardware,” Choi and Dua said. “It delivers private, high-quality embeddings that work anywhere, even without an internet connection.” The model performed well on the Massive Text Embedding Benchmark (MTEB) multilingual v2, which measures the capabilities of embedding models; it is the highest-ranked model under 500M parameters.

A significant use case for EmbeddingGemma is building mobile RAG pipelines and semantic search. RAG relies on embedding models, which create numerical representations of data that models or agents can reference to answer queries. Building a mobile RAG pipeline moves information gathering and question answering onto local devices: employees can ask questions, or direct agents, from their phones or other devices to find the information they need (see the first sketch below).

Choi and Dua said that EmbeddingGemma is designed to create high-quality embeddings flexibly. To do this, it uses Matryoshka Representation Learning (MRL), which gives the model flexibility by letting a single model provide multiple embedding sizes: the full-size vector can be truncated to smaller dimensions with little loss in quality, trading some accuracy for speed and storage (see the second sketch below).
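To illustrate the kind of on-device retrieval the post describes, here is a minimal semantic-search sketch using the sentence-transformers library. The model ID google/embeddinggemma-300m, the toy corpus, and the ranking flow are illustrative assumptions, not details from the article; check the published model card for the exact identifier and recommended query prompts.

```python
# Minimal on-device semantic search with EmbeddingGemma (a sketch).
# Assumes: `pip install sentence-transformers` and that the model is
# available on Hugging Face as "google/embeddinggemma-300m" (assumed ID).
from sentence_transformers import SentenceTransformer

# Downloads once, then runs fully locally -- no network needed afterward.
model = SentenceTransformer("google/embeddinggemma-300m")

# A tiny corpus standing in for enterprise documents.
docs = [
    "Expense reports must be filed within 30 days of travel.",
    "The VPN client auto-updates every Tuesday night.",
    "Quarterly security training is mandatory for all staff.",
]
doc_embeddings = model.encode(docs)  # one vector per document

query = "When do I need to submit my travel expenses?"
query_embedding = model.encode(query)

# Rank documents by similarity to the query; the top hit would be the
# context passed to a local generative model in a RAG pipeline.
scores = model.similarity(query_embedding, doc_embeddings)  # shape (1, 3)
best = scores.argmax().item()
print(f"Best match ({scores[0, best].item():.3f}): {docs[best]}")
```

In a full mobile RAG pipeline, the retrieved passage would be appended to the user’s question and handed to an on-device generative model such as Gemma 3n, keeping the whole loop offline.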
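The Matryoshka property means the leading dimensions of an embedding carry the most information, so a vector can be cut down and re-normalized to fit a tighter memory budget. A sketch of the idea follows, again assuming the hypothetical model ID above and that the full embedding is 768-dimensional (per the model’s announcement); sentence-transformers also exposes this directly via its truncate_dim option.

```python
# Matryoshka Representation Learning (MRL) in practice: keep the leading
# k dimensions of the full embedding and re-normalize to unit length.
# Assumes the same hypothetical "google/embeddinggemma-300m" model as above.
import numpy as np
from sentence_transformers import SentenceTransformer

full_model = SentenceTransformer("google/embeddinggemma-300m")

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize."""
    truncated = vec[:dim]
    return truncated / np.linalg.norm(truncated)

emb = full_model.encode("offline semantic search on a phone")
small = truncate_embedding(emb, 128)  # 128-d vector for tight budgets
print(emb.shape, small.shape)         # e.g. (768,) (128,)

# Equivalently, sentence-transformers can truncate at load time:
small_model = SentenceTransformer("google/embeddinggemma-300m",
                                  truncate_dim=128)
```

The design choice here is the usual MRL trade-off: smaller vectors shrink the on-device index and speed up search, at a modest cost in retrieval quality.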

Category: Essential Guidance
