Google DeepMind has introduced a vision-language-action (VLA) model that runs locally on robotic devices, without accessing a data network. The new Gemini Robotics On-Device is a robotics foundation model featuring general-purpose dexterity and fast task adaptation.

“Since the model operates independent of a data network, it’s helpful for latency sensitive applications and ensures robustness in environments with intermittent or zero connectivity,” Google DeepMind Senior Director and Head of Robotics Carolina Parada said.

Building on the task generalization and dexterity capabilities of Gemini Robotics, which was introduced in March, Gemini Robotics On-Device is built for bi-arm robots and is designed to enable rapid experimentation with dexterous manipulation, as well as adaptation to new tasks through fine-tuning.

The model follows natural language instructions and is dexterous enough to perform tasks like unzipping bags, folding clothes, zipping a lunchbox, drawing a card, pouring salad dressing and assembling products. It is also Google DeepMind’s first VLA model available for fine-tuning.

“While many tasks will work out of the box, developers can also choose to adapt the model to achieve better performance for their applications,” Parada said in the post. “Our model quickly adapts to new tasks, with as few as 50 to 100 demonstrations — indicating how well this on-device model can generalize its foundational knowledge to new tasks.”
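The article does not describe what fine-tuning on a handful of demonstrations looks like in practice. The sketch below is a generic, hypothetical illustration of demonstration-based adaptation (behavior cloning on teleoperated observation-action pairs); it is not the Gemini Robotics SDK, and all names (DemoDataset, PolicyHead, finetune) and dimensions are assumptions made for the example.

```python
# Hypothetical sketch: adapting a pretrained policy with ~50-100 demonstrations
# via behavior cloning. This is NOT the Gemini Robotics SDK; all names,
# shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

class DemoDataset(Dataset):
    """Teleoperated demonstrations stored as (observation features, action) pairs."""
    def __init__(self, demos):
        self.samples = demos  # list of (obs_tensor, action_tensor) tuples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]

class PolicyHead(nn.Module):
    """Small adapter head trained on top of frozen pretrained features (assumption)."""
    def __init__(self, feat_dim=512, action_dim=14):  # 14-DoF bi-arm setup is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )

    def forward(self, features):
        return self.net(features)

def finetune(policy, demos, epochs=20, lr=1e-4):
    """Regress demonstrated actions from observations over a few epochs."""
    loader = DataLoader(DemoDataset(demos), batch_size=8, shuffle=True)
    opt = torch.optim.AdamW(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs_feats, actions in loader:
            opt.zero_grad()
            loss = loss_fn(policy(obs_feats), actions)
            loss.backward()
            opt.step()
    return policy

if __name__ == "__main__":
    # 60 synthetic demonstrations stand in for real teleoperated data.
    demos = [(torch.randn(512), torch.randn(14)) for _ in range(60)]
    finetune(PolicyHead(), demos)
```

The point of the sketch is the data budget: with only dozens of demonstrations, adaptation typically touches a small head or adapter rather than the full foundation model, which is one plausible reading of the quick-adaptation claim above.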