
DigiBanker

Bringing you cutting-edge new technologies and disruptive financial innovations.


Google’s Gemini 2.5 Computer Use model enables AI agents to interact with browser user interfaces through 13 supported UI actions, including clicking, typing, scrolling and cursor hovering

October 10, 2025 //  by Finnovate

Google LLC has announced a new version of its Gemini large language model that can navigate the web through a browser and interact with websites, meaning it can perform tasks such as searching for information or buying things without human supervision. The model, Gemini 2.5 Computer Use, combines visual understanding and reasoning to analyze a user’s request and carry out the task in the browser. It completes all of the actions required to fulfill that task, such as clicking, typing, scrolling, manipulating dropdown menus and filling out and submitting forms, just as a human would.

Google’s DeepMind research unit said Gemini 2.5 Computer Use is based on the Gemini 2.5 Pro LLM. It explained that earlier versions of the model powered agentic features in tools such as AI Mode and Project Mariner, but this is the first time the complete model has been made available.

The company explained that each request kicks off a “loop” in which the model goes through a series of steps until the task is considered complete. First, the user sends a request to the model, which can also include screenshots of the website in question and a history of recent actions. Then, Gemini 2.5 Computer Use analyzes those inputs and generates a response, which will typically be a “function call representing one of the UI actions such as clicking or typing.” Client-side code then executes the required action, and once this is done, a new screenshot of the graphical user interface and the current website is sent back to the model as a function response. The model uses that response to decide on the next step.
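The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's API: `FunctionCall`, `FakeBrowser`, `scripted_model` and `run_agent_loop` are hypothetical names, the "model" is a hard-coded script rather than Gemini 2.5 Computer Use, and the "browser" only records actions instead of driving a real page. The action names `click` and `type` are illustrative as well.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FunctionCall:
    """One UI action proposed by the model (hypothetical shape)."""
    name: str
    args: dict


@dataclass
class FakeBrowser:
    """Stand-in for client-side code: records actions instead of performing them."""
    log: list = field(default_factory=list)

    def execute(self, call: FunctionCall) -> None:
        self.log.append((call.name, call.args))

    def screenshot(self) -> bytes:
        # A real client would capture the page; here we just describe the state.
        return f"screenshot after {len(self.log)} actions".encode()


def scripted_model(request: str, screenshot: bytes,
                   history: List[FunctionCall]) -> Optional[FunctionCall]:
    """Hypothetical stand-in for the model: returns the next action, or None when done."""
    script = [
        FunctionCall("click", {"x": 120, "y": 48}),
        FunctionCall("type", {"text": request}),
    ]
    return script[len(history)] if len(history) < len(script) else None


def run_agent_loop(request: str,
                   model: Callable[[str, bytes, List[FunctionCall]], Optional[FunctionCall]],
                   browser: FakeBrowser,
                   max_steps: int = 10) -> List[FunctionCall]:
    """Request -> function call -> client executes -> new screenshot -> repeat."""
    history: List[FunctionCall] = []
    screenshot = browser.screenshot()
    for _ in range(max_steps):
        call = model(request, screenshot, history)
        if call is None:          # the model considers the task complete
            break
        browser.execute(call)     # client-side code performs the UI action
        history.append(call)
        screenshot = browser.screenshot()  # sent back as the function response
    return history


if __name__ == "__main__":
    browser = FakeBrowser()
    for step in run_agent_loop("running shoes", scripted_model, browser):
        print(step.name, step.args)
```

The `max_steps` cap is a design choice of this sketch, not something the article mentions: it simply keeps an unsupervised loop from running indefinitely if the model never signals completion.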


Category: Essential Guidance


