Microsoft finds API-based agents are generally more stable and less error-prone than GUI-based agents, which need multiple actions to accomplish the same goal • DigiBanker

Microsoft researchers have compared API-based and GUI-based AI agents and found that each approach has distinct strengths and that the two can work well together. API agents interact with software through programmable interfaces, while GUI agents mimic how a human uses software, navigating menus and clicking buttons. API agents are generally more stable and less error-prone, whereas GUI agents often need a sequence of actions to accomplish a goal that an API agent can reach more directly. However, GUI agents can control almost any software with a visible interface, whether or not it offers an API.

Microsoft outlines three strategies for combining both types of agents into hybrid systems: API wrappers that hide GUI actions behind a programmable interface; orchestration tools that coordinate API and GUI steps within a single workflow; and low-code and no-code platforms that let non-technical users build automations through drag-and-drop interfaces. Recent advances in multimodal AI, along with new tools that simplify API development, could lead to more flexible forms of automation that blur the line between front-end and back-end integration.

Choosing the right agent for the job is crucial for long-term automation success. API agents are best suited to performance-critical tasks and security-sensitive environments, while GUI agents are better suited to legacy systems that lack APIs and to mobile apps. Organizations can start with GUI agents and gradually switch to API agents as APIs become available.
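To make the "prefer the API, fall back to the GUI" idea concrete, here is a minimal Python sketch of an orchestrator that routes each task to an API agent when an endpoint exists and otherwise replays GUI steps. Everything in it is hypothetical: the classes `ApiAgent`, `GuiAgent` and `HybridOrchestrator`, the `create_invoice` task and its UI steps are illustrative stand-ins, not Microsoft's design or any real framework, and the agents only record what they would do rather than calling a real API or driving a real screen.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins: real agents would call an HTTP client or a
# UI-automation driver. Here they only record or return what they would do.


@dataclass
class ApiAgent:
    """Completes a task with one call to a programmable interface."""
    endpoints: Dict[str, Callable[[dict], str]] = field(default_factory=dict)

    def supports(self, task: str) -> bool:
        return task in self.endpoints

    def run(self, task: str, params: dict) -> str:
        return self.endpoints[task](params)


@dataclass
class GuiAgent:
    """Completes a task by replaying a sequence of UI actions, as a human would."""
    scripts: Dict[str, List[str]] = field(default_factory=dict)
    actions_taken: List[str] = field(default_factory=list)

    def run(self, task: str, params: dict) -> str:
        steps = self.scripts[task]
        for step in steps:
            # In a real system each step would be a click or a keystroke.
            self.actions_taken.append(step.format(**params))
        return f"{task} via GUI in {len(steps)} actions"


@dataclass
class HybridOrchestrator:
    """Routes a task to the API agent when an endpoint exists,
    otherwise falls back to the GUI agent."""
    api: ApiAgent
    gui: GuiAgent

    def run(self, task: str, params: dict) -> str:
        if self.api.supports(task):
            return self.api.run(task, params)
        return self.gui.run(task, params)


if __name__ == "__main__":
    gui = GuiAgent(scripts={
        "create_invoice": [
            "click 'Billing' menu",
            "click 'New invoice'",
            "type customer={customer}",
            "type amount={amount}",
            "click 'Save'",
        ]
    })
    api = ApiAgent()
    bot = HybridOrchestrator(api, gui)
    params = {"customer": "ACME", "amount": "120.00"}

    # Legacy system with no API yet: the GUI path is used.
    print(bot.run("create_invoice", params))  # create_invoice via GUI in 5 actions

    # Later an API becomes available; the same call now goes through it.
    api.endpoints["create_invoice"] = lambda p: f"create_invoice via API for {p['customer']}"
    print(bot.run("create_invoice", params))  # create_invoice via API for ACME
```

Because routing depends only on whether an endpoint is registered, callers never change when a task moves from the GUI path to the API path, which mirrors the suggestion to start with GUI agents and switch to APIs one task at a time.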