US20260037318
2026-02-05
Physics
G06F9/5038
The patent application describes a task execution system that automates user-requested actions across interactive interfaces using machine learning models. At its core is a generative AI action model augmented with retrieval-augmented generation (RAG), which improves the accuracy and efficiency of task completion. The system addresses limitations of large action models (LAMs), such as error-prone actions and the need for continuous user input, by creating a session plan that guides task execution across interface segments.
Recent advances in AI have produced LAMs, which simulate user interactions with software interfaces. LAMs nonetheless struggle when an action fails or when a task requires a sequence of interactions. The task execution system described in this application uses generative AI and RAG to address these problems. Using visual context information and prior user session data, the system can self-correct and adjust session plans dynamically, so that task execution can continue after a failed action.
The task execution system employs a generative AI action model and a visual-based generative AI model, both augmented with sanitized user session data from a RAG database. This setup allows the system to generate and execute session plans autonomously, without additional user input. When it meets an obstacle, the system generates an updated plan or an alternative action. This approach improves accuracy and efficiency and makes automated task execution more flexible across interfaces.
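The control flow described above (execute a session plan step by step, and request an updated plan when a step fails) can be illustrated with a minimal sketch. This is not the application's implementation: `Step`, `SessionResult`, `run_session`, `plan_fn`, `act_fn`, and `max_replans` are hypothetical names, and the generative models are stood in for by plain callables supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    segment: str  # interface segment the step targets (hypothetical label)
    action: str   # action to perform in that segment (hypothetical label)


@dataclass
class SessionResult:
    completed: List[Step] = field(default_factory=list)
    replans: int = 0
    success: bool = False


def run_session(
    request: str,
    plan_fn: Callable[[str, List[Step], str], List[Step]],
    act_fn: Callable[[Step], bool],
    max_replans: int = 3,
) -> SessionResult:
    """Execute a session plan; ask plan_fn for an updated plan when a step fails.

    plan_fn stands in for the generative action model: it receives the request,
    the steps completed so far, and a description of the obstacle (empty on the
    first call), and returns the remaining steps.
    act_fn stands in for performing one action on the interface and reports
    whether it succeeded.
    """
    result = SessionResult()
    plan = list(plan_fn(request, result.completed, ""))
    while plan:
        step = plan.pop(0)
        if act_fn(step):
            result.completed.append(step)
            continue
        # The step failed: stop if the replanning budget is spent, otherwise
        # request an updated plan that accounts for the obstacle.
        if result.replans >= max_replans:
            return result
        result.replans += 1
        plan = list(plan_fn(request, result.completed, f"failed: {step.action}"))
    # Every planned step succeeded (an empty plan counts as trivially complete).
    result.success = True
    return result
```

A caller would pass a `plan_fn` that wraps the model call and an `act_fn` that drives the interface. The `max_replans` cap is an assumption added here so that the loop terminates when an obstacle cannot be resolved.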
By integrating prior user session information, the task execution system creates tailored session plans that increase the precision of actions performed on interactive interfaces. This leads to fewer errors and more efficient task completion. The use of grounding information helps the system identify and interact with the correct elements on an interface, further enhancing accuracy. The system's ability to self-correct and adapt to different interfaces makes it a versatile tool for automating complex tasks without user intervention.
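The step of feeding prior user session information into planning can be sketched as a small retrieval routine. The sketch below is illustrative only: the application does not specify the sanitization rules or the similarity measure, so the redaction patterns in `sanitize` and the word-overlap (Jaccard) score in `retrieve` are assumptions standing in for whatever a real RAG database would use, typically embedding similarity. All function names are hypothetical.

```python
import re
from typing import List, Set

# Assumed sanitization rule: redact e-mail addresses and long digit runs.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
LONG_NUMBER = re.compile(r"\d{4,}")


def sanitize(text: str) -> str:
    """Remove obvious personal data before a session record is stored."""
    text = EMAIL.sub("<email>", text)
    return LONG_NUMBER.sub("<number>", text)


def tokens(text: str) -> Set[str]:
    """Lower-cased alphabetic words of a text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(request: str, sessions: List[str], k: int = 2) -> List[str]:
    """Return up to k stored sessions most similar to the request.

    Similarity is Jaccard overlap of word sets, a simple stand-in for the
    embedding search a production RAG store would likely use.
    Sessions with no overlap at all are dropped.
    """
    query = tokens(request)
    scored = []
    for record in sessions:
        words = tokens(record)
        union = query | words
        score = len(query & words) / len(union) if union else 0.0
        scored.append((score, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for score, record in scored[:k] if score > 0]
```

The retrieved records would then be appended to the prompt given to the action model, so that the generated session plan reflects how similar tasks were completed before.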
Key terms within this application include: