Invention Title:

SYSTEMS, METHODS, AND APPARATUS FOR IMAGE-RESPONSIVE AUTOMATED ASSISTANTS

Publication number:

US20250124707

Publication date:

2025-04-17

Section:

Physics

Class:

G06V20/20

Inventors:

Gökhan Bakir, Zurich, Switzerland

Marcin Nowak-Przygodzki, Bach, Switzerland

Applicant:

Google LLC, Mountain View, CA, United States

Smart overview of the Invention

The patent describes a system that lets users interact with an automated assistant through images captured by a camera, reducing the need for typed or spoken input. The system is designed to improve privacy and ease of use by having the assistant respond to objects detected in the camera's field of view. Based on the detected objects, the system suggests image conversation modes, and the user selects one to receive tailored interactions and responses.

Functionality

The system processes images to determine attributes of the objects captured in them, then uses those attributes to select relevant conversation modes from a set of available options. These modes are presented as selectable elements on the user interface. When the user selects a mode, the system generates and displays output tailored to both the chosen mode and the object's attributes. This output can include data retrieved through queries formulated from these attributes, which makes the information more relevant and specific.
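The flow described above (attributes, then candidate modes, then a query) can be illustrated with a minimal sketch. It is not the implementation described in the patent: the `ConversationMode` type, the `MODES` table, the trigger attributes, and the query templates are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversationMode:
    # Hypothetical record: a mode name, the attributes that make the mode
    # relevant, and a template for the query issued when it is selected.
    name: str
    trigger_attributes: frozenset
    query_template: str


# Illustrative mode table; a real system would define its own set.
MODES = [
    ConversationMode("calorie", frozenset({"food"}), "calories in {label}"),
    ConversationMode("pricing", frozenset({"food", "product"}), "price of {label}"),
    ConversationMode("translate", frozenset({"text"}), "translate {label}"),
]


def select_modes(attributes, modes=MODES):
    """Return the modes whose trigger attributes overlap the detected ones."""
    detected = set(attributes)
    return [m for m in modes if m.trigger_attributes & detected]


def formulate_query(mode, label):
    """Fill the selected mode's query template with an object label."""
    return mode.query_template.format(label=label)


candidates = select_modes({"food", "apple"})
print([m.name for m in candidates])             # ['calorie', 'pricing']
print(formulate_query(candidates[0], "apple"))  # calories in apple
```

Each returned mode would correspond to one selectable element on the user interface; the query is formulated only after the user picks a mode.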

Contextual Adaptation

Contextual features such as location, time, and recently detected objects can influence mode selection and data presentation. For instance, a "pricing" mode might be more relevant in a grocery store than at home. The system uses these contextual cues to adjust which conversation modes are offered and how prominently they are displayed, ensuring that interactions remain contextually appropriate and useful.
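One way to picture this contextual adjustment is as a re-ranking step applied to the candidate modes. The sketch below is an assumption about how such ranking could work, not the method the patent claims: the `rank_modes` function, the `BOOSTS` table, and its weights are hypothetical.

```python
# Hypothetical boosts keyed by (mode, context feature, feature value).
# The weights are illustrative only.
BOOSTS = {
    ("pricing", "location", "grocery_store"): 1.0,
    ("calorie", "location", "home"): 0.5,
}


def rank_modes(candidate_modes, context, boosts=BOOSTS):
    """Order mode names by their summed context boosts, highest first.

    sorted() is stable, so modes with equal scores keep their input order.
    """
    def score(mode):
        return sum(
            boosts.get((mode, feature, value), 0.0)
            for feature, value in context.items()
        )

    return sorted(candidate_modes, key=score, reverse=True)


print(rank_modes(["calorie", "pricing"], {"location": "grocery_store"}))
# ['pricing', 'calorie']
print(rank_modes(["calorie", "pricing"], {"location": "home"}))
# ['calorie', 'pricing']
```

The resulting order could drive how prominently each mode is displayed, with the first entry shown most prominently.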

Example Application

In one example, the camera captures an image of a Red Delicious apple. The system identifies attributes such as "food" and "apple" and selects a "calorie" conversation mode associated with these attributes. A user who selects this mode might see calorie information for the apple displayed alongside the camera feed. The same process applies when other food items, such as bananas, are captured, so the user receives nutritional data immediately.

System Implementation

The described system includes components such as a camera, a display device, processors, and memory. It processes image data locally to maintain privacy, transmitting only the necessary identifiers for remote processing when needed. This approach is intended to keep sensitive image data on the device while still enabling dynamic interaction with the automated assistant through conversation modes tailored to real-time visual inputs.
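The privacy boundary described above can be sketched as a request builder that serializes identifiers but never the image itself. This is a minimal illustration under stated assumptions: the `LocalDetection` type, the `build_remote_request` function, and the JSON payload shape are hypothetical and do not come from the patent.

```python
import json
from dataclasses import dataclass


@dataclass
class LocalDetection:
    # Hypothetical on-device result: the raw frame plus the labels that a
    # local classifier derived from it.
    image_bytes: bytes
    attributes: tuple


def build_remote_request(detection, mode_name):
    """Serialize only the identifiers needed for a remote lookup.

    The raw image bytes are deliberately left out of the payload.
    """
    payload = {
        "mode": mode_name,
        "attributes": sorted(detection.attributes),
    }
    return json.dumps(payload, sort_keys=True)


detection = LocalDetection(image_bytes=b"\x00\x01\x02", attributes=("food", "apple"))
print(build_remote_request(detection, "calorie"))
# {"attributes": ["apple", "food"], "mode": "calorie"}
```

Keeping the image out of the serialized payload by construction, rather than filtering it out later, makes it harder to transmit the frame by accident.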