US20250342628
2025-11-06
Physics
G06T11/60
The system edits digital images using executable code generated from natural language input. A large language model translates text instructions into action code compatible with an image editing application; executing that code produces the requested changes, so users can modify an image simply by describing the edit in text.
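The translation step described above can be sketched as a prompt sent to a language model that returns action code. Everything below is a hypothetical illustration: the prompt template, the `generate_action_code` helper, the `editor.select`/`editor.apply` call style, and the stubbed model are assumptions, not an API disclosed by the application.

```python
# Hypothetical sketch: turning a natural-language edit request into
# executable "action code" for an image editor. The template and the
# editor API names are invented for illustration.

PROMPT_TEMPLATE = (
    "Translate the following image-editing instruction into Python calls "
    "against the editor API (editor.select, editor.apply).\n"
    "Instruction: {instruction}\n"
    "Code:"
)

def stub_llm(prompt: str) -> str:
    """Stand-in for a large language model; returns canned action code."""
    if "brighten" in prompt:
        return "editor.select('sky')\neditor.apply('brightness', amount=1.2)"
    return "pass"

def generate_action_code(instruction: str, llm=stub_llm) -> str:
    """Format the instruction into a prompt and ask the model for code."""
    return llm(PROMPT_TEMPLATE.format(instruction=instruction))

print(generate_action_code("brighten the sky"))
```

In a real deployment the `stub_llm` function would be replaced by a call to an actual large language model; the surrounding logic is unchanged.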
As digital image usage grows, so does the need for efficient editing systems. Traditional methods often require many user interactions and navigation through complex interfaces. The described system aims to streamline this process by reducing user effort and enhancing flexibility, allowing precise adjustments to be made from natural language input.
The system receives a natural language request to edit an image and identifies its key elements, such as the target objects and the actions to perform. A large language model then generates executable code formatted for a specific editing application. Executing this code modifies the image, and the user can intervene during execution to adjust individual actions.
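The execution step in the paragraph above can be sketched as running the generated action code against an editor object, with an edit history kept open for user review. The `Editor` class, its `select`/`apply` methods, and `run_action_code` are hypothetical stand-ins; a real system would target an actual editing application's tools.

```python
# Minimal sketch of executing generated action code, assuming the model
# has already produced it. The Editor class is invented for illustration.

class Editor:
    def __init__(self):
        self.selection = None
        self.history = []  # executed edits, available for user review/undo

    def select(self, target: str):
        """Mark an object in the image as the current selection."""
        self.selection = target

    def apply(self, action: str, **params):
        """Record an edit applied to the current selection."""
        self.history.append((self.selection, action, params))

def run_action_code(code: str) -> Editor:
    """Execute generated code with the editor as the only exposed name."""
    editor = Editor()
    exec(code, {"__builtins__": {}}, {"editor": editor})
    return editor

action_code = "editor.select('dog')\neditor.apply('blur', radius=3)"
editor = run_action_code(action_code)
print(editor.history)  # [('dog', 'blur', {'radius': 3})]
```

Keeping the edits in an inspectable history is one way to support the user interventions the summary mentions: each recorded step could be shown, reordered, or rejected before being committed to the image.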
Compared to conventional systems, this approach improves efficiency by minimizing the user interactions needed to complete an edit. It also enhances flexibility by leveraging the tools already present in editing applications and by allowing user interventions to fine-tune edits. The system supports local edits and can apply multiple modifications to different objects from a single request.
The system supports user input for adjusting the editing sequence, providing an interactive experience. By incorporating in-context learning, it generates outputs that align with user intentions, offering a more personalized and responsive editing process. This adaptability distinguishes it from traditional systems that often lack such dynamic capabilities.
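The in-context learning mentioned above can be sketched as a few-shot prompt: prior (instruction, code) pairs are prepended so the model's output follows the same format and conventions. The example pairs and the `build_prompt` helper below are invented for illustration; the application does not disclose concrete prompt contents.

```python
# Hypothetical few-shot prompt builder illustrating in-context learning.
# The example (instruction, code) pairs are assumptions for the sketch.

EXAMPLES = [
    ("remove the red car",
     "editor.select('red car')\neditor.apply('remove')"),
    ("make the sky bluer",
     "editor.select('sky')\neditor.apply('saturation', amount=1.3)"),
]

def build_prompt(instruction: str, examples=EXAMPLES) -> str:
    """Prepend worked examples so the model imitates their code format."""
    shots = "\n\n".join(
        f"Instruction: {ins}\nCode:\n{code}" for ins, code in examples
    )
    return f"{shots}\n\nInstruction: {instruction}\nCode:\n"

print(build_prompt("blur the background"))
```

Swapping the example pairs for edits the user has already accepted is one plausible way such a system could personalize its outputs over a session.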