US20250342628
2025-11-06
Physics
G06T11/60
The system edits digital images using executable code generated from natural language input. A large language model translates text instructions into action code compatible with an image editing application; executing that code produces the requested changes, so users can modify an image simply by describing the edit in text.
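The translation step described above can be sketched as a prompt sent to a language model that returns action code. Everything below is a hypothetical illustration: the prompt template, the `generate_action_code` helper, the `editor.select`/`editor.apply` call style, and the stubbed model are assumptions, not an API disclosed by the application.

```python
# Hypothetical sketch: turning a natural-language edit request into
# executable "action code" for an image editor. The template and the
# editor API names are invented for illustration.

PROMPT_TEMPLATE = (
    "Translate the following image-editing instruction into Python calls "
    "against the editor API (editor.select, editor.apply).\n"
    "Instruction: {instruction}\n"
    "Code:"
)

def stub_llm(prompt: str) -> str:
    """Stand-in for a large language model; returns canned action code."""
    if "brighten" in prompt:
        return "editor.select('sky')\neditor.apply('brightness', amount=1.2)"
    return "pass"

def generate_action_code(instruction: str, llm=stub_llm) -> str:
    """Format the instruction into a prompt and ask the model for code."""
    return llm(PROMPT_TEMPLATE.format(instruction=instruction))

print(generate_action_code("brighten the sky"))
```

In a real deployment the `stub_llm` function would be replaced by a call to an actual large language model; the surrounding logic is unchanged.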
As digital image usage grows, so does the need for efficient editing systems. Traditional methods often require many user interactions and navigation through complex interfaces. The described system aims to streamline this process by reducing user effort and enhancing flexibility, allowing precise adjustments to be made from natural language input.
The system receives a natural language request to edit an image and identifies its key elements, such as the target objects and the actions to perform. A large language model then generates executable code formatted for a specific editing application. Executing this code modifies the image, and the user can intervene during execution to adjust individual actions.
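The execution step in the paragraph above can be sketched as running the generated action code against an editor object, with an edit history kept open for user review. The `Editor` class, its `select`/`apply` methods, and `run_action_code` are hypothetical stand-ins; a real system would target an actual editing application's tools.

```python
# Minimal sketch of executing generated action code, assuming the model
# has already produced it. The Editor class is invented for illustration.

class Editor:
    def __init__(self):
        self.selection = None
        self.history = []  # executed edits, available for user review/undo

    def select(self, target: str):
        """Mark an object in the image as the current selection."""
        self.selection = target

    def apply(self, action: str, **params):
        """Record an edit applied to the current selection."""
        self.history.append((self.selection, action, params))

def run_action_code(code: str) -> Editor:
    """Execute generated code with the editor as the only exposed name."""
    editor = Editor()
    exec(code, {"__builtins__": {}}, {"editor": editor})
    return editor

action_code = "editor.select('dog')\neditor.apply('blur', radius=3)"
editor = run_action_code(action_code)
print(editor.history)  # [('dog', 'blur', {'radius': 3})]
```

Keeping the edits in an inspectable history is one way to support the user interventions the summary mentions: each recorded step could be shown, reordered, or rejected before being committed to the image.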
Compared to conventional systems, this approach improves efficiency by minimizing the user interactions needed to complete an edit. It also enhances flexibility by leveraging the tools already present in editing applications and by allowing user interventions to fine-tune edits. The system supports local edits and can apply multiple modifications to different objects from a single request.
The system supports user input for adjusting the editing sequence, providing an interactive experience. By incorporating in-context learning, it generates outputs that align with user intentions, offering a more personalized and responsive editing process. This adaptability distinguishes it from traditional systems that often lack such dynamic capabilities.
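The in-context learning mentioned above can be sketched as a few-shot prompt: prior (instruction, code) pairs are prepended so the model's output follows the same format and conventions. The example pairs and the `build_prompt` helper below are invented for illustration; the application does not disclose concrete prompt contents.

```python
# Hypothetical few-shot prompt builder illustrating in-context learning.
# The example (instruction, code) pairs are assumptions for the sketch.

EXAMPLES = [
    ("remove the red car",
     "editor.select('red car')\neditor.apply('remove')"),
    ("make the sky bluer",
     "editor.select('sky')\neditor.apply('saturation', amount=1.3)"),
]

def build_prompt(instruction: str, examples=EXAMPLES) -> str:
    """Prepend worked examples so the model imitates their code format."""
    shots = "\n\n".join(
        f"Instruction: {ins}\nCode:\n{code}" for ins, code in examples
    )
    return f"{shots}\n\nInstruction: {instruction}\nCode:\n"

print(build_prompt("blur the background"))
```

Swapping the example pairs for edits the user has already accepted is one plausible way such a system could personalize its outputs over a session.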