Invention Title:

STYLE-BASED IMAGE GENERATION

Publication number:

US20250117973

Publication date:

Section:

Physics

Class:

G06T11/00

Inventors:

Applicant:

Smart overview of the Invention

The invention describes a method and system for generating images with machine learning. It combines a text prompt that describes the image content with a style input that dictates its visual style. Embeddings are generated from both inputs and passed to an image generation model to create a synthetic image. The distinctive aspect is the sequential application of these embeddings: the text embedding is applied first, and the style embedding is introduced only in later stages of generation, so that style influences the image after its content has taken shape.
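The staged conditioning described above can be illustrated with a minimal sketch. The encoder, denoiser, step counts, and blending below are toy stand-ins chosen for illustration, not the models or parameters of the patent:

```python
import numpy as np
import zlib

def encode(text, dim=8):
    # Hypothetical encoder: a deterministic vector seeded by a hash of the text.
    seed = zlib.crc32(text.encode())
    return np.random.default_rng(seed).standard_normal(dim)

def denoise_step(latent, cond):
    # Hypothetical denoiser: nudge the latent toward the conditioning signal.
    return latent + 0.1 * (cond - latent)

def generate(text_prompt, style_input, steps=10, style_start=6):
    """Text embedding conditions every step; the style embedding is added
    only from `style_start` onward, after content is established."""
    text_emb = encode(text_prompt)
    style_emb = encode(style_input)
    latent = np.zeros(text_emb.shape)
    for t in range(steps):
        cond = text_emb if t < style_start else text_emb + style_emb
        latent = denoise_step(latent, cond)
    return latent

image_latent = generate("astronaut sitting on a chair", "pencil drawing")
```

The key point is the `style_start` threshold: during the early steps the conditioning signal contains only the text embedding, so the structure of the image is fixed before style information can alter it.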

Background

Machine learning models are often used to create images based on various inputs like text or images. Traditional models sometimes struggle to consistently apply styles without altering the intended structure or content of the image. This method addresses these challenges by ensuring that style information is applied in a way that preserves the original content intended by the user. The approach also avoids common pitfalls such as overwhelming content with style information or inefficient training processes.

Innovative Approach

The proposed system improves upon conventional methods by introducing style information at later stages of image generation. This ensures that the structure and content of the synthetic image are established first, allowing for consistent application of style without altering these elements. This approach is more efficient and scalable, as it avoids extensive data augmentation or oversampling and does not require retraining for new styles.

Practical Application

An example scenario involves a user providing a text prompt such as "astronaut sitting on a chair" and selecting a style such as "pencil drawing." The system encodes each input into an embedding: the text embedding guides the model as it establishes the image's content, and the style embedding is applied afterward to render that content in the chosen style. The result is an image that depicts the prompted content in the selected style, without either input overwhelming the other.
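Because style enters as a plain embedding at a late stage, supporting a new style only requires encoding a new style string; the generator itself stays frozen. A minimal sketch under the same toy assumptions (the encoder and the 0.8/0.2 late-stage blend weights are illustrative, not from the patent):

```python
import numpy as np
import zlib

def embed(text, dim=8):
    # Hypothetical encoder: deterministic vector seeded by a text hash.
    return np.random.default_rng(zlib.crc32(text.encode())).standard_normal(dim)

# The content representation is established once, from the text prompt alone.
content = embed("astronaut sitting on a chair")

# Each new style is just another embedding blended in at a late stage;
# no retraining of the generator is involved.
results = {}
for style_name in ("pencil drawing", "watercolor"):
    results[style_name] = 0.8 * content + 0.2 * embed(style_name)
```

Both outputs share the same content representation and differ only in the late style contribution, which is the scalability claim: adding "watercolor" changed an input, not the model.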

Advantages

The described system improves on traditional methods by efficiently generating stylized images that accurately reflect both the content and style inputs. Introducing style information later in the process preserves the integrity of the original content while still applying the style effectively. This not only improves image quality but also reduces the computational cost and complexity of training a separate model for each style.