US20250285343
2025-09-11
Physics
G06T11/60
Artificial intelligence is increasingly automating tasks to enhance productivity, with avatar creation being a significant area of interest. Existing AI-based avatar creation platforms often require manual input of text prompts, which can be time-consuming and frustrating for users. Some systems require users to upload numerous personal photos for model training, consuming considerable time and resources. Others offer limited style templates, restricting user creativity. These challenges highlight the need for more efficient AI-based avatar creation systems that allow users to easily apply any desired image style to avatars.
The proposed data processing system utilizes a processor and machine-readable instructions to streamline avatar creation. Users provide a style request and a subject image via a client device interface. The system constructs a first prompt by combining the style request and the subject image with instructions for a multimodal model. This model generates textual descriptions of both the subject and the style, which are then used to construct a second prompt for a text-to-image model. The resulting avatar is displayed on the user's device, simplifying the process of creating personalized avatars in selected styles.
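A minimal sketch of this two-stage pipeline is shown below. The client types, function names, and prompt wording are assumptions made for illustration; the patent text does not specify a particular model, API, or prompt phrasing.

```python
# Hypothetical sketch of the two-stage pipeline: a multimodal model turns the
# subject image and style request into text, and a text-to-image model turns
# that text into the avatar. Names and prompt wording are assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class AvatarRequest:
    subject_image: bytes      # image uploaded from the client device
    style_request: str        # e.g. "watercolor portrait" or a reference to a style image

# Stand-ins for the two generative models; any multimodal model and
# text-to-image model with roughly these signatures would fit.
MultimodalModel = Callable[[str, bytes], str]   # (prompt, image) -> text
TextToImageModel = Callable[[str], bytes]       # (prompt) -> image bytes

def create_avatar(req: AvatarRequest,
                  describe: MultimodalModel,
                  generate: TextToImageModel) -> bytes:
    # First prompt: the style request and subject image are combined with
    # instructions asking the multimodal model for textual descriptions.
    first_prompt = (
        "Describe the person in the attached image (face, hair, clothing) "
        f"and restate the requested style: {req.style_request}."
    )
    descriptions = describe(first_prompt, req.subject_image)

    # Second prompt: the generated descriptions become the instruction
    # for the text-to-image model.
    second_prompt = f"Create an avatar. {descriptions}"
    return generate(second_prompt)  # the returned image is displayed on the client device
```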
The system employs a multimodal model to automatically generate textual descriptions of subject images and style images. These descriptions form the basis of the prompt used by the text-to-image model to create the avatar. By automating these steps, the system eliminates the need for manual prompt creation, enhancing user experience and efficiency. This approach addresses limitations in existing systems by providing greater flexibility in style application without extensive user input or training time.
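One way the automated prompt construction could look is sketched below: one description instruction per input image, then a combined prompt for the text-to-image model. The template wording and function names are assumptions, not language from the patent.

```python
# Hypothetical prompt templates. SUBJECT_INSTRUCTION is sent to the multimodal
# model alongside the subject image, STYLE_INSTRUCTION alongside the style
# image; the two replies replace any manually written prompt.
SUBJECT_INSTRUCTION = (
    "Describe the subject in this image: facial features, hair, "
    "expression, and clothing, in one short paragraph."
)
STYLE_INSTRUCTION = (
    "Describe the artistic style of this image: color palette, "
    "layout, texture, and rendering technique, in one short paragraph."
)

def build_generation_prompt(subject_description: str, style_description: str) -> str:
    # Combines the two automatically generated descriptions into the
    # prompt handed to the text-to-image model.
    return (
        f"Generate an avatar of the following subject: {subject_description} "
        f"Render it in this style: {style_description}"
    )
```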
By leveraging generative models, the system captures essential visual elements from both subject and style images, ensuring accurate representation in the final avatar. Users can easily generate avatars that reflect their desired styles, including aspects like color, layout, and texture. This method improves upon traditional text-only prompting by allowing image uploads for more personalized results. The system's ability to autonomously execute style transfer and avatar creation enhances accessibility and efficiency in avatar design.
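As an illustration of how the style elements named above (color, layout, texture) could be captured explicitly before prompting, the following sketch parses a structured style description. The JSON schema and field names are assumptions for illustration only.

```python
# Hypothetical structured capture of style elements. Assumes the multimodal
# model was asked to answer in JSON with exactly these keys; the schema is
# an assumption, not part of the patent text.
import json
from dataclasses import dataclass

@dataclass
class StyleDescription:
    color: str    # e.g. "muted pastel palette"
    layout: str   # e.g. "centered head-and-shoulders composition"
    texture: str  # e.g. "visible brush strokes on rough paper"

def parse_style(model_output: str) -> StyleDescription:
    fields = json.loads(model_output)
    return StyleDescription(
        color=fields["color"],
        layout=fields["layout"],
        texture=fields["texture"],
    )

def style_clause(style: StyleDescription) -> str:
    # Folded into the text-to-image prompt so each style element is preserved
    # in the generated avatar.
    return f"Use a {style.color}, a {style.layout}, and {style.texture}."
```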
The system provides an intuitive user experience where individuals can upload personal images alongside style goals to create avatars. It uses large language models to interpret these inputs, forming detailed prompts for visual models that generate styled avatars. This process supports the creation of personalized avatars infused with detailed styles, enhancing user satisfaction. The resulting avatars can be utilized across various platforms and applications, offering users versatile representation options.