US20250124622
2025-04-17
Physics
G06T11/60
A novel device is introduced that enhances graphic design generation through natural language prompts. The device uses a processor and memory to execute instructions that transform user input into editable graphic designs. It employs a two-step process in which a Large Language Model (LLM) first restructures the user's textual description of the desired design, and a text-to-image model then transforms the restructured prompt into visual proposals.
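The following is a minimal sketch, not the patented implementation, of what the first step of such a two-step process might look like: an LLM expands a terse user request into a structured design brief that downstream models can consume. All names, fields, and the instruction text are illustrative assumptions.

```python
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class DesignBrief:
    headline: str             # text to place on the design
    image_description: str    # what the text-to-image model should render
    color_palette: list[str]  # suggested colors for the layout

# Hypothetical system instructions asking the LLM to restructure the request.
RESTRUCTURE_INSTRUCTIONS = (
    "Rewrite the user's design request as JSON with keys "
    "'headline', 'image_description', and 'color_palette'."
)

def restructure_prompt(llm: Callable[[str], str], user_prompt: str) -> DesignBrief:
    """Ask an LLM (passed in as a plain callable) to expand a brief or
    ambiguous request into a structured brief."""
    raw = llm(f"{RESTRUCTURE_INSTRUCTIONS}\n\nRequest: {user_prompt}")
    return DesignBrief(**json.loads(raw))
```

Treating the LLM as an injected callable keeps the sketch independent of any particular model provider or API.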
Traditional design applications rely heavily on template libraries to offer design suggestions. These applications often have a limited number of templates, which constrains their ability to meet diverse user preferences. Their effectiveness in proposing suitable designs scales with the size and variety of their template catalogs, yet even applications with large libraries struggle to provide personalized design options because the templates themselves are static.
The proposed solution leverages generative AI technologies to overcome the limitations of static template libraries. Rather than drawing from a fixed catalog, the system can produce an effectively unlimited range of design variations. It uses Natural Language Processing (NLP) to interpret user descriptions and generate design suggestions that align closely with user intent, even when inputs are brief or ambiguous.
Several technical challenges are addressed, including understanding natural language prompts and retrieving content relevant to the requested design. The system must accurately interpret user requests and translate them into visual elements. Text-to-image models help fill gaps in asset libraries by generating images from their pre-trained knowledge. However, such models are traditionally tuned for photo-realistic output rather than editable graphic designs, which presents distinct challenges in producing design-like results.
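One way such gap-filling could be wired up, shown here purely as an assumption rather than the patent's method, is to search the asset library first and fall back to a text-to-image model only when no suitable asset exists, nudging the generator toward design-like rather than photo-realistic output. The search and generation callables and the style suffix are hypothetical.

```python
from typing import Callable, Optional

def fetch_or_generate_asset(
    search_library: Callable[[str], Optional[bytes]],
    text_to_image: Callable[[str], bytes],
    description: str,
) -> bytes:
    """Return image bytes for the description, generating them if the
    asset library has no match."""
    asset = search_library(description)
    if asset is not None:
        return asset
    # Hypothetical style suffix to bias the generator toward flat,
    # design-like output instead of a photograph.
    return text_to_image(f"{description}, flat vector illustration style")
```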
A chain-of-models approach is employed, combining generative AI with deep learning models to produce editable graphic designs from natural language prompts. The system operates through a user interface on devices such as desktops or tablets, where users input design descriptions. The application processes these inputs through LLMs and text-to-image models to create proposed designs, which users can then edit and personalize to fit their needs.
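A minimal end-to-end sketch of such a chain, reusing the hypothetical helpers above (restructure_prompt and fetch_or_generate_asset), might assemble the LLM output and the generated image into editable layers. The layer structure and function names are assumptions made for illustration, not the claimed design format.

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    kind: str              # "text" or "image"
    content: object        # string for text layers, image bytes for image layers
    editable: bool = True  # every layer remains user-editable

@dataclass
class ProposedDesign:
    layers: list[Layer] = field(default_factory=list)

def propose_design(llm, search_library, text_to_image, user_prompt: str) -> ProposedDesign:
    """Turn a natural-language prompt into an editable design proposal."""
    brief = restructure_prompt(llm, user_prompt)              # step 1: LLM restructures the request
    image = fetch_or_generate_asset(                          # step 2: text-to-image fills the visual
        search_library, text_to_image, brief.image_description)
    return ProposedDesign(layers=[
        Layer(kind="image", content=image),
        Layer(kind="text", content=brief.headline),
    ])
```

Returning discrete layers, rather than a single rendered bitmap, is what keeps the proposal editable so the user can adjust text, imagery, and colors after generation.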