US20250308116
2025-10-02
Physics
G06T11/60
The patent application introduces a method and system for generating customized textual images using diffusion models. It addresses the limitations of traditional diffusion-based methods in rendering text with complex font attributes. The process begins by receiving an input image, a textual prompt, and several control parameters. From these inputs, a character mask and a conditional mask are extracted; together they guide the generation of an accurate customized textual image.
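As a minimal sketch of the mask-extraction step described above: assuming the control parameters include a target text string, a font file, and a bounding box, the character mask can be rasterized from the glyphs while the conditional mask marks the editable region. The function name extract_masks and its parameters are hypothetical illustrations, not names from the patent.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def extract_masks(image_size, text, font_path, bbox, font_size=64):
    """Return (character_mask, conditional_mask) as binary numpy arrays.

    character_mask   - 1 where glyph pixels should appear
    conditional_mask - 1 over the region the diffusion model may edit
    """
    w, h = image_size
    # Character mask: rasterize the target text at the bounding-box origin.
    char_img = Image.new("L", (w, h), 0)
    draw = ImageDraw.Draw(char_img)
    font = ImageFont.truetype(font_path, font_size)
    draw.text((bbox[0], bbox[1]), text, fill=255, font=font)
    character_mask = (np.asarray(char_img) > 127).astype(np.uint8)

    # Conditional mask: the full editable region given by the bounding box.
    conditional_mask = np.zeros((h, w), dtype=np.uint8)
    x0, y0, x1, y1 = bbox
    conditional_mask[y0:y1, x0:x1] = 1
    return character_mask, conditional_mask
```

In this reading, the character mask pins down glyph shapes (and hence font attributes), while the coarser conditional mask localizes where the model is permitted to modify the input image.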
This innovation falls within the field of image processing, specifically the generation of customized textual images using diffusion models. The method improves the quality of text rendering in images, which is important for industries such as entertainment, advertising, and education. By automating the creation of high-quality text images, it reduces the need for professional design skills and iterative manual revision.
Text-to-image synthesis has advanced significantly with the advent of diffusion models, which offer advantages over earlier approaches such as GANs. However, existing models often lack fine-grained control over text generation, particularly for complex fonts and small text sizes. Prior works such as GlyphDraw and TextDiffuser have made progress in this area but still struggle to render dense and small text accurately.
The proposed method involves several key steps: receiving input data including an image and a textual prompt, generating character and conditional masks based on the control parameters, and using these masks to guide a diffusion model. The model is initialized with random Gaussian noise and iteratively refines an intermediate image into a latent vector image. Finally, a trained consistency model generates the customized textual image from this latent vector image.
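A schematic sketch of that generation loop, under stated assumptions: denoise_step and consistency_decode are hypothetical stand-ins for the patent's mask-conditioned diffusion model and trained consistency model, and the latent shape and step count are illustrative.

```python
import torch

@torch.no_grad()
def generate(denoise_step, consistency_decode, char_mask, cond_mask,
             prompt_embedding, latent_shape, num_steps=50):
    # Initialize the intermediate latent with random Gaussian noise.
    z = torch.randn(latent_shape)
    for t in reversed(range(num_steps)):
        # Each step refines the latent, conditioned on the text prompt
        # and the two masks that localize and shape the glyph region.
        z = denoise_step(z, t, prompt_embedding, char_mask, cond_mask)
    # A trained consistency model maps the final latent vector image
    # to the customized textual image.
    return consistency_decode(z)
```

The design point this sketch highlights is the split of roles: the iterative diffusion loop produces the latent vector image, and the consistency model performs the final decoding into pixel space in a single pass.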
The system is implemented through hardware processors configured to execute programmed instructions stored in memory. These processors perform tasks such as generating the masks and refining images through the diffusion model. A computer program product is also provided, enabling devices to perform these operations autonomously. This approach gives precise control over font attributes and improves the clarity of text generated within images.