US20250315988
2025-10-09
Physics
G06T11/00
The application describes an AI-based image generation method and apparatus for efficiently creating images that meet both content and style requirements. The method obtains content text and style images, encodes the text into content codes, extracts style codes from the style images, and performs reverse diffusion processing with a dual cross-attention mechanism. The result is a target image that matches the content text and the desired style.
This disclosure relates to AI technology, specifically to image generation methods and apparatuses. It draws on advances in natural language processing (NLP), machine learning, and deep learning. The method addresses limitations of existing style transfer solutions by producing images that are more refined and that match the requested content and style more accurately.
AI technologies enable machines to perform perception, reasoning, and decision-making tasks. Style transfer technologies have traditionally been limited to modifying existing images, and their effects are coarse-grained, mostly changes in color. The disclosed method aims to overcome these limitations by generating images that meet specific content and style requirements more efficiently.
The disclosed embodiments include a method performed by an electronic device with a memory and a processor. The method obtains content text and style images, encodes the text into content codes, extracts style codes from the style images, and applies reverse diffusion processing using a dual cross-attention mechanism. This yields a target image that matches both the content text and the target style.
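As an illustration only, the dual cross-attention step can be sketched as two separate attention passes over the same noisy latent tokens, one against the content codes and one against the style codes. The sketch below is not the claimed implementation. The function names, the `style_weight` parameter, and the weighted-sum fusion of the two attention outputs are all assumptions, since the summary does not say how the two passes are combined. Learned query, key, and value projections and the surrounding denoising loop are omitted, and random matrices stand in for real encoder outputs.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries, context):
    """Scaled dot-product attention; context serves as both keys and values."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    scores = [[v * scale for v in row] for row in matmul(queries, transpose(context))]
    return matmul(softmax_rows(scores), context)


def dual_cross_attention(latent, content_codes, style_codes, style_weight=1.0):
    """Attend to content and style codes separately, then sum them.

    The weighted sum is an assumed fusion rule, not taken from the source.
    """
    content_out = cross_attention(latent, content_codes)
    style_out = cross_attention(latent, style_codes)
    return [
        [c + style_weight * s for c, s in zip(c_row, s_row)]
        for c_row, s_row in zip(content_out, style_out)
    ]


def random_matrix(rng, rows, cols):
    """Hypothetical stand-in for encoder outputs: Gaussian random values."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
latent = random_matrix(rng, 4, 8)         # 4 noisy latent tokens, dimension 8
content_codes = random_matrix(rng, 5, 8)  # 5 text tokens from the content text
style_codes = random_matrix(rng, 3, 8)    # 3 tokens from the style images
fused = dual_cross_attention(latent, content_codes, style_codes, style_weight=0.5)
print(len(fused), len(fused[0]))  # prints "4 8"
```

In this sketch, setting `style_weight` to 0 reduces the step to ordinary text-conditioned cross-attention, which makes the separate contribution of the style codes easy to inspect.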