Invention Title:

ARTIFICIAL INTELLIGENCE DEVICE FOR CREATING ANIMATABLE 4D PORTRAIT AVATARS WITH MORPHABLE MULTI-VIEW DIFFUSION MODELS AND METHOD THEREOF

Publication number:

US20260154890

Section:

Physics

Class:

G06T13/40

Smart Overview of the Invention

The patent application describes a method and device for generating animatable four-dimensional (4D) portrait avatars using artificial intelligence (AI). The process begins by receiving a set of reference images of a subject and encoding them into reference latents. The method then estimates 3D morphable model (3DMM) parameters for the subject and derives pose and expression conditioning signals from them. A morphable multi-view diffusion model uses these signals to generate synthetic images of the subject from multiple viewpoints through an iterative reverse diffusion process with stochastic conditioning. The synthetic images, together with the reference images, are then used to train a 4D avatar model capable of real-time animation.
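As a rough illustration of the 3DMM step, the sketch below implements the forward model of a generic linear morphable face model: a template mesh plus linear identity and expression offsets, followed by a rigid head pose and a pinhole projection. The estimation step described above would fit the coefficients `beta` and `psi` and the pose to the reference images; that fitting is not shown. The bases, dimensions, function names, and the use of projected vertex positions as the conditioning signal are hypothetical assumptions, not details taken from the application.

```python
import numpy as np

# Hypothetical sketch of a generic linear 3DMM; not the application's model.


def morph_vertices(mean_shape, id_basis, expr_basis, beta, psi):
    """Linear morphable model: V = mean + id_basis @ beta + expr_basis @ psi.

    mean_shape: (N, 3) template vertices
    id_basis:   (N, 3, K_id) identity (shape) basis
    expr_basis: (N, 3, K_expr) expression basis
    beta, psi:  identity and expression coefficient vectors
    """
    return mean_shape + id_basis @ beta + expr_basis @ psi


def rodrigues(axis_angle):
    """Convert an axis-angle head rotation into a 3x3 rotation matrix."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-8:
        return np.eye(3)
    k = axis_angle / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def pose_vertices(V, R, t):
    """Apply the rigid head pose (rotation R, translation t) to every vertex."""
    return V @ R.T + t


def project_vertices(V_cam, focal, cx, cy):
    """Pinhole projection of camera-space vertices to pixel coordinates."""
    return focal * V_cam[:, :2] / V_cam[:, 2:3] + np.array([cx, cy])


# Usage with random stand-in bases (a real model would load learned bases).
rng = np.random.default_rng(0)
N, K_id, K_expr = 5, 4, 3
mean = 0.1 * rng.normal(size=(N, 3))
id_basis = 0.01 * rng.normal(size=(N, 3, K_id))
expr_basis = 0.01 * rng.normal(size=(N, 3, K_expr))
V = morph_vertices(mean, id_basis, expr_basis, np.zeros(K_id), rng.normal(size=K_expr))
V_cam = pose_vertices(V, rodrigues(np.array([0.0, 0.3, 0.0])), np.array([0.0, 0.0, 4.0]))
uv = project_vertices(V_cam, focal=500.0, cx=256.0, cy=256.0)
print(uv.shape)  # (5, 2)
```

Changing `psi` while holding `beta` fixed alters only the expression, and changing the axis-angle vector alters only the head pose, which is why such parameters lend themselves to separate pose and expression conditioning signals.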

Background

Current methods for creating 4D avatars trade accessibility against quality. High-fidelity approaches often require complex capture setups, while simpler methods struggle with consistency and scalability. Avatars built from monocular or few-shot input frequently exhibit artifacts and inconsistencies. In addition, the computational demands of existing techniques hinder real-time performance on standard consumer hardware. There is therefore a need for a solution that generates high-quality 4D avatars from a small number of images without specialized equipment.

Technical Approach

The proposed method uses a morphable multi-view diffusion model with stochastic conditioning to synthesize consistent novel views of the subject. These views make it possible to obtain dense 3D reconstructions from limited input while preserving the subject's identity. The method combines generative diffusion with real-time rendering: the avatar is represented with 3D Gaussian splatting augmented with expression-dependent appearance models. The result is a photorealistic, animatable avatar that captures fine detail and can be rendered in real time on standard devices.
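The following sketch, under stated assumptions, shows how expression-dependent appearance can sit on top of 3D Gaussian splatting: each Gaussian's color and opacity are shifted by a linear function of an expression vector `psi`, and a pixel is then formed by standard front-to-back alpha compositing of the depth-sorted Gaussians. The linear offset model (`W_color`, `W_opacity`), the sigmoid opacity parameterization, and all function names are hypothetical; a full renderer would also project anisotropic 3D covariances to 2D and rasterize tile by tile, which is omitted here.

```python
import numpy as np

# Hypothetical sketch: linear expression-dependent appearance + 3DGS compositing.


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def expression_appearance(base_rgb, base_opacity_logit, W_color, W_opacity, psi):
    """Shift each Gaussian's color and opacity by a linear function of psi.

    base_rgb: (G, 3) in [0, 1]; base_opacity_logit: (G,)
    W_color: (G, 3, K); W_opacity: (G, K); psi: (K,) expression coefficients
    """
    rgb = np.clip(base_rgb + W_color @ psi, 0.0, 1.0)
    opacity = sigmoid(base_opacity_logit + W_opacity @ psi)
    return rgb, opacity


def gaussian_weight(pixel, mean2d, cov2d):
    """Unnormalized value of each projected 2D Gaussian at one pixel."""
    d = pixel - mean2d                                # (G, 2) offsets
    inv = np.linalg.inv(cov2d)                        # (G, 2, 2) inverse covariances
    return np.exp(-0.5 * np.einsum("gi,gij,gj->g", d, inv, d))


def composite_pixel(pixel, mean2d, cov2d, depth, rgb, opacity):
    """Front-to-back compositing: C = sum_i c_i a_i prod_{j<i} (1 - a_j)."""
    alpha = opacity * gaussian_weight(pixel, mean2d, cov2d)
    color = np.zeros(3)
    transmittance = 1.0
    for i in np.argsort(depth):                       # nearest Gaussian first
        color += transmittance * alpha[i] * rgb[i]
        transmittance *= 1.0 - alpha[i]
    return color


# Usage: three Gaussians re-shaded for a new expression vector psi.
rng = np.random.default_rng(1)
G, K = 3, 4
rgb, opacity = expression_appearance(
    rng.uniform(size=(G, 3)), np.zeros(G),
    0.05 * rng.normal(size=(G, 3, K)), 0.5 * rng.normal(size=(G, K)),
    rng.normal(size=K))
cov2d = np.stack([np.eye(2) * 4.0] * G)
pixel_color = composite_pixel(np.array([10.0, 10.0]), rng.uniform(8, 12, size=(G, 2)),
                              cov2d, np.array([1.0, 2.0, 3.0]), rgb, opacity)
print(pixel_color)
```

In this sketch the expression enters only through `psi`, so the same set of Gaussians can be re-shaded for every animation frame without retraining.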

Implementation Details

The AI device encodes the reference images into latents and estimates 3D morphable model parameters from them. It then generates synthetic images with the diffusion model, applying stochastic conditioning: the reverse diffusion process is conditioned on randomly sampled subsets of the reference latents and the generated latents. The 4D avatar model is trained on both the reference and the synthetic images and represents the subject with 3D Gaussians initialized from a parametric mesh. Expression-dependent appearance models adjust the properties of the Gaussians according to the expression being rendered, so that the avatar animates realistically as the subject's expression changes.
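Stochastic conditioning can be sketched as follows, assuming a DDPM-style sampler: a pool holds the reference latents plus every latent generated so far, and at each reverse diffusion step the denoiser is conditioned on a freshly drawn random subset of that pool. The per-step resampling, the linear noise schedule, `subset_size`, and `toy_denoiser` (which predicts the noise that would pull the sample toward the mean of its conditioning latents) are hypothetical choices made so the example runs; they stand in for the morphable multi-view diffusion model and its pose and expression conditioning.

```python
import numpy as np

# Hypothetical sketch of stochastic conditioning in a DDPM-style sampler.


def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumption, not the application's schedule)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def toy_denoiser(x_t, t, cond_latents, alpha_bars):
    """Hypothetical stand-in for the diffusion network: returns the noise that
    would make x_t consistent with a clean latent equal to the mean of the
    conditioning latents."""
    x0_hat = cond_latents.mean(axis=0)
    return (x_t - np.sqrt(alpha_bars[t]) * x0_hat) / np.sqrt(1.0 - alpha_bars[t])


def sample_view(pool, rng, schedule, subset_size=2):
    betas, alphas, alpha_bars = schedule
    x = rng.normal(size=pool[0].shape)                # start from pure noise
    for t in reversed(range(len(betas))):
        # Stochastic conditioning: draw a new random subset at every step.
        k = min(subset_size, len(pool))
        idx = rng.choice(len(pool), size=k, replace=False)
        cond = np.stack([pool[i] for i in idx])
        eps = toy_denoiser(x, t, cond, alpha_bars)
        # DDPM reverse step: posterior mean, plus noise except at t = 0.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.normal(size=x.shape)
    return x


def generate_views(reference_latents, num_views, rng, schedule, subset_size=2):
    pool = list(reference_latents)                    # reference latents seed the pool
    generated = []
    for _ in range(num_views):
        z = sample_view(pool, rng, schedule, subset_size)
        generated.append(z)
        pool.append(z)                                # later views may condition on it
    return generated


# Usage: two reference latents (random stand-ins for encoded images).
rng = np.random.default_rng(0)
schedule = make_schedule()
refs = [rng.normal(size=(4, 8, 8)) for _ in range(2)]
views = generate_views(refs, num_views=3, rng=rng, schedule=schedule)
print(len(views), views[0].shape)  # 3 (4, 8, 8)
```

Appending each finished view to the pool lets later views condition on earlier synthetic views as well as on the references, which is one plausible reading of "subsets of reference and generated latents".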

Applications and Benefits

This method provides a scalable way to generate high-fidelity 4D avatars from a varying number of input images. It is designed to preserve the subject's identity and keep generated frames consistent, making it suitable for telepresence, virtual reality, film production, and gaming. Because the trained avatar is rendered with 3D Gaussian splatting, it supports real-time rendering on consumer hardware, which extends the use of 4D avatars to interactive and real-time communication. The diffusion stage fills in details missing from the input images, which improves the realism of the final animation.