Invention Title:

PRIOR MODEL FOR GAUSSIAN SPLATTING-BASED AVATARS

Publication number:

US20260134623

Publication date:

2026-05-14

Section:

Physics

Class:

G06T17/10

Inventors:

Benjamin Eliot LUNDELL 🇺🇸 Seattle, WA, United States

Charles Thomas HEWITT 🇬🇧 Cambridge, United Kingdom

Jack Roe Saunders 🇬🇧 Bath, United Kingdom

Yanan Jian 🇺🇸 Palo Alto, CA, United States

Applicant:

Microsoft Technology Licensing, LLC 🇺🇸 Redmond, WA, United States

Smart overview of the Invention

The invention is a system and method for efficiently rendering three-dimensional digital avatars from minimal input data. It uses a deep neural network (DNN)-based prior model to address the challenges of generating photorealistic avatars. The model is trained to recognize features and to create a canonical template representing average user characteristics. During enrollment, personalized offsets are calculated from the individual user's features, allowing a high-quality 3D avatar to be created from a single audio or visual input. The avatar can then be animated in real time based on user expressions or sounds captured by input devices.

Technical Approach

The system employs Neural Radiance Fields (NeRFs) and Gaussian splatting techniques to render high-quality digital representations. Gaussian splatting represents a scene as a large number of small, translucent ellipsoids (3D Gaussians) that are projected onto the image and blended to form the rendered view. Traditional methods, however, require multiple synchronized cameras, which makes them costly and complex. The new solution overcomes these limitations by synthesizing views and expressions from a single camera input, so that an avatar can be animated from just one audio or visual input.
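To make the idea of blending translucent ellipsoids concrete, the following is a minimal sketch of front-to-back alpha compositing for a single pixel, the blending step commonly used in Gaussian splatting renderers. It is an illustration only, not the method claimed in the publication: the function name `composite_pixel` and the sample values are assumptions, and the projection of each ellipsoid onto the image is assumed to have already produced a per-splat opacity at this pixel.

```python
import numpy as np

def composite_pixel(colors, alphas, depths):
    """Blend the splats covering one pixel, nearest splat first.

    colors: (N, 3) RGB per splat, alphas: (N,) opacity of each splat at
    this pixel, depths: (N,) distance from the camera.
    (Hypothetical helper for illustration.)
    """
    order = np.argsort(depths)      # sort front to back
    pixel = np.zeros(3)
    transmittance = 1.0             # fraction of light not yet absorbed
    for i in order:
        pixel += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
    return pixel, transmittance

# Two half-transparent splats: a red one behind a blue one.
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
alphas = np.array([0.5, 0.5])
depths = np.array([2.0, 1.0])
pixel, remaining = composite_pixel(colors, alphas, depths)
```

In this example the nearer blue splat contributes half of its color and the red splat behind it contributes a quarter, which is why splats must be sorted by depth before blending.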

Enrollment Process

The enrollment process involves several stages to personalize the avatar. Initially, an appearance vector is determined, capturing features such as skin tone and eye shape. In the next stage, the model's weights are adjusted to minimize differences between the user's features and the canonical template. This results in personalized Gaussian offsets that, when combined with the canonical template, generate an accurate 3D representation. Further refinement occurs in subsequent stages to optimize the avatar's fidelity.
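The relationship between the canonical template and the personalized offsets can be sketched as a small optimization problem. The example below is a toy under stated assumptions, not the enrollment procedure of the publication: it fits per-Gaussian position offsets directly by gradient descent on a sum-of-squares loss, whereas the described system adjusts the weights of a DNN. The names `fit_offsets`, `canonical`, and `target`, and the synthetic data, are hypothetical.

```python
import numpy as np

def fit_offsets(canonical, target, steps=100, lr=0.1):
    """Find offsets so that canonical + offsets matches the target.

    canonical: (N, 3) Gaussian centers of the average template.
    target:    (N, 3) centers observed for one user (assumed to be given).
    Minimizes the sum of squared differences by plain gradient descent.
    """
    offsets = np.zeros_like(canonical)
    for _ in range(steps):
        residual = (canonical + offsets) - target
        offsets -= lr * 2.0 * residual   # gradient of the squared error
    return offsets

rng = np.random.default_rng(0)
canonical = rng.normal(size=(4, 3))                     # toy template
target = canonical + rng.normal(scale=0.05, size=(4, 3))  # toy user data
offsets = fit_offsets(canonical, target)
personalized = canonical + offsets
```

Keeping the template fixed and learning only the offsets means that each user is stored as a small deviation from the shared average, which is the role the offsets play in the description above.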

Rendering and Animation

Once the enrollment is complete, the canonical template and personalized offsets are used to render the avatar during communication sessions. Gaussian splats, derived from these templates, are applied to a mesh-based representation to create the avatar. The avatar can then be animated based on real-time user inputs, such as audio signals or facial expressions, enhancing the interaction experience in applications like virtual reality, gaming, and video conferencing.
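One common way to apply splats to a mesh is to anchor each splat to a triangle with barycentric coordinates, so that the splat follows the triangle as the mesh is animated. The sketch below assumes that scheme; the publication summary does not specify how splats are bound to the mesh, and the function `splat_centers` and all sample data are hypothetical. As a simplification, the canonical and personalized terms are added in world space rather than in each triangle's local frame.

```python
import numpy as np

def splat_centers(vertices, faces, face_ids, bary, canonical, offsets):
    """Place each splat relative to its anchor triangle on the mesh.

    vertices: (V, 3) mesh vertices, faces: (F, 3) vertex indices,
    face_ids: (S,) triangle index per splat, bary: (S, 3) barycentric
    weights, canonical/offsets: (S, 3) template and per-user displacements.
    """
    tri = vertices[faces[face_ids]]                # (S, 3, 3) corner positions
    anchors = np.einsum("sk,skd->sd", bary, tri)   # weighted corner average
    return anchors + canonical + offsets

vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])
face_ids = np.array([0, 0])
bary = np.array([[1 / 3, 1 / 3, 1 / 3], [1.0, 0.0, 0.0]])
canonical = np.zeros((2, 3))
offsets = np.array([[0.0, 0.0, 0.01], [0.0, 0.0, 0.02]])

rest = splat_centers(vertices, faces, face_ids, bary, canonical, offsets)
# Animate the mesh (here a simple translation); the splats move with it.
moved = splat_centers(vertices + np.array([0.0, 0.0, 1.0]), faces,
                      face_ids, bary, canonical, offsets)
```

Because only the mesh vertices change from frame to frame, an expression or audio signal that drives the mesh also drives every splat attached to it.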

Technical Advantages

This method significantly reduces the computational resources needed for avatar rendering, addressing a key technical challenge in computer networks. By using a single audio or visual input, the system decreases the need for multiple cameras and extensive computing power. This approach not only improves device efficiency but also allows for the generation of a 360° avatar view from limited input data, enhancing the overall functionality and user experience in digital environments.