Invention Title:

NEURAL SEMANTIC 3D CAPTURE WITH NEURAL RADIANCE FIELDS

Publication number:

US20250308152

Publication date:

Section:

Physics

Class:

G06T17/00

Inventor:

Assignee:

Applicant:

Smart overview of the Invention

The described method focuses on generating a three-dimensional (3D) model from a set of two-dimensional (2D) images captured from various camera angles and positions. It involves creating semantic masks for the different object classes within the scene, which are then used alongside the 2D images to train a neural radiance field (NeRF) model. The trained NeRF model serves as an implicit 3D representation of the scene's objects, from which high-quality 3D models can be rendered.
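
At a high level, the pipeline can be read as four stages: capture, segmentation, NeRF training, and model extraction. The sketch below is a hypothetical outline under that reading; every function here is an illustrative stub (not the patent's terminology or any real library API) so the flow runs end to end on dummy data.

```python
import numpy as np

# Illustrative stubs for each pipeline stage; a real system would replace
# these with an actual segmenter, NeRF trainer, and mesh extractor.

def segment_objects(image: np.ndarray) -> np.ndarray:
    """Stub segmenter: one integer class label per pixel (0 = background)."""
    return np.zeros(image.shape[:2], dtype=np.int64)

def train_nerf(images, masks):
    """Stub trainer: would fit an MLP mapping (position, view dir) -> (density, RGB)."""
    return {"images": images, "masks": masks}  # placeholder for a trained model

def extract_mesh(nerf):
    """Stub extractor: would e.g. run marching cubes over the density field."""
    return "mesh"

# 1. Capture 2D images from various camera angles (dummy 64x64 RGB frames here).
images = [np.random.rand(64, 64, 3) for _ in range(10)]
# 2. Create a semantic mask per image for the object classes in the scene.
masks = [segment_objects(img) for img in images]
# 3. Train the NeRF on images paired with masks; this is the implicit 3D representation.
nerf = train_nerf(images, masks)
# 4. Post-process the implicit model into an explicit 3D model.
mesh = extract_mesh(nerf)
```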

Background

The demand for 3D content in fields such as entertainment, industrial design, and medicine is significant. Traditional 3D modeling techniques often rely on explicit representations like voxels, point clouds, or polygonal meshes, each with its limitations. These methods can be resource-intensive and may require extensive manual effort. Photogrammetry, a technique that reconstructs 3D models from 2D images, often involves complex hardware and software setups, resulting in high costs and potential issues with mesh quality.

Methodology

The method leverages semantic priors in an enhanced photogrammetry pipeline to generate 3D models. Semantic masks are created from a set of 2D images and embedded into those images to form a training dataset for the NeRF model. This approach allows the NeRF model to automatically produce high-quality 3D meshes without manual cleanup. The pipeline is particularly effective at capturing intricate details such as human hair, which traditional photogrammetry methods struggle to reconstruct.
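
The text does not specify how the masks are "embedded" into the images. One straightforward reading, sketched below purely as an assumption, is to stack a one-hot encoding of the mask as extra per-pixel channels next to the RGB values, so each training sample carries both appearance and class information.

```python
import numpy as np

def embed_mask(image: np.ndarray, mask: np.ndarray, num_classes: int) -> np.ndarray:
    """Concatenate a one-hot semantic mask onto an RGB image.

    image: (H, W, 3) float array in [0, 1]
    mask:  (H, W) integer class labels
    Returns an (H, W, 3 + num_classes) training sample.
    """
    one_hot = np.eye(num_classes, dtype=image.dtype)[mask]  # (H, W, num_classes)
    return np.concatenate([image, one_hot], axis=-1)

# Example: a 4x4 image with two classes (0 = background, 1 = hair).
img = np.random.rand(4, 4, 3)
msk = np.random.randint(0, 2, size=(4, 4))
sample = embed_mask(img, msk, num_classes=2)
print(sample.shape)  # (4, 4, 5)
```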

Technical Details

Neural radiance fields (NeRF) utilize a deep neural network to synthesize views by optimizing a volumetric scene function from a sparse set of input images. The network predicts volume density and emitted radiance as a function of spatial location and camera viewing direction. Points are sampled along camera rays, and classical volume rendering techniques composite the predicted densities and radiances into 2D images. Postprocessing can convert the implicit NeRF model into explicit 3D representations such as meshes.
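
Concretely, the compositing step described here is the standard NeRF quadrature (from the original NeRF formulation, not anything specific to this patent): each sample's opacity is alpha_i = 1 - exp(-sigma_i * delta_i), and samples are blended front to back, weighted by the accumulated transmittance. A minimal NumPy sketch:

```python
import numpy as np

def composite_ray(sigmas: np.ndarray, colors: np.ndarray, deltas: np.ndarray) -> np.ndarray:
    """Classical volume rendering of one camera ray.

    sigmas: (N,) predicted volume densities at N sampled points
    colors: (N, 3) predicted emitted radiance at those points
    deltas: (N,) distances between adjacent samples along the ray
    Returns the rendered RGB pixel color.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                # per-segment opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))  # transmittance T_i
    weights = trans * alphas                               # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)

# Example: render one pixel from 64 samples along a ray.
n = 64
pixel = composite_ray(np.random.rand(n), np.random.rand(n, 3), np.full(n, 0.05))
print(pixel)
```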

Implementation

Scene-object decomposition is applied in the NeRF pipeline using semantic masks, which include class labels and associated voxels. These masks guide the training of the NeRF model and can be integrated into its loss function. The process begins with capturing a set of 2D images from various viewpoints, followed by training the NeRF model on these images and their embedded semantic masks. The final output is a detailed 3D model that accurately represents the scene's objects.
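
One way to read "integrated into its loss function", shown below as an assumption rather than the patent's exact formulation, is to have the model render per-ray class logits alongside color and add a cross-entropy term against the mask labels to the usual photometric loss. The weight on the semantic term is an illustrative hyperparameter.

```python
import torch
import torch.nn.functional as F

def nerf_loss(pred_rgb, gt_rgb, pred_logits, gt_class, sem_weight=0.04):
    """Photometric NeRF loss plus a semantic-mask term (illustrative weighting).

    pred_rgb:    (R, 3) rendered ray colors
    gt_rgb:      (R, 3) ground-truth pixel colors
    pred_logits: (R, C) rendered semantic class logits per ray
    gt_class:    (R,)   class labels taken from the semantic masks
    """
    photometric = F.mse_loss(pred_rgb, gt_rgb)             # standard NeRF term
    semantic = F.cross_entropy(pred_logits, gt_class)      # mask-guided term
    return photometric + sem_weight * semantic

# Example with a batch of 1024 rays and 3 object classes.
rays, classes = 1024, 3
pred_rgb = torch.rand(rays, 3, requires_grad=True)
pred_logits = torch.randn(rays, classes, requires_grad=True)
loss = nerf_loss(pred_rgb, torch.rand(rays, 3),
                 pred_logits, torch.randint(0, classes, (rays,)))
loss.backward()
print(loss.item())
```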