US20260112104
2026-04-23
Physics
G06T15/205
The patent application details a method for generating photorealistic three-dimensional (3D) video content from animated two-dimensional (2D) or 3D video content. It utilizes neural radiance fields (NeRF) diffusion guided by vision-language models. The process involves obtaining animated video content, a text prompt, and view information to generate photorealistic 2D image frames. These frames are then used to render 3D image frames, allowing for the creation of novel images from new viewpoints.
The technique belongs to the field of video content generation and focuses on transforming animated content into photorealistic 3D video. Photorealism aims to reproduce images as realistically as possible, so that they resemble photographs, and is often used in advertising and marketing to visualize products.
The method involves several steps: obtaining animated video content, a text prompt, and view information; generating photorealistic 2D frames using a vision-language model; and rendering 3D frames from these 2D frames using a 3D representation model. This process can be iterated to train the 3D model, enhancing its ability to produce high-quality, novel 3D images from different viewpoints.
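The data flow of these steps can be sketched as follows. This is a minimal illustration under loud assumptions, not the application's implementation: `View`, `generate_photoreal_2d`, and `Toy3DModel` are hypothetical names, the "vision-language model" is replaced by a pixel clamp that ignores the prompt and views, and the "3D representation model" is replaced by a pixel-wise average whose "novel view" is only a cyclic pixel shift.

```python
from dataclasses import dataclass
from typing import List, Sequence

Frame = List[List[float]]  # toy grayscale image: rows of intensities


@dataclass(frozen=True)
class View:
    """Hypothetical view descriptor: a pixel shift stands in for a camera pose."""
    shift: int


def generate_photoreal_2d(animated: Sequence[Frame], prompt: str,
                          views: Sequence[View]) -> List[Frame]:
    """Placeholder for the vision-language-guided 2D generator.

    A real system would condition on `prompt` and `views`; this toy ignores
    both and only clamps every pixel to the range [0, 1].
    """
    return [[[min(1.0, max(0.0, p)) for p in row] for row in frame]
            for frame in animated]


class Toy3DModel:
    """Placeholder for the 3D representation model (assumed stand-in, not a NeRF)."""

    def __init__(self) -> None:
        self.canonical: Frame = []

    def fit(self, frames: Sequence[Frame]) -> None:
        # Average the 2D frames pixel-wise into one canonical image.
        n = len(frames)
        rows, cols = len(frames[0]), len(frames[0][0])
        self.canonical = [[sum(f[r][c] for f in frames) / n for c in range(cols)]
                          for r in range(rows)]

    def render(self, view: View) -> Frame:
        # Toy "novel view": cyclically shift each row by view.shift pixels.
        out = []
        for row in self.canonical:
            k = view.shift % len(row)
            out.append(row[-k:] + row[:-k] if k else list(row))
        return out


# Step 1: obtain animated content, a text prompt, and view information.
animated = [[[0.0, 1.5], [0.5, -0.2]], [[0.2, 0.5], [0.5, 0.0]]]
views = [View(0), View(0)]
# Step 2: generate photorealistic 2D frames (placeholder generator).
frames_2d = generate_photoreal_2d(animated, "a photorealistic street scene", views)
# Step 3: build the 3D representation and render a frame from a new view.
model = Toy3DModel()
model.fit(frames_2d)
novel = model.render(View(shift=1))
```

The point of the sketch is the ordering of the stages: the 3D model never sees the animated input directly, only the 2D frames produced in the previous step.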
Various optional implementations are described, such as using side information to enhance 2D frame generation or employing a multi-modal conditioned reverse diffusion module. The model can be trained in stages: a loss is computed and backpropagated to refine the photorealistic output, and high-quality appearance diffusion modules further enhance the final result.
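The idea of refining a model over stages through loss computation and gradient updates can be illustrated with a toy example. Everything here is an assumption made for illustration: the names `mse_loss` and `train_stage` are hypothetical, the "renderer" simply returns the parameter vector, the loss is a plain mean squared error against a fixed target frame, and the gradient is written out by hand instead of using an automatic differentiation library. The summary does not state which loss or stage schedule the application uses.

```python
from typing import List


def mse_loss(rendered: List[float], target: List[float]) -> float:
    """Mean squared error between a rendered frame and a target frame."""
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(target)


def train_stage(params: List[float], target: List[float],
                lr: float, steps: int) -> List[float]:
    """Run one training stage of plain gradient descent (hypothetical helper)."""
    params = list(params)
    n = len(target)
    for _ in range(steps):
        # Toy "render": the rendered frame is the parameter vector itself.
        rendered = params
        # Analytic gradient of the MSE with respect to each parameter.
        grads = [2.0 * (r - t) / n for r, t in zip(rendered, target)]
        # Gradient step, playing the role of the backpropagation update.
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


# Flattened target frame standing in for a photorealistic 2D frame.
target = [0.2, 0.8, 0.5, 0.9]
init = [0.0, 0.0, 0.0, 0.0]

# Stage 1: coarse fit with a larger step size (assumed schedule).
stage1 = train_stage(init, target, lr=0.5, steps=20)
# Stage 2: refinement with a smaller step size (assumed schedule).
stage2 = train_stage(stage1, target, lr=0.1, steps=50)

loss_init = mse_loss(init, target)
loss_stage1 = mse_loss(stage1, target)
loss_stage2 = mse_loss(stage2, target)
```

In this toy setting each stage starts from the parameters left by the previous one, so the loss after the second stage is lower than after the first.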
The application also outlines a computing device and a non-transitory computer-readable medium for executing these methods. The device comprises a memory and one or more processors configured to perform the steps of obtaining content, generating 2D frames, rendering 3D frames, and generating novel images from new views.