US20250285465
2025-09-11
Physics
G06V40/161
The patent application describes systems and methods for real-time face reenactment using text and audio inputs. The method involves receiving a target video with a target face and a source video with a source face. Facial expression parameters of the source face are determined using a parametric face model. The target face is then modified in real time to mimic the source face's expressions, producing a sequence of modified video frames that can be displayed on a computing device as they are generated.
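At a high level, this per-frame flow can be pictured as a simple loop. The sketch below is illustrative only; every name in it (reenact_stream, fit_expression, retarget, display_frame) is a hypothetical placeholder, since the application describes a sequence of operations rather than an API.

```python
# Minimal sketch of the per-frame reenactment loop, under the
# assumptions stated above. All names are hypothetical placeholders.

def reenact_stream(source_frames, target_frame, face_model, display_frame):
    """Transfer source expressions onto the target face, frame by frame."""
    for source_frame in source_frames:
        # Fit the parametric face model to the source face to obtain
        # its facial expression parameters for this frame.
        expr_params = face_model.fit_expression(source_frame)

        # Modify the target face so that it mimics the source expression.
        modified_frame = face_model.retarget(target_frame, expr_params)

        # Modified frames are displayed as they are generated,
        # which is what makes the method real-time.
        display_frame(modified_frame)
```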
Face reenactment technology transfers facial expressions from one individual in a source video to another individual in a target video or image. This technology is applicable in fields such as entertainment, virtual reality, and video communications. Existing methods either rely on morphable face models, which are fast but less photorealistic, or on deep learning, which is photorealistic but time-consuming. Both approaches often struggle with real-time processing on ordinary mobile devices.
The disclosed methods and systems aim to address these challenges by enabling real-time face reenactment on mobile devices without requiring internet connectivity or server-side resources. The approach involves building a statistical face morphable model from recorded facial images, training a deep learning model for synthesizing mouth and eye regions, and performing real-time facial reenactment. This significantly reduces computation time while maintaining photorealistic quality.
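Building a statistical face morphable model is conventionally done by running principal component analysis over vectorized face geometry captured from recorded images. The application names this step but not a recipe, so the numpy sketch below is an assumption about how such a model could be constructed.

```python
import numpy as np

# Illustrative sketch: a statistical morphable model is typically built
# by PCA over stacked, vectorized face geometries (and, analogously,
# textures) from recorded facial images. The patent describes the step,
# not this exact recipe; treat the details as an assumption.

def build_morphable_basis(face_vectors: np.ndarray, n_components: int):
    """face_vectors: (n_faces, 3 * n_vertices) stacked face geometries."""
    mean_face = face_vectors.mean(axis=0)
    centered = face_vectors - mean_face
    # SVD of the centered data yields the principal deformation directions.
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    basis = components[:n_components]              # (n_components, 3 * n_vertices)
    stdevs = singular_values[:n_components] / np.sqrt(len(face_vectors) - 1)
    return mean_face, basis, stdevs

# A new face is then expressed as mean_face + coeffs @ basis, with the
# coefficients regularized by the per-component standard deviations.
```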
One potential application of this technology is in personalized advertising. By replacing an actor's face in advertisements with another individual's face, such as a user's friend or favorite celebrity, the advertisement becomes more engaging and memorable. The technology can be implemented through software on computers or mobile devices, or via hardware like application-specific integrated circuits (ASICs) or programmable logic devices.
The process includes several steps: receiving target and source videos, determining facial expressions using parametric models, synthesizing output faces with altered expressions, and generating regions such as the mouth and eyes using deep neural networks (DNNs). The parametric face model incorporates facial identity, texture, and blend shapes for various expressions. The DNN, trained on historical facial images, takes as input parameters from the parametric model together with the mouth and eye regions of previous frames, as sketched below.
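A common linear formulation of such a parametric model (assumed here; the application does not give equations) composes a mean shape with identity and expression blend-shape bases, and the DNN conditions on the resulting parameters plus region crops from earlier frames. The function names and argument packing below are hypothetical.

```python
import numpy as np

# Assumed linear formulation (not quoted from the patent):
#   geometry = mean_shape + identity_basis @ id_coeffs
#                         + blendshape_basis @ expr_coeffs

def synthesize_geometry(mean_shape, identity_basis, blendshape_basis,
                        id_coeffs, expr_coeffs):
    """Return per-vertex geometry for a given identity and expression."""
    return (mean_shape
            + identity_basis @ id_coeffs        # who the face is
            + blendshape_basis @ expr_coeffs)   # what it is expressing

def dnn_input(model_params, prev_mouth_region, prev_eye_region):
    # Per the application, the DNN conditions on parametric-model
    # parameters plus mouth/eye regions from previous frames; the
    # exact packing of these inputs is an assumption here.
    return (model_params, prev_mouth_region, prev_eye_region)
```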