US20250379978
2025-12-11
Electricity
H04N19/124
Innovations in neural image and video codecs aim to improve compression efficiency and reconstruction quality through advanced entropy models and flexible quantization techniques. A neural video encoder encodes a video frame to produce encoded data and outputs that data as part of a bitstream. Encoding involves determining a latent representation of the frame and encoding it with an entropy model network that includes convolutional layers. The statistical characteristics of the quantized latent representation are estimated in part from previous frames, so the entropy model can exploit temporal redundancy.
Compression reduces the bit rate of digital video, which lowers the cost of storing and transmitting it. Traditional codec standards, such as H.264 and H.265, define the syntax of a video bitstream and the corresponding decoding operations. More recently, neural networks have been used for compression, with entropy models predicting the probability distributions of quantized values for images or video frames. Despite these advances, there remains room for improvement in compression quality and efficiency.
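To illustrate why a predicted probability distribution matters for bit rate, the following is a minimal sketch of a common way to estimate the cost of one quantized value. It assumes a Gaussian model whose mean and scale come from the entropy model; the Gaussian choice and the names `gaussian_cdf` and `estimated_bits` are illustrative assumptions, not details taken from the disclosure.

```python
import math

def gaussian_cdf(x, mean, scale):
    """Cumulative distribution function of a Gaussian (assumed model)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))

def estimated_bits(q, mean, scale):
    """Approximate bits needed to code the integer q.

    The probability mass of q is the Gaussian mass on [q - 0.5, q + 0.5];
    an ideal entropy coder spends about -log2(p) bits on it.
    """
    p = gaussian_cdf(q + 0.5, mean, scale) - gaussian_cdf(q - 0.5, mean, scale)
    p = max(p, 1e-9)  # guard against log(0) for very unlikely values
    return -math.log2(p)

# The same value is cheaper to code when the predicted mean is closer to it.
poor_prediction = estimated_bits(3, mean=0.0, scale=1.0)
good_prediction = estimated_bits(3, mean=3.0, scale=1.0)
print(f"poor prediction: {poor_prediction:.2f} bits")
print(f"good prediction: {good_prediction:.2f} bits")
```

Under this assumption, any context that sharpens the prediction (for example, information from a previous frame) directly reduces the estimated bit cost.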
The neural video encoder estimates statistical characteristics of quantized latent representations using previous frame data, improving rate-distortion (RD) performance by exploiting temporal redundancy. The encoder organizes latent representation elements across channel and spatial dimensions, splitting them into multiple sets for efficient encoding. This approach, combined with cross-set estimation, leverages spatial and channel redundancies to enhance RD performance.
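The set-splitting idea can be sketched as follows. This is one hypothetical partition, not the scheme defined in the disclosure: it assumes two channel groups and a checkerboard pattern over spatial positions, which yields four sets, and it assumes each set may use all earlier sets as context. The names `assign_set` and `partition` are illustrative.

```python
def assign_set(c, y, x, num_channels):
    """Return a set index in 0..3 for the element at channel c, row y, column x.

    Assumed rule: the channel half picks the low bit, the checkerboard
    parity of the spatial position picks the high bit.
    """
    channel_group = 0 if c < num_channels // 2 else 1
    parity = (y + x) % 2
    return 2 * parity + channel_group

def partition(num_channels, height, width):
    """Split every latent element (c, y, x) into four disjoint sets."""
    sets = [[] for _ in range(4)]
    for c in range(num_channels):
        for y in range(height):
            for x in range(width):
                sets[assign_set(c, y, x, num_channels)].append((c, y, x))
    return sets

sets = partition(num_channels=4, height=2, width=2)
for k, members in enumerate(sets):
    # Sets coded earlier can serve as context when estimating set k.
    print(f"set {k}: {len(members)} elements, context sets: {list(range(k))}")
```

Because each set mixes channel and spatial structure, elements in a later set have already-coded neighbors both in other channels and at adjacent positions, which is the redundancy that cross-set estimation is meant to exploit.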
A corresponding decoder reconstructs video frames from encoded data in a bitstream. It uses an entropy model network to rebuild latent representations, estimating statistical characteristics of quantized elements across different sets. This process mirrors the encoding phase, ensuring efficient reconstruction by exploiting spatial and channel redundancies.
The encoder employs multi-stage quantization with varying quantization step values, which lets a single neural encoder operate across different quality and bitrate levels. The decoder mirrors this process through inverse quantization when reconstructing the frames.
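A minimal sketch of quantization with more than one step value, and its inverse, is shown below. It assumes two stages, a global step shared by the whole latent and a per-channel step, combined by multiplication; this decomposition and the names `quantize` and `dequantize` are assumptions for illustration, since the text above does not specify the stages.

```python
def quantize(latent, global_step, channel_steps):
    """Quantize a latent given as a list of channels (lists of floats).

    The effective step of a channel is global_step * channel_step
    (an assumed two-stage combination).
    """
    quantized = []
    for channel, ch_step in zip(latent, channel_steps):
        step = global_step * ch_step
        quantized.append([round(v / step) for v in channel])
    return quantized

def dequantize(quantized, global_step, channel_steps):
    """Inverse quantization: scale the integers back by the same steps."""
    restored = []
    for channel, ch_step in zip(quantized, channel_steps):
        step = global_step * ch_step
        restored.append([q * step for q in channel])
    return restored

latent = [[1.0, -0.7], [2.6, 0.3]]
channel_steps = [1.0, 2.0]  # hypothetical per-channel steps

q = quantize(latent, global_step=0.5, channel_steps=channel_steps)
restored = dequantize(q, global_step=0.5, channel_steps=channel_steps)
print(q)
print(restored)
```

In this sketch, raising `global_step` coarsens every channel at once and so lowers the bit rate at the cost of quality, while the per-channel steps adjust precision locally. The restored values differ from the input by at most half the effective step, so the reconstruction is approximate rather than exact.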