Invention Title:

NEURAL NETWORK CODEC WITH HYBRID ENTROPY MODEL AND FLEXIBLE QUANTIZATION

Publication number:

US20250379978

Publication date:

2025-12-11

Section:

Electricity

Class:

H04N19/124

Inventors:

Bin LI 🇨🇳 Beijing, China

Yan Lu 🇨🇳 Beijing, China

Jiahao Li 🇨🇳 Beijing, China

Assignee:

Microsoft Technology Licensing, LLC 🇺🇸 Redmond, WA, United States

Applicant:

Microsoft Technology Licensing, LLC 🇺🇸 Redmond, WA, United States

Smart overview of the Invention

Innovations in neural image and video codecs focus on improving efficiency and quality through advanced entropy models and flexible quantization techniques. A neural video encoder processes video frames to generate encoded data, which is then output as a bitstream. The encoding involves determining a latent representation of the frame and using an entropy model network with convolutional layers to encode it. Statistical characteristics of quantized latent representations are estimated using previous frames, enhancing the compression process.

Background

Compression reduces digital video bit rates, aiding in storage and transmission. Traditional codecs, like H.264 and H.265, define video bitstream syntax and decoding operations. Recently, neural networks have been employed for compression, using entropy models to predict probability distributions of quantized images or video frames. Despite advancements, there's potential for further improvements in compression quality and efficiency.

Encoding Innovations

The neural video encoder estimates statistical characteristics of quantized latent representations using previous frame data, improving rate-distortion (RD) performance by exploiting temporal redundancy. The encoder organizes latent representation elements across channel and spatial dimensions, splitting them into multiple sets for efficient encoding. This approach, combined with cross-set estimation, leverages spatial and channel redundancies to enhance RD performance.

Decoding Process

A corresponding decoder reconstructs video frames from encoded data in a bitstream. It uses an entropy model network to rebuild latent representations, estimating statistical characteristics of quantized elements across different sets. This process mirrors the encoding phase, ensuring efficient reconstruction by exploiting spatial and channel redundancies.

Quantization Flexibility

The encoder employs multi-stage quantization with varying quantization step values, allowing flexibility across different quality and bitrate levels. This approach aids in adapting the neural encoder to diverse requirements. The decoder mirrors this process through inverse quantization, ensuring accurate reconstruction of the original frames.