Invention Title:

METHOD AND APPARATUS FOR VIDEO CODING USING DEEP LEARNING BASED IN-LOOP FILTER FOR INTER PREDICTION

Publication number:

US20260039825

Publication date:

Section:

Electricity

Class:

H04N19/124

Inventors:

Assignees:

Applicants:

Smart overview of the Invention

A novel method and apparatus for video coding are introduced, leveraging a deep learning-based in-loop filter for inter-prediction. This approach targets both predictive frames (P-frames) and bi-predictive frames (B-frames) to address image distortion levels that vary with quantization parameter (QP) values. The technique aims to enhance video quality and coding efficiency by effectively mitigating these distortions.

Technical Background

Video data is far larger than audio or still images and therefore demands significant hardware resources to store or transmit. Traditional compression standards such as H.264/AVC, HEVC, and VVC have steadily improved efficiency, yet as video resolutions and frame rates increase, the need for more advanced compression methods grows. Deep learning-based image processing has emerged as a promising complement to existing video encoding techniques, offering improved coding efficiency and image quality.

Innovative Approach

The disclosed method enhances video quality through a deep learning-based apparatus that processes reconstructed frames and decoded QP values. The apparatus calculates an embedding vector or estimates image distortion using deep learning models. A denoising model then utilizes this information to remove quantization noise, resulting in an enhanced frame. This process is applied to both P-frames and B-frames, optimizing the video coding process by adapting to the varying distortion levels induced by different QP values.
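The data flow described above can be sketched in a few lines. This is only an illustrative stand-in: the function names, the polynomial QP embedding, and the simple neighbor-averaging filter are assumptions for clarity, not the patent's actual trained deep learning models.

```python
def qp_embedding(qp, max_qp=63):
    """Map a decoded QP value to a small feature vector (illustrative
    stand-in for a learned QP embedding)."""
    norm = qp / max_qp
    return [norm, norm ** 2, 1.0 - norm]

def estimate_distortion(qp, max_qp=63):
    """Crude distortion estimate: a higher QP means coarser quantization
    and therefore more quantization noise."""
    return qp / max_qp

def denoise_frame(frame, qp):
    """Blend each sample toward its neighbors, weighted by the estimated
    distortion. A real implementation would feed the reconstructed frame
    and the QP embedding to a trained denoising network; this 1-D
    smoothing only mimics the data flow."""
    strength = estimate_distortion(qp)
    out = []
    for i, sample in enumerate(frame):
        left = frame[i - 1] if i > 0 else sample
        right = frame[i + 1] if i < len(frame) - 1 else sample
        neighbor_mean = (left + right) / 2.0
        out.append((1.0 - strength) * sample + strength * neighbor_mean)
    return out
```

Note how the behavior adapts to the QP, as the section describes: at QP 0 the estimated distortion is zero and the frame passes through unchanged, while at high QP the filter smooths aggressively.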

Apparatus and Methodology

The video encoding apparatus comprises several components, including a picture splitter, predictor, subtractor, and transformer, among others, each of which may be implemented in hardware or software. The apparatus processes video sequences by splitting pictures into coding tree units (CTUs) and further into coding units (CUs) using tree structures such as the quadtree, binary tree, and ternary tree. This hierarchical splitting facilitates efficient encoding and decoding, with the splitting information encoded as syntax in various parameter sets and headers.
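The CTU-to-CU hierarchy can be sketched with a toy recursive splitter. This is an assumption-laden simplification: real codecs choose among quad, binary, and ternary splits by rate-distortion cost, whereas this sketch always quad-splits down to a minimum CU size just to show the recursion.

```python
def split_ctu(x, y, size, min_cu=8):
    """Recursively quad-split a CTU at (x, y) into CUs until the minimum
    CU size is reached. Returns a list of (x, y, size) leaf CUs."""
    if size <= min_cu:
        return [(x, y, size)]
    half = size // 2
    cus = []
    for dy in (0, half):          # top row, then bottom row
        for dx in (0, half):      # left column, then right column
            cus.extend(split_ctu(x + dx, y + dy, half, min_cu))
    return cus
```

For example, a 32x32 CTU with a 16-sample minimum yields four 16x16 CUs, and a 64x64 CTU with an 8-sample minimum yields 64 leaf CUs.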

Detailed Implementation

The picture splitter determines the CTU size and recursively splits each CTU into CUs using a combination of tree structures. The quadtree plus binary tree ternary tree (QTBTTT) structure is one such method, allowing flexible and efficient block splitting. Flags indicating split decisions and split directions are encoded and transmitted to the video decoding apparatus, ensuring that the decoder reproduces the same partition. This detailed approach underpins the video coding method, enhancing efficiency and quality through deep learning-based innovations.
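The flag-based signaling described above can be illustrated with a minimal encoder/decoder pair. This sketch handles only quad splits and uses a hypothetical `decide_split` callback in place of the encoder's rate-distortion decision; the full QTBTTT syntax also signals split type and direction.

```python
def encode_splits(size, decide_split, flags, min_size=8):
    """Emit one flag per block: 1 = quad-split further, 0 = leaf CU.
    `decide_split` stands in for the encoder's split decision."""
    if size <= min_size:
        return  # at the minimum size no flag is sent; the block is a leaf
    do_split = 1 if decide_split(size) else 0
    flags.append(do_split)
    if do_split:
        for _ in range(4):
            encode_splits(size // 2, decide_split, flags, min_size)

def decode_splits(size, flags, pos=0, min_size=8):
    """Replay the flag stream to rebuild the leaf CU sizes.
    Returns (leaf_sizes, next_flag_position)."""
    if size <= min_size:
        return [size], pos
    if flags[pos] == 0:
        return [size], pos + 1
    pos += 1
    leaves = []
    for _ in range(4):
        sub, pos = decode_splits(size // 2, flags, pos, min_size)
        leaves.extend(sub)
    return leaves, pos
```

Because the decoder consumes the flags in the same recursive order the encoder emitted them, both sides arrive at an identical block partition, which is what makes accurate reconstruction possible.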