Invention Title:

MULTICHANNEL AUDIO ENCODE AND DECODE USING DIRECTIONAL METADATA

Publication number:

US20250342844

Publication date:

2025-11-06

Section:

Physics

Class:

G10L19/008

Inventor:

David S. McGrath 🇦🇺 Rose Bay, Australia

Assignee:

DOLBY LABORATORIES LICENSING CORPORATION 🇺🇸 SAN FRANCISCO, CA, United States

Applicant:

DOLBY LABORATORIES LICENSING CORPORATION 🇺🇸 San Francisco, CA, United States

Overview of the Invention

The patent application describes methods for processing a spatial audio signal to generate a compressed representation of it. The spatial audio signal is analyzed to determine directions of arrival for one or more audio elements in the audio scene. For each frequency subband, respective indications of signal power associated with the determined directions are then determined. Metadata comprising direction information and energy information is generated, where the energy information includes these signal power indications. Finally, a channel-based audio signal with a predefined number of channels is generated from the spatial audio signal. The compressed representation comprises the channel-based audio signal and the metadata.

Technical Field

The application relates to audio signal processing, in particular to methods for generating compressed representations of spatial audio signals and for reconstructing spatial audio signals from such compressed representations. The spatial audio signal may be a multichannel signal or an object-based signal, and the aim is to reduce the amount of data while preserving audio quality. The process involves analyzing the spatial audio scene to identify its dominant audio elements and their directions of arrival, which provide the basis for the directional metadata.
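The application does not state how directions of arrival are estimated. As one illustrative possibility, the Python sketch below assumes a first-order Ambisonics (B-format) input built from an idealized plane wave with SN3D-style gains (W unscaled), and estimates a single broadband direction from the time-averaged products of W with X, Y and Z, which are proportional to the active intensity vector under this encoding. This intensity-vector approach is a swapped-in technique, not the application's method, and the helper names `encode_foa` and `estimate_direction` are hypothetical.

```python
import math


def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal as an idealized first-order plane wave (W unscaled).

    Angles are in radians; returns the four B-format channels (W, X, Y, Z).
    """
    gx = math.cos(azimuth) * math.cos(elevation)
    gy = math.sin(azimuth) * math.cos(elevation)
    gz = math.sin(elevation)
    w = list(signal)
    x = [gx * s for s in signal]
    y = [gy * s for s in signal]
    z = [gz * s for s in signal]
    return w, x, y, z


def estimate_direction(w, x, y, z):
    """Estimate (azimuth, elevation) in radians from time-averaged W*X, W*Y, W*Z."""
    n = len(w)
    ix = sum(a * b for a, b in zip(w, x)) / n
    iy = sum(a * b for a, b in zip(w, y)) / n
    iz = sum(a * b for a, b in zip(w, z)) / n
    azimuth = math.atan2(iy, ix)
    elevation = math.atan2(iz, math.hypot(ix, iy))
    return azimuth, elevation


if __name__ == "__main__":
    sig = [math.sin(0.1 * t) for t in range(1000)]
    az, el = estimate_direction(*encode_foa(sig, math.radians(30), math.radians(10)))
    print(math.degrees(az), math.degrees(el))
```

For a single encoded source the estimate matches the encoding angles up to floating-point rounding. With several uncorrelated sources sounding at once, the broadband estimate points along the power-weighted vector sum of their directions, which is one motivation for analyzing each frequency subband separately.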

Background

Human hearing can perceive the spatial properties of an audio scene, and various technologies exist to capture and reproduce such scenes. A scene is conveyed as one or more audio streams, which may carry metadata to assist playback. This metadata typically describes the intended loudspeaker arrangement, but it is often omitted when a standardized layout is assumed. Formats such as Higher Order Ambisonics (HOA) use many channels, so they need efficient compression to keep the bandwidth required for transmission or storage manageable.
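To put the bandwidth concern in numbers: a full-sphere Ambisonics signal of order N has (N + 1)² channels. The short sketch below computes the uncompressed PCM bit rate for a given channel count; the 48 kHz sample rate and 24-bit depth are assumed defaults for illustration, not values from the application, and the function names are hypothetical.

```python
def hoa_channels(order):
    """Channel count of a full-sphere Ambisonics signal of the given order: (N + 1)^2."""
    return (order + 1) ** 2


def pcm_bitrate(channels, sample_rate=48_000, bits_per_sample=24):
    """Uncompressed PCM bit rate in bit/s (sample rate and bit depth are assumptions)."""
    return channels * sample_rate * bits_per_sample


if __name__ == "__main__":
    for order in (1, 2, 3):
        n = hoa_channels(order)
        print(f"order {order}: {n} channels, {pcm_bitrate(n) / 1e6:.3f} Mbit/s")
```

At these settings, third-order HOA (16 channels) needs 18.432 Mbit/s uncompressed, compared with 5.76 Mbit/s for a 5-channel signal, before any perceptual coding is applied.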

Methodology

The proposed method processes a spatial audio signal by determining the directions of arrival of the dominant audio elements in the scene. It then determines signal power indications for each frequency subband and generates metadata containing the direction and energy information. A channel-based audio signal is also produced, which may have fewer channels than the original spatial audio signal. The aim is a compressed representation that is compact yet still allows a decoder to reconstruct a close approximation of the original spatial audio scene.
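Per-subband analysis of a single time segment might look like the following Python sketch. It assumes a first-order Ambisonics (W, X, Y, Z) input, uses a naive DFT purely for clarity (a real system would use an FFT or a filterbank), groups DFT bins into bands via a hypothetical `band_edges` list, estimates each band's direction from the real part of the W/X, W/Y and W/Z cross-spectra (an intensity-vector technique assumed here, not taken from the application), and reports a mean-square power indication per band. The `BandMetadata` record and all function names are hypothetical; the application does not define a metadata format.

```python
import cmath
import math
from dataclasses import dataclass


@dataclass
class BandMetadata:
    """Hypothetical per-band metadata record: direction plus power indication."""
    band: int          # subband index
    azimuth: float     # direction of arrival, radians
    elevation: float   # radians
    power: float       # mean-square power of the band's content in this segment


def dft(segment):
    """Naive O(N^2) DFT of a real-valued time segment (for clarity, not speed)."""
    n = len(segment)
    return [
        sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def analyse_segment(w, x, y, z, band_edges):
    """Return one BandMetadata record per subband for a single FOA time segment.

    band_edges are DFT bin boundaries; [1, 4, 16] gives bands of bins 1-3 and 4-15.
    Bands are assumed to exclude the DC and Nyquist bins.
    """
    n = len(w)
    W, X, Y, Z = dft(w), dft(x), dft(y), dft(z)
    records = []
    for band, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        bins = range(lo, hi)
        # Per-band direction from the real part of the W/X, W/Y, W/Z cross-spectra.
        ix = sum((W[k].conjugate() * X[k]).real for k in bins)
        iy = sum((W[k].conjugate() * Y[k]).real for k in bins)
        iz = sum((W[k].conjugate() * Z[k]).real for k in bins)
        # Mean-square power: each positive-frequency bin is counted twice to
        # account for its negative-frequency mirror, then scaled by 1/n^2.
        power = 2.0 * sum(abs(W[k]) ** 2 for k in bins) / n ** 2
        records.append(BandMetadata(
            band=band,
            azimuth=math.atan2(iy, ix),
            elevation=math.atan2(iz, math.hypot(ix, iy)),
            power=power,
        ))
    return records
```

Because each record carries its own direction and power, two sources occupying different subbands produce two distinct records even when they sound at the same time, which a single broadband direction estimate cannot separate.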

Implementation Details

The analysis may be applied to all frequency subbands or only to selected ones, and may make use of scene analysis techniques. An object-based signal may first be converted to a multichannel format before it is analyzed. Signal power indications may be determined per frequency subband and per time segment, for example from a time-frequency representation such as a discrete Fourier transform (DFT). Object-based signals are accommodated by panning the audio objects to a predefined set of channels, which yields a channel-based signal suited to downmixing and compression.
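Panning audio objects to predefined channels can be illustrated with a simple constant-power pairwise pan law on a horizontal ring of channels. In the Python sketch below, the five-channel layout `LAYOUT`, the sine/cosine pan law, and the helper names `pan_gains` and `render_objects` are all hypothetical choices; the application only states that objects are panned to predefined channels.

```python
import math

# Hypothetical predefined layout: azimuths in degrees, counter-clockwise from front.
# Roughly C, L, Ls, Rs, R of a 5-channel ring; chosen for illustration only.
LAYOUT = [0.0, 30.0, 110.0, 250.0, 330.0]


def pan_gains(azimuth_deg, layout=LAYOUT):
    """Constant-power (sine/cosine) pairwise panning onto a horizontal ring.

    Assumes at least two channels with distinct azimuths.
    """
    az = azimuth_deg % 360.0
    order = sorted(range(len(layout)), key=lambda i: layout[i] % 360.0)
    gains = [0.0] * len(layout)
    for j in range(len(order)):
        a, b = order[j], order[(j + 1) % len(order)]
        start = layout[a] % 360.0
        span = (layout[b] - layout[a]) % 360.0
        offset = (az - start) % 360.0
        if offset <= span:
            frac = offset / span  # 0 at channel a, 1 at channel b
            gains[a] = math.cos(frac * math.pi / 2)
            gains[b] = math.sin(frac * math.pi / 2)
            return gains
    raise ValueError("layout does not cover the requested azimuth")


def render_objects(objects, layout=LAYOUT):
    """Mix (samples, azimuth_deg) audio objects into one signal per layout channel."""
    length = max(len(samples) for samples, _ in objects)
    channels = [[0.0] * length for _ in layout]
    for samples, azimuth in objects:
        gains = pan_gains(azimuth, layout)
        for ch, g in enumerate(gains):
            for t, s in enumerate(samples):
                channels[ch][t] += g * s
    return channels
```

For a source at 15°, halfway between the 0° and 30° channels, each of those channels receives a gain of about 0.707, so the summed power stays at 1. A production renderer would more likely use vector-base amplitude panning (VBAP) or a codec-specific renderer; the sine/cosine law is used here only because it is short.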