US20260057896
2026-02-26
Physics
G10L21/0208
The patent application describes a method for dynamic noise suppression in audio signals using convolutional neural networks. The technique generates a magnitude spectrum and a phase spectrum of an input audio signal that contains both speech and dynamic noise. A temporal convolutional network (TCN) then produces a separation mask from the magnitude spectrum. Applying this mask to the magnitude spectrum separates the speech from the dynamic noise and yields a denoised magnitude spectrum. Finally, a noise-suppressed version of the audio signal is reconstructed from the denoised magnitude spectrum and the original phase spectrum.
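The four steps can be written compactly as follows. The notation is ours, not the application's: x is the input signal, X its short-time Fourier transform (STFT, the transform the summary names for the encoder circuit), φ the phase spectrum, M the separation mask, and ŝ the noise-suppressed output. Bounding M to [0, 1] is an assumption that the summary does not state.

```latex
\begin{aligned}
X(t,f) &= \mathrm{STFT}\{x\}(t,f) = |X(t,f)|\, e^{j\phi(t,f)} \\
M(t,f) &= \mathrm{TCN}\big(|X|\big)(t,f), \qquad 0 \le M(t,f) \le 1 \\
|\hat{S}(t,f)| &= M(t,f)\, |X(t,f)| \\
\hat{s} &= \mathrm{ISTFT}\big\{ |\hat{S}(t,f)|\, e^{j\phi(t,f)} \big\}
\end{aligned}
```

Only the magnitude is modified. The noisy phase φ is reused unchanged in the last line, which is what reconstructing "with the original phase spectrum" amounts to.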
Latency in audio signal processing can noticeably degrade the user experience, especially in real-time applications such as teleconferencing. Dynamic noise suppression (DNS) improves audio quality by reducing noise relative to speech, but it is often a significant source of latency, and that delay can disrupt conversation. The patent addresses this with a method that reduces latency while maintaining effective noise suppression, improving the overall quality of real-time audio applications.
The proposed method reduces the latency of dynamic noise suppression by using convolutional neural networks with internal state buffering. Because the buffers retain the state history of earlier frames, each convolution still has its full temporal context when the network processes smaller audio segments, so latency drops without compromising quality. The method also includes post-processing of the training data to improve the performance of the neural networks. A system or software product implementing these techniques can improve speech audio quality by dynamically suppressing noise.
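The claim that smaller segments need not cost quality rests on the state buffers making chunked processing equivalent to whole-signal processing. The sketch below illustrates this for a single-channel causal convolution. It is a minimal illustration under stated assumptions, not the application's implementation: the class and function names, kernel weights, and 4-sample chunk size are all hypothetical.

```python
class StreamingCausalConv1d:
    """Hypothetical single-channel causal convolution that carries history across calls.

    Computes y[t] = sum_j weights[j] * x[t - j], treating x[t] = 0 for t < 0.
    """

    def __init__(self, weights):
        self.weights = list(weights)
        # State buffer: the last (kernel_size - 1) input samples, oldest first.
        self.state = [0.0] * (len(self.weights) - 1)

    def process(self, chunk):
        """Filter one chunk of any length, reading and then updating the state buffer."""
        k = len(self.weights)
        padded = self.state + list(chunk)
        out = []
        for i in range(len(chunk)):
            window = padded[i:i + k]  # x[t-k+1] .. x[t], oldest first
            out.append(sum(w * window[k - 1 - j] for j, w in enumerate(self.weights)))
        if k > 1:
            self.state = padded[-(k - 1):]  # keep only the newest k-1 samples
        return out


def causal_conv_full(signal, weights):
    """Reference: the same convolution computed over the whole signal at once."""
    return [
        sum(w * (signal[t - j] if t - j >= 0 else 0.0) for j, w in enumerate(weights))
        for t in range(len(signal))
    ]


weights = [0.5, 0.3, 0.2]                 # hypothetical kernel
signal = [float(i % 5) for i in range(20)]

layer = StreamingCausalConv1d(weights)
streamed = []
for start in range(0, len(signal), 4):    # feed 4-sample chunks
    streamed.extend(layer.process(signal[start:start + 4]))

reference = causal_conv_full(signal, weights)
```

Because `process` prepends the buffered history to every chunk, the chunk size changes only how often the layer is called, not what it computes. That is what lets a streaming system shrink its buffering delay.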
The techniques can be implemented on various platforms, including workstations, laptops, tablets, smartphones, and voice-controlled systems. The system comprises an encoder circuit to generate magnitude and phase spectra, a TCN separator to create a separation mask, and a decoder circuit to reconstruct the noise-suppressed audio signal. The TCN includes depth-wise convolution layers with state buffers, and the separation mask is applied to the magnitude spectrum to isolate speech from noise.
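A depth-wise convolution applies a separate kernel to each channel and never mixes channels, so each channel needs its own state buffer. The following is a minimal sketch. The dilation factor is an assumption (dilated kernels are a common TCN design choice that the summary does not mention), and the class name, kernels, and input values are hypothetical.

```python
class StreamingDepthwiseConv:
    """Hypothetical depth-wise causal convolution with one state buffer per channel.

    Each channel c computes y_c[t] = sum_j kernels[c][j] * x_c[t - j * dilation].
    """

    def __init__(self, kernels, dilation=1):
        self.kernels = [list(k) for k in kernels]
        self.dilation = dilation
        # Past samples each channel must remember: (kernel_size - 1) * dilation.
        self.history = (len(self.kernels[0]) - 1) * dilation
        self.state = [[0.0] * self.history for _ in self.kernels]

    def process(self, frames):
        """frames: list of time steps, each a list holding one value per channel."""
        outputs = [[0.0] * len(self.kernels) for _ in frames]
        for c, kernel in enumerate(self.kernels):
            # Only channel c's own history and samples are used: no cross-channel mixing.
            padded = self.state[c] + [frame[c] for frame in frames]
            for t in range(len(frames)):
                now = self.history + t  # position of the current sample in `padded`
                outputs[t][c] = sum(
                    w * padded[now - j * self.dilation] for j, w in enumerate(kernel)
                )
            if self.history:
                self.state[c] = padded[-self.history:]
        return outputs


layer = StreamingDepthwiseConv([[1.0, 1.0], [0.5, -0.5]], dilation=2)
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
# Feed one frame, then the remaining three, to exercise the state buffers.
out = layer.process(frames[:1]) + layer.process(frames[1:])
# out == [[1.0, 5.0], [2.0, 10.0], [4.0, 10.0], [6.0, 10.0]]
```

With two taps and dilation 2, each channel's buffer holds two past samples. Channel 0 computes x[t] + x[t - 2], while channel 1 computes 0.5 x[t] - 0.5 x[t - 2] from its own inputs only.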
The DNS system processes streaming audio frame by frame, suppressing noise to produce clean speech output. The encoder circuit applies a short-time Fourier transform (STFT) to the audio signal, and the TCN separator network generates a separation mask from the resulting magnitude spectrum. The decoder circuit then reconstructs the audio signal from the denoised magnitude spectrum and the original phase spectrum. This architecture provides efficient noise suppression with low latency, making it suitable for real-time audio applications such as teleconferencing.
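The encoder, mask, and decoder loop can be sketched for one hop of audio at a time. This is a hypothetical illustration, not the application's design. It assumes 50% frame overlap, square-root periodic Hann windows for analysis and synthesis, a naive DFT in place of an FFT, and a toy 8-sample frame. The names `FrameStreamer`, `process_hop`, and `mask_fn` are invented, and `mask_fn` only stands in for the TCN separator.

```python
import cmath
import math


class FrameStreamer:
    """Hypothetical frame-by-frame STFT encoder/decoder around a pluggable mask."""

    def __init__(self, frame_len=8, mask_fn=None):
        if frame_len % 2:
            raise ValueError("frame_len must be even")
        self.n = frame_len
        self.hop = frame_len // 2  # assumption: 50% overlap
        # sqrt(Hann) * sqrt(Hann) = Hann, and periodic Hann windows sum to 1 at 50% overlap.
        self.window = [
            math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len))
            for i in range(frame_len)
        ]
        self.in_buf = [0.0] * self.hop    # previous hop of input samples
        self.out_tail = [0.0] * self.hop  # second half of the previous output frame
        # Stand-in for the TCN separator: the default all-ones mask passes audio unchanged.
        self.mask_fn = mask_fn or (lambda magnitude: [1.0] * len(magnitude))

    @staticmethod
    def _dft(x, sign):
        """Naive DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
        n = len(x)
        return [
            sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)
        ]

    def process_hop(self, samples):
        """Consume one hop of new samples and emit one hop of output samples."""
        if len(samples) != self.hop:
            raise ValueError("expected exactly one hop of samples")
        frame = self.in_buf + list(samples)
        self.in_buf = list(samples)
        # Encoder: windowed DFT, split into magnitude and phase spectra.
        bins = self._dft([w * s for w, s in zip(self.window, frame)], -1)
        magnitude = [abs(z) for z in bins]
        phase = [cmath.phase(z) for z in bins]
        # Mask the magnitude only; the original phase is kept.
        denoised = [m * a for m, a in zip(self.mask_fn(magnitude), magnitude)]
        rebuilt = [cmath.rect(a, p) for a, p in zip(denoised, phase)]
        # Decoder: inverse DFT, synthesis window, then overlap-add with the saved tail.
        frame_out = [z.real / self.n for z in self._dft(rebuilt, +1)]
        frame_out = [w * s for w, s in zip(self.window, frame_out)]
        out = [a + b for a, b in zip(frame_out[:self.hop], self.out_tail)]
        self.out_tail = frame_out[self.hop:]
        return out


signal = [math.sin(0.3 * i) for i in range(32)]
streamer = FrameStreamer(frame_len=8)
output = []
for start in range(0, len(signal), streamer.hop):
    output.extend(streamer.process_hop(signal[start:start + streamer.hop]))
```

With the default all-ones mask, `output[i]` matches `signal[i - 4]` to within floating-point error. The stream lags by one hop, so a smaller `frame_len` shortens the lag. Because the sketch keeps only the real part of the inverse DFT, a mask should treat bins k and N - k alike. Any element-wise function of the magnitude does this automatically.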