US20250377861
2025-12-11
Physics
G06F7/5443
A neural network accelerator enhances energy efficiency in multiply-and-accumulate (MAC) operations by Booth encoding stationary operands, such as weights, before computation. This technique involves generating and storing Booth-encoded multipliers and precomputed compensation values, avoiding per-cycle encoding during operations. The Booth encoder is placed at the periphery of the accelerator to share its area overhead across multiple compute columns, and it supports reconfigurable operand bit widths such as 16-, 8-, 4-, and 2-bit configurations. This methodology is applicable to various architectures, including SIMD arrays, systolic arrays, and compute-in-memory arrays.
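As a concrete illustration, the following minimal Python sketch shows radix-4 (modified) Booth encoding of a two's-complement operand into digits in {-2, -1, 0, 1, 2}; the function name booth_radix4_encode and the software framing are illustrative assumptions, not taken from the patent:

    def booth_radix4_encode(w, bits=8):
        """Radix-4 Booth encoding of a two's-complement integer `w`.

        Returns one digit in {-2, -1, 0, 1, 2} per pair of bits, so an
        8-bit operand yields four digits and w == sum(d * 4**i).
        """
        u = w & ((1 << bits) - 1)      # two's-complement bit field of `w`
        digits, prev = [], 0           # `prev` is the implicit bit b[-1] = 0
        for i in range(0, bits, 2):
            b0 = (u >> i) & 1
            b1 = (u >> (i + 1)) & 1
            digits.append(-2 * b1 + b0 + prev)
            prev = b1
        return digits

For example, encoding w = -3 at 8 bits yields the digits [1, -1, 0, 0], i.e. 1 - 4 = -3; once stored, these digits can drive every subsequent multiply involving that weight.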
Deep neural networks (DNNs) are integral to AI applications such as computer vision and speech recognition, but they demand significant computational resources. Traditional hardware architectures such as CPUs and GPUs consume energy primarily through memory accesses, which poses challenges for energy-constrained AI edge applications. Designing DNN accelerators with improved energy efficiency is therefore crucial; such designs use different processing architectures and stationary dataflows to maximize data reuse and minimize energy consumption.
The accelerator performs energy-efficient operations by Booth encoding stationary operands, moving the encoding process out of the compute loop. This allows Booth-encoded multipliers to be reused across multiple compute cycles, reducing the need for continuous encoding. In Booth multiplication, a negative partial product is formed by inverting bits and adding a "+1" sign correction; the Booth compensation circuitry precomputes these compensation values, which are stored and applied during accumulation without recomputing sign corrections, thus avoiding redundant calculations and enhancing hardware efficiency.
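Building on the encoder sketch above, the following hedged sketch shows one way the stored digits and a precomputed compensation value could interact: here the compensation is simply the count of negative Booth digits (one "+1" correction each), so the per-cycle datapath only inverts and adds. The names precompute and mac_cycle and the 32-bit modular accumulator are assumptions for illustration, not the patent's circuitry:

    MASK = (1 << 32) - 1               # assumed 32-bit two's-complement accumulator

    def precompute(w, bits=8):
        """Encode a stationary weight once; precompute its sign compensation.

        Each negative Booth digit needs a '+1' two's-complement correction,
        so the compensation is just the count of negative digits.
        """
        digits = booth_radix4_encode(w, bits)
        comp = sum(1 for d in digits if d < 0)
        return digits, comp

    def mac_cycle(acc, x, digits, comp):
        """One MAC cycle: acc += x * w, using the pre-encoded digits of w."""
        for i, d in enumerate(digits):
            pp = (abs(d) * x) << (2 * i)   # magnitude partial product
            if d < 0:
                pp = ~pp                   # invert only; no per-cycle '+1'
            acc += pp
        return (acc + comp) & MASK         # apply the stored compensation once

    def to_signed(v, bits=32):
        """Reinterpret an unsigned accumulator value as two's complement."""
        return v - (1 << bits) if v >> (bits - 1) else v

Because the digits and the compensation depend only on the stationary weight, both are computed exactly once at load time and then reused for every activation that streams through.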
Booth encoding is performed at the boundary of the accelerator, allowing the encoder to be time-shared across compute columns to reduce area and power overhead. The reconfigurable encoding technique supports variable bit widths, enabling flexible computation. This implementation significantly improves energy efficiency by minimizing switching activity, and it applies to various architectures performing MAC operations. It is compatible with zero-point quantization and can be used in AI inference devices and training environments, enhancing performance in applications such as computer vision and large language models.
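A sketch of how one peripheral encoder might be time-shared at weight-load time, with the operand width as a runtime parameter; the column layout and the name load_stationary_weights are assumptions for illustration:

    def load_stationary_weights(weight_columns, bits=8):
        """Encode every column's stationary weights with ONE encoder.

        The single loop mirrors one peripheral encoder visited by each
        column in turn at load time, instead of one encoder per column.
        `bits` selects the reconfigurable operand width (e.g. 16, 8, 4, 2).
        """
        return [[precompute(w, bits) for w in col] for col in weight_columns]

At 4-bit width each weight encodes to two digits and at 2-bit width to one, so narrower configurations shrink both the stored encoding and the number of partial products per multiply.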
The integrated circuit benefits dataflows with stationary multiplicands and applies to both weight-stationary and activation-stationary operation in neural networks. The teachings extend to various bit widths, channels, and columns, accommodating different architectural needs. The approach supports different versions of Booth encoding and a wide range of AI tasks, making it broadly applicable across hardware and software configurations.
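Tying the sketches together, a weight-stationary usage example: the weight is encoded once and its digits and compensation are reused while activations stream past (the values are arbitrary):

    digits, comp = precompute(-3, bits=8)    # encode the stationary weight once
    acc = 0
    for x in [5, -7, 12, 0]:                 # streaming activations reuse the encoding
        acc = mac_cycle(acc, x, digits, comp)
    print(to_signed(acc))                    # (5 - 7 + 12 + 0) * -3 = -30

In an activation-stationary dataflow the roles simply swap: the activation becomes the encoded, reused operand while weights stream past.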