US20260037480
2026-02-05
Physics
G06F15/8092
The patent application describes systems and methods for enhancing inter-accelerator data transfers using a pixel processing engine (PPE) that contains a two-dimensional array of processing elements (PEs). The PEs can communicate and transfer data directly between one another, enabling complex operations such as filtering to be performed efficiently. This design is particularly relevant for tasks that require high-speed data processing, such as image and signal processing in autonomous systems.
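As a rough illustration of the idea, the following is a minimal behavioral sketch (not the patented hardware) of a two-dimensional grid of PEs in which each element holds one pixel value and reads its neighbors' values directly, so a small box filter can be computed without a round trip through a shared memory. All names and the grid size are illustrative assumptions.

```python
import numpy as np


def neighbor_filter(pe_values: np.ndarray) -> np.ndarray:
    """Average each PE's pixel with its in-bounds neighbors via direct PE-to-PE reads."""
    rows, cols = pe_values.shape
    result = np.empty_like(pe_values, dtype=float)
    for r in range(rows):
        for c in range(cols):
            # Each PE gathers values from adjacent PEs instead of re-reading memory.
            window = pe_values[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            result[r, c] = window.mean()
    return result


if __name__ == "__main__":
    pixels = np.arange(16, dtype=float).reshape(4, 4)  # one pixel value per PE
    print(neighbor_filter(pixels))
```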
Traditional processing accelerators, such as vector processing units (VPUs), incur latency from memory accesses during single instruction, multiple data (SIMD) operations. These latencies hinder the performance of computer vision applications in systems such as automated vehicles. The inefficiency arises from the need to read intermediate data from memory and write it back, which this approach mitigates by allowing direct data transfers between processing elements.
The proposed system uses a PPE to process data independently of a common memory source. By adjusting the bit width scale in at least one dimension, the system reduces the latency effects associated with memory operations. This improves the math-to-memory ratio and increases the efficiency of accelerators performing SIMD operations, enabling more complex mathematical operations without repeated memory accesses.
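The back-of-the-envelope sketch below illustrates the math-to-memory ratio point. The operation counts are illustrative assumptions, not figures from the application: a chain of element-wise operations either writes every intermediate result back to memory (baseline) or keeps it resident in the PE array and only writes the final result.

```python
def math_to_memory_ratio(math_ops: int, memory_ops: int) -> float:
    return math_ops / memory_ops


num_pixels = 1024
chained_ops = 4  # e.g. scale, offset, clamp, blend applied per pixel

# Baseline: each op reads its input from memory and writes its output back.
baseline_memory_ops = num_pixels * chained_ops * 2
# PE-resident: one read of the source pixels, one write of the final result.
resident_memory_ops = num_pixels * 2

math_ops = num_pixels * chained_ops
print(math_to_memory_ratio(math_ops, baseline_memory_ops))  # 0.5
print(math_to_memory_ratio(math_ops, resident_memory_ops))  # 2.0
```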
Each processing element in the system includes circuitry that handles pixel data and instructions. An element can update its pixel values based on input data and instructions, and can communicate with other elements to access additional data. This interconnectivity is crucial for implementing advanced filters and other operations that require data from multiple pixels, and it enhances the overall processing capability of the system.
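The sketch below models this per-element behavior under stated assumptions: a PE that holds one pixel value, applies a small instruction set to input data, and can fetch a neighbor's value over a direct link. The instruction names and the neighbor-link layout are hypothetical and not taken from the application.

```python
from __future__ import annotations
from typing import Dict, Optional


class ProcessingElement:
    def __init__(self, pixel: float) -> None:
        self.pixel = pixel
        self.neighbors: Dict[str, "ProcessingElement"] = {}  # e.g. "left", "up"

    def read_neighbor(self, direction: str) -> Optional[float]:
        """Direct PE-to-PE read; no shared-memory access involved."""
        peer = self.neighbors.get(direction)
        return peer.pixel if peer else None

    def execute(self, instruction: str, operand: float = 0.0) -> None:
        """Update the locally held pixel value based on an instruction and input data."""
        if instruction == "add":
            self.pixel += operand
        elif instruction == "scale":
            self.pixel *= operand
        elif instruction == "avg_with_left":  # blend with the left neighbor's pixel
            value = self.read_neighbor("left")
            if value is not None:
                self.pixel = (self.pixel + value) / 2.0


# Two linked PEs: the right-hand element averages its pixel with its left neighbor.
left, right = ProcessingElement(10.0), ProcessingElement(30.0)
right.neighbors["left"] = left
right.execute("avg_with_left")
print(right.pixel)  # 20.0
```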
The technology is applicable to a wide range of systems, including autonomous vehicle control, robotic perception, deep learning, virtual reality, and cloud computing. It supports operations in environments like data centers and edge devices, and is suitable for tasks involving large language models, synthetic data generation, and real-time streaming applications. This flexibility makes it a valuable tool for modern computational needs.