US20260037480
2026-02-05
Physics
G06F15/8092
The patent application describes systems and methods for enhancing inter-accelerator data transfers using a pixel processing engine (PPE) that contains a two-dimensional array of processing elements (PEs). The PEs can communicate and transfer data directly between one another, enabling complex operations such as filtering to be performed efficiently. This design is particularly relevant for tasks that require high-speed data processing, such as image and signal processing in autonomous systems.
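As a rough illustration of the idea, the following is a minimal behavioral sketch (not the patented hardware) of a two-dimensional grid of PEs in which each element holds one pixel value and reads its neighbors' values directly, so a small box filter can be computed without a round trip through a shared memory. All names and the grid size are illustrative assumptions.

```python
import numpy as np


def neighbor_filter(pe_values: np.ndarray) -> np.ndarray:
    """Average each PE's pixel with its in-bounds neighbors via direct PE-to-PE reads."""
    rows, cols = pe_values.shape
    result = np.empty_like(pe_values, dtype=float)
    for r in range(rows):
        for c in range(cols):
            # Each PE gathers values from adjacent PEs instead of re-reading memory.
            window = pe_values[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            result[r, c] = window.mean()
    return result


if __name__ == "__main__":
    pixels = np.arange(16, dtype=float).reshape(4, 4)  # one pixel value per PE
    print(neighbor_filter(pixels))
```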
Traditional processing accelerators, such as vector processing units (VPUs), incur latency from memory accesses during single instruction, multiple data (SIMD) operations. These latencies hinder the performance of computer vision applications in systems such as automated vehicles. The inefficiency arises from the need to read intermediate data from memory and write it back, which this approach mitigates by allowing direct data transfers between processing elements.
The proposed system uses a PPE to process data independently of a common memory source. By adjusting the bit width scale in at least one dimension, the system reduces the latency effects associated with memory operations. This improves the math-to-memory ratio and increases the efficiency of accelerators performing SIMD operations, enabling more complex mathematical operations without repeated memory accesses.
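The back-of-the-envelope sketch below illustrates the math-to-memory ratio point. The operation counts are illustrative assumptions, not figures from the application: a chain of element-wise operations either writes every intermediate result back to memory (baseline) or keeps it resident in the PE array and only writes the final result.

```python
def math_to_memory_ratio(math_ops: int, memory_ops: int) -> float:
    return math_ops / memory_ops


num_pixels = 1024
chained_ops = 4  # e.g. scale, offset, clamp, blend applied per pixel

# Baseline: each op reads its input from memory and writes its output back.
baseline_memory_ops = num_pixels * chained_ops * 2
# PE-resident: one read of the source pixels, one write of the final result.
resident_memory_ops = num_pixels * 2

math_ops = num_pixels * chained_ops
print(math_to_memory_ratio(math_ops, baseline_memory_ops))  # 0.5
print(math_to_memory_ratio(math_ops, resident_memory_ops))  # 2.0
```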
Each processing element in the system includes circuitry that handles pixel data and instructions. An element can update its pixel values based on input data and instructions, and can communicate with other elements to access additional data. This interconnectivity is crucial for implementing advanced filters and other operations that require data from multiple pixels, and it enhances the overall processing capability of the system.
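The sketch below models this per-element behavior under stated assumptions: a PE that holds one pixel value, applies a small instruction set to input data, and can fetch a neighbor's value over a direct link. The instruction names and the neighbor-link layout are hypothetical and not taken from the application.

```python
from __future__ import annotations
from typing import Dict, Optional


class ProcessingElement:
    def __init__(self, pixel: float) -> None:
        self.pixel = pixel
        self.neighbors: Dict[str, "ProcessingElement"] = {}  # e.g. "left", "up"

    def read_neighbor(self, direction: str) -> Optional[float]:
        """Direct PE-to-PE read; no shared-memory access involved."""
        peer = self.neighbors.get(direction)
        return peer.pixel if peer else None

    def execute(self, instruction: str, operand: float = 0.0) -> None:
        """Update the locally held pixel value based on an instruction and input data."""
        if instruction == "add":
            self.pixel += operand
        elif instruction == "scale":
            self.pixel *= operand
        elif instruction == "avg_with_left":  # blend with the left neighbor's pixel
            value = self.read_neighbor("left")
            if value is not None:
                self.pixel = (self.pixel + value) / 2.0


# Two linked PEs: the right-hand element averages its pixel with its left neighbor.
left, right = ProcessingElement(10.0), ProcessingElement(30.0)
right.neighbors["left"] = left
right.execute("avg_with_left")
print(right.pixel)  # 20.0
```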
The technology is applicable to a wide range of systems, including autonomous vehicle control, robotic perception, deep learning, virtual reality, and cloud computing. It supports operations in environments like data centers and edge devices, and is suitable for tasks involving large language models, synthetic data generation, and real-time streaming applications. This flexibility makes it a valuable tool for modern computational needs.