US20260134272
2026-05-14
Physics
G06N3/063
The invention addresses the need for memory-efficient streaming operations in neural network processors. It introduces a neural processor circuit comprising a neural engine and a data processor circuit. The neural engine performs the operations of successive neural network layers on input tensors, generating output tensors that serve as inputs to subsequent layers. The data processor circuit supports this by storing portions of the input and output tensors in buffers, so that the neural engine can access and process them efficiently.
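The layer-to-layer dataflow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the circuit's actual design: `DataBuffer` is a hypothetical stand-in for the data processor circuit's buffer, holding tensor portions (tiles) keyed by layer and tile index; `run_network` is a hypothetical function playing the role of the neural engine; and each layer's operation is a simple elementwise function standing in for a real layer computation. In this sketch each layer finishes all of its tiles before the next layer starts.

```python
from typing import Callable, Dict, List, Tuple

Tile = List[float]


class DataBuffer:
    """Hypothetical model of a buffer holding portions (tiles) of tensors,
    keyed by (layer index, tile index)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, int], Tile] = {}

    def put(self, layer: int, tile_idx: int, tile: Tile) -> None:
        self._store[(layer, tile_idx)] = tile

    def take(self, layer: int, tile_idx: int) -> Tile:
        # Remove the portion once it has been consumed, freeing buffer space.
        return self._store.pop((layer, tile_idx))

    def __len__(self) -> int:
        return len(self._store)


def run_network(
    x: List[float], layers: List[Callable[[float], float]], tile_size: int
) -> List[float]:
    """Runs each layer over the tensor tile by tile; every output tile is written
    back to the buffer, where it becomes an input tile of the next layer."""
    buf = DataBuffer()
    n_tiles = (len(x) + tile_size - 1) // tile_size
    for t in range(n_tiles):  # load the network input as layer-0 tiles
        buf.put(0, t, x[t * tile_size:(t + 1) * tile_size])
    for i, op in enumerate(layers):
        for t in range(n_tiles):
            tile_in = buf.take(i, t)
            # "Engine" output for this portion becomes the next layer's input portion.
            buf.put(i + 1, t, [op(v) for v in tile_in])
    return [v for t in range(n_tiles) for v in buf.take(len(layers), t)]
```

For example, with the layers `lambda v: 2.0 * v` and `lambda v: max(0.0, v - 1.0)`, `run_network([0.0, 0.5, 1.0, 1.5, 2.0], layers, tile_size=2)` returns `[0.0, 0.0, 1.0, 2.0, 3.0]`, the same result as applying both layers to the whole tensor at once.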
Neural networks, especially convolutional neural networks (CNNs), require a large number of computations, most of them multiply-accumulate operations. These operations are typically organized into layers, each applying a specific transformation to its input. Although a central processing unit (CPU) can perform these computations, doing so consumes significant bandwidth and power. The invention instead uses a dedicated neural processor circuit to perform these operations more efficiently.
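To make the multiply-accumulate workload concrete, the following is a minimal Python sketch of a single convolution layer written as explicit MAC loops. It assumes a single channel, stride 1, and no padding, and uses the cross-correlation form common in CNN frameworks; the function name `conv2d_mac` is hypothetical and the sketch is not taken from the invention.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv2d_mac(image: Matrix, kernel: Matrix) -> Tuple[Matrix, int]:
    """Valid (no padding), stride-1, single-channel 2D convolution written as
    explicit multiply-accumulate (MAC) loops, in the cross-correlation form
    used by most CNN frameworks. Returns the output and the number of MACs."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    macs = 0
    for r in range(oh):
        for c in range(ow):
            acc = 0.0  # accumulator for one output element
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]  # one multiply-accumulate
                    macs += 1
            out[r][c] = acc
    return out, macs


if __name__ == "__main__":
    img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    k = [[1.0, 0.0], [0.0, -1.0]]
    result, n_macs = conv2d_mac(img, k)
    print(result, n_macs)
```

The MAC count for an H_out × W_out output and a k_h × k_w kernel is H_out · W_out · k_h · k_w per input/output channel pair, which is why layers with many channels and large feature maps dominate the compute budget.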
The neural processor circuit streams convolution operations across multiple layers. It stores portions of one layer's output tensor in tensor buffers, and those portions are then used as input for the subsequent layer. This streaming approach allows the convolution operations of different layers to execute in parallel: a subsequent layer can begin processing once the tensor portions it needs are buffered, rather than waiting for the previous layer to produce its entire output tensor. The data control circuit manages the storage and retrieval of tensor portions so that the neural engine has continuous access to the data it needs.
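The streaming behaviour can be illustrated with a line-buffer sketch in Python. This is a minimal sketch under stated assumptions, not the patented mechanism: each layer is a single-channel, stride-1, valid 2D convolution; each layer keeps only the last kernel-height rows of its input in a small buffer (a `deque` with `maxlen`); and Python generators stand in for the hardware, so the layers are interleaved rather than truly concurrent. The names `conv_rows` and `counted` are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List

Row = List[float]
Matrix = List[Row]


def conv_rows(rows: Iterable[Row], kernel: Matrix) -> Iterator[Row]:
    """Streaming valid, stride-1, single-channel 2D convolution (cross-correlation form).

    Consumes input rows one at a time and yields each output row as soon as
    kernel-height input rows are buffered, so a downstream layer can start
    before this layer has seen its whole input.
    """
    kh, kw = len(kernel), len(kernel[0])
    window: Deque[Row] = deque(maxlen=kh)  # line buffer: at most kh rows kept
    for row in rows:
        window.append(row)  # oldest row is evicted automatically when full
        if len(window) == kh:
            out_w = len(row) - kw + 1
            yield [
                sum(window[i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
                for c in range(out_w)
            ]


def counted(rows: Matrix, log: List[int]) -> Iterator[Row]:
    """Yields input rows while logging how many have been read so far."""
    for n, row in enumerate(rows, start=1):
        log.append(n)
        yield row


if __name__ == "__main__":
    image = [[float(3 * r + c) for c in range(3)] for r in range(4)]  # 4x3 input
    k1 = [[1.0, 1.0], [1.0, 1.0]]  # layer 1: 2x2 kernel
    k2 = [[1.0], [2.0]]            # layer 2: 2x1 kernel
    consumed: List[int] = []
    stream = conv_rows(conv_rows(counted(image, consumed), k1), k2)
    first = next(stream)
    print(first, "after", len(consumed), "input rows")
    print([first] + list(stream))
```

In this example the second layer emits its first output row after only 3 of the 4 input rows have been read (kernel heights 2 + 2 - 1), and neither layer ever holds more input rows than its kernel height, which is the kind of buffer saving that motivates streaming across layers.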
The described technology can be integrated into various electronic devices, including portable communications devices like smartphones and tablets, as well as non-portable devices like desktop computers. These devices may feature touch-sensitive interfaces and various sensors, supporting functionalities such as facial recognition through machine learning models. The neural processor circuit can enhance these devices' capabilities by providing efficient processing power for complex neural network operations.
An example device, such as an iPhone, may incorporate the neural processor circuit to perform machine learning tasks efficiently. Such a device may include components like a touch screen, image sensors, and motion sensors that operate together with the neural processor circuit. Integrating this technology can improve performance in computation-intensive tasks such as real-time image processing and user interface interactions.