Runtime AI NPU Vectorization

Custom CNN Inference Layers on Hexagon NPU

[Figure: Schematic drawing of a Hexagon DSP with processing units and memory.]

Custom neural network layers, such as a user-defined head that performs problem-specific thresholding, can be difficult to run efficiently with neural network frameworks like Qualcomm®’s Neural Processing SDK (QNN library). Such custom layers typically do not consist of common, well-optimized operations like convolutions, but of many small operations that are individually slow.

We helped our customer implement one of the custom layers in their video perception neural network optimally for the Vector Co-processor (“HVX”) of the Qualcomm Hexagon™ DSP, speeding it up 30×.
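To illustrate the kind of operation involved, here is a minimal C sketch of a hypothetical thresholding head (the function name, data layout, and threshold semantics are illustrative assumptions, not the customer's actual layer). A branch-free loop like this maps well onto HVX, where a single 1024-bit vector register holds 64 int16 lanes, so the compare-and-select is performed for 64 elements per instruction instead of one:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical problem-specific thresholding head (illustrative only):
 * keep activations at or above the threshold, zero out the rest.
 * Written branch-free so a vectorizing compiler (or a hand-written
 * HVX intrinsics version) can process many lanes per instruction. */
static void threshold_head(const int16_t *in, int16_t *out,
                           size_t n, int16_t threshold)
{
    for (size_t i = 0; i < n; ++i) {
        /* conditional select, no data-dependent branch */
        out[i] = (in[i] >= threshold) ? in[i] : 0;
    }
}
```

On the DSP, the same logic would be expressed with wide vector compare and mux operations over whole activation rows, which is where the bulk of the speedup comes from.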

Quick facts

Hardware:

  • SoC: Qualcomm Snapdragon®
  • NPU: Hexagon DSPs with HVX Vector Co-processors

Operating System:

  • Blackberry® QNX®

Compiler:

  • Qualcomm Hexagon compiler (Clang-based)

Summary of our results:

  • Optimized custom CNN inference layer 30× on Hexagon DSP

All referenced product or service names and trademarks are the property of their respective owners.