Runtime AI NPU Vectorization

Custom CNN Inference Layers on Hexagon NPU

[Figure: Schematic drawing of a Hexagon DSP with processing units and memory.]

Custom neural network layers, such as a user-defined head that performs problem-specific thresholding, can be difficult to run efficiently with neural network frameworks like Qualcomm®’s Neural Processing SDK (QNN library). Such custom layers typically do not consist of common, well-optimized operations like convolutions, but of many small operations that are individually slow.

We helped our customer implement one of the custom layers in their video perception neural network optimally for the Vector Co-processor (“HVX”) of the Qualcomm Hexagon™ DSP, speeding it up 30×.
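To illustrate the kind of operation involved, here is a minimal C sketch of a hypothetical thresholding head (the function name, data layout, and threshold semantics are illustrative assumptions, not the customer's actual layer). A branch-free loop like this maps well onto HVX, where a single 1024-bit vector register holds 64 int16 lanes, so the compare-and-select is performed for 64 elements per instruction instead of one:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical problem-specific thresholding head (illustrative only):
 * keep activations at or above the threshold, zero out the rest.
 * Written branch-free so a vectorizing compiler (or a hand-written
 * HVX intrinsics version) can process many lanes per instruction. */
static void threshold_head(const int16_t *in, int16_t *out,
                           size_t n, int16_t threshold)
{
    for (size_t i = 0; i < n; ++i) {
        /* conditional select, no data-dependent branch */
        out[i] = (in[i] >= threshold) ? in[i] : 0;
    }
}
```

On the DSP, the same logic would be expressed with wide vector compare and mux operations over whole activation rows, which is where the bulk of the speedup comes from.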

Quick facts

Hardware:

  • SoC: Qualcomm Snapdragon®
  • NPU: Hexagon DSPs with HVX Vector Co-processors

Operating System:

  • Blackberry® QNX®

Compiler:

  • Qualcomm Hexagon compiler (Clang-based)

Summary of our results:

  • Optimized custom CNN inference layer 30× on Hexagon DSP

All referenced product or service names and trademarks are the property of their respective owners.