Custom neural network layers, such as a user-defined head that performs problem-specific thresholding, can pose difficulties for neural network frameworks like Qualcomm®’s Neural Processing SDK (QNN library). Such custom layers typically do not consist of common, well-optimized operations like convolutions, but of many small operations that are individually relatively slow.
We helped our customer implement one of the custom layers of their video perception neural network efficiently on the Vector Co-processor (“HVX”) of the Qualcomm Hexagon™ DSP, speeding it up by 30×.
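To make the problem concrete, below is a minimal scalar C sketch of the kind of thresholding head described above. The function name, data layout, and per-channel thresholds are illustrative assumptions on our part, not the customer’s actual layer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-element thresholding head: keep activations above a
 * per-channel threshold, zero the rest. Names, layout, and thresholds
 * are illustrative, not the customer's actual layer. */
static void threshold_head_reference(const int16_t *act,    /* HWC activations */
                                     const int16_t *thresh, /* one threshold per channel */
                                     int16_t *out,
                                     size_t num_pixels,
                                     size_t num_channels)
{
    for (size_t p = 0; p < num_pixels; ++p) {
        for (size_t c = 0; c < num_channels; ++c) {
            int16_t v = act[p * num_channels + c];
            /* One compare and one select per element: cheap individually,
             * but slow when executed one element at a time on a scalar core. */
            out[p * num_channels + c] = (v > thresh[c]) ? v : 0;
        }
    }
}
```

Each output element needs only a compare and a select, so a scalar loop like this leaves most of the DSP’s throughput unused; mapping the same compare-and-select pattern onto HVX, which operates on wide vector registers (up to 128 bytes per instruction), is what makes a speedup of this magnitude possible.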
Quick facts
Hardware:
- SoC: Qualcomm Snapdragon®
- NPU: Hexagon DSPs with HVX Vector Co-processors
Operating System:
- BlackBerry® QNX®
Compiler:
- Qualcomm Hexagon compiler (Clang-based)
Summary of our results:
- Optimized a custom CNN inference layer by 30× on the Hexagon DSP