Runtime ARM Neon Vectorization

Feature Detection for Classical Computer Vision

Possible pixel neighborhood for feature detection algorithms.

Autonomous robots need to recognize objects in order to understand their surroundings and localize themselves on maps. Therefore, the core of every vision pipeline contains software to detect and track points of interest, called “features”. Classical computer vision algorithms define the location of such features based on the color or brightness of pixels. Typically, whether a location counts as a feature depends not only on the pixel value at that exact location but also on its neighboring pixels or on other, more elaborate constraints.
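
To make the neighborhood dependence concrete, here is a minimal scalar sketch in C. It is not the customer's algorithm: the criterion (a pixel counts as a feature if it is brighter than a threshold and strictly brighter than all eight neighbors) and the function names `is_feature` and `count_features` are assumptions chosen purely for illustration, as is the row-major 8-bit grayscale layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical criterion: the pixel at (x, y) is a feature if it is brighter
 * than `threshold` and strictly brighter than each of its 8 neighbors.
 * The image is assumed to be 8-bit grayscale in row-major order.
 * The caller must pass an interior pixel (not on the image border). */
int is_feature(const uint8_t *img, size_t width,
               size_t x, size_t y, uint8_t threshold)
{
    const uint8_t center = img[y * width + x];
    if (center <= threshold)
        return 0;

    for (size_t ny = y - 1; ny <= y + 1; ++ny) {
        for (size_t nx = x - 1; nx <= x + 1; ++nx) {
            if (nx == x && ny == y)
                continue; /* skip the center pixel itself */
            if (img[ny * width + nx] >= center)
                return 0; /* a neighbor is at least as bright */
        }
    }
    return 1;
}

/* Counts features over all interior pixels; border pixels are skipped
 * because they do not have a full 3x3 neighborhood. */
size_t count_features(const uint8_t *img, size_t width, size_t height,
                      uint8_t threshold)
{
    assert(img != NULL);
    if (width < 3 || height < 3)
        return 0;

    size_t count = 0;
    for (size_t y = 1; y + 1 < height; ++y)
        for (size_t x = 1; x + 1 < width; ++x)
            count += (size_t)is_feature(img, width, x, y, threshold);
    return count;
}
```

Every output pixel reads nine input pixels, and the early exits make the control flow data-dependent. Both properties are typical of neighborhood-based detectors and are what makes them harder to vectorize than a simple per-pixel operation.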

Making full use of a modern CPU requires making effective use of its vector units. With a hand-crafted vectorization of the customer’s algorithm, we achieved a speedup of up to 7×.
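
As a rough illustration of what hand-written Neon code looks like, the sketch below counts the pixels brighter than a threshold, processing 16 pixels per iteration with Neon intrinsics and handling the remainder with a scalar loop. It is a deliberately simple stand-in, not the customer's algorithm, and it makes no claim about the speedups reported here. The function names `count_above` and `count_above_scalar` are hypothetical, and the Neon path is assumed to be built for AArch64 (it uses `vaddlvq_u8`, which is only available there); on other targets the code falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Scalar reference: counts pixels strictly brighter than `threshold`. */
size_t count_above_scalar(const uint8_t *px, size_t n, uint8_t threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        count += (px[i] > threshold);
    return count;
}

/* Same result as count_above_scalar. On AArch64 the bulk of the data is
 * processed 16 pixels at a time with Neon; the tail (and all data on
 * other targets) goes through the scalar loop. */
size_t count_above(const uint8_t *px, size_t n, uint8_t threshold)
{
    assert(px != NULL);
    size_t i = 0;
    size_t count = 0;

#if defined(__aarch64__)
    const uint8x16_t thr = vdupq_n_u8(threshold); /* threshold in all lanes */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t v = vld1q_u8(px + i);          /* load 16 pixels */
        uint8x16_t mask = vcgtq_u8(v, thr);       /* 0xFF where px > threshold */
        uint8x16_t ones = vshrq_n_u8(mask, 7);    /* 0xFF -> 1, 0x00 -> 0 */
        count += vaddlvq_u8(ones);                /* horizontal sum of lanes */
    }
#endif

    /* Remaining pixels that do not fill a whole 16-lane vector. */
    return count + count_above_scalar(px + i, n - i, threshold);
}
```

Keeping a scalar reference next to the vectorized path is a common design choice: it handles the tail and non-Arm builds, and it lets the Neon path be checked against a known-good result.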

Quick facts

Hardware:

  • SoC: Qualcomm® QCS610
  • CPU: Arm® Cortex®-A76 + Arm Cortex-A55

Operating System:

  • Linux

Compiler:

  • GNU Compiler Collection (GCC)

Summary of our results:

  • Speedup of the core feature detection algorithm: 7× on Cortex-A76 and 5× on Cortex-A55.

All referenced product or service names and trademarks are the property of their respective owners.