
Meeting C++ 2025 – Speed for free – current state of auto-vectorizing compilers
7 November @ 17:15 – 18:15
For about a decade now, the locus of compute power in modern CPUs has been shifting. The scalar execution units have fallen far behind the vector (SIMD) units within the same core. Failing to use SVE2 or AVX-512 may leave a whole order of magnitude of performance on the table.
C++ has no built-in primitives to directly express SIMD operations. You may either use target-specific libraries and extensions or let the compiler figure it out automatically. If the latter works out, you get a speed-up for free without touching your source code. The latest GCC and Clang releases have extended the capabilities of their auto-vectorizers and enable them by default at higher optimization levels.
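As a rough illustration of the "let the compiler figure it out" route, here is a minimal sketch of a loop that auto-vectorizers typically handle well: unit stride, independent iterations, no branches. The function name `saxpy` and its signature are assumptions made for this example and are not taken from the talk.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example kernel: y[i] += a * x[i] over contiguous floats.
// Plain scalar C++; whether SIMD instructions are emitted is left entirely
// to the compiler's auto-vectorizer.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    // Only process the range both vectors cover.
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const float* xs = x.data();
    float* ys = y.data();
    // The compiler cannot prove xs and ys never overlap, so a vectorized
    // version usually comes with a runtime overlap check and a scalar fallback.
    for (std::size_t i = 0; i < n; ++i) {
        ys[i] += a * xs[i];
    }
}
```

To see what the compiler did with such a loop, GCC can report vectorized loops with `-fopt-info-vec` and Clang with `-Rpass=loop-vectorize`; the exact result depends on compiler version, optimization level and target flags.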
In this talk, we will take a look at the current state of these auto-vectorizers and check which code they work well on and where they still struggle. We’ll compare the size and performance of the generated binaries with those of hand-vectorized code using intrinsics. An estimate of the theoretical peak performance of the target hardware will serve as a benchmark.
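One common back-of-the-envelope way to estimate such a peak, sketched here under stated assumptions, multiplies clock rate, SIMD lanes, and the number of fused multiply-add (FMA) units, counting each FMA as two floating-point operations. The `CoreSpec` type, its fields, and the numbers in the note below are hypothetical and do not describe any specific CPU or the method used in the talk.

```cpp
// Hypothetical description of one CPU core; all fields are assumptions.
struct CoreSpec {
    double clock_hz;   // sustained clock frequency in Hz
    int vector_bits;   // SIMD register width in bits, e.g. 512
    int fma_units;     // FMA pipelines that can issue per cycle
};

// Peak FLOP/s per core = clock x lanes x FMA units x 2,
// where lanes = vector width / element width and an FMA counts as two operations.
constexpr double peak_flops_per_core(const CoreSpec& core, int element_bits) {
    const int lanes = core.vector_bits / element_bits;
    return core.clock_hz * lanes * core.fma_units * 2.0;
}
```

For a made-up 3 GHz core with two 512-bit FMA units, `peak_flops_per_core(CoreSpec{3.0e9, 512, 2}, 32)` gives 192 GFLOP/s in single precision. Real kernels are often limited by memory bandwidth well before reaching this figure.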