simd - Horizontal add with __m512 (AVX512) -


How does one efficiently perform horizontal addition of floats in a 512-bit AVX register (i.e. add the items of a single vector together)? For 128- and 256-bit registers this can be done using _mm_hadd_ps and _mm256_hadd_ps, but there is no _mm512_hadd_ps. The Intel Intrinsics Guide documents _mm512_reduce_add_ps. It doesn't correspond to a single instruction, but its existence suggests there is an optimal method. However, it doesn't appear to be defined in the header files that come with the latest snapshot of GCC, and I can't find a definition with Google.

I figure "hadd" can be emulated with _mm512_shuffle_ps and _mm512_add_ps, or I could use _mm512_extractf32x4_ps to break the 512-bit register into four 128-bit registers, but I want to make sure I'm not missing something better.

The Intel compiler has the following intrinsics defined for horizontal sums:

_mm512_reduce_add_ps     // horizontal sum of 16 floats
_mm512_reduce_add_pd     // horizontal sum of 8 doubles
_mm512_reduce_add_epi32  // horizontal sum of 16 32-bit integers
_mm512_reduce_add_epi64  // horizontal sum of 8 64-bit integers

However, as far as I can tell these are broken into multiple instructions anyway, so I don't think you gain any more than by doing the horizontal sum of the upper and lower parts of the AVX512 register.

// float: zmm is a __m512
__m256 low  = _mm512_castps512_ps256(zmm);
__m256 high = _mm256_castpd_ps(_mm512_extractf64x4_pd(_mm512_castps_pd(zmm),1));

// double: zmm is a __m512d
__m256d low  = _mm512_castpd512_pd256(zmm);
__m256d high = _mm512_extractf64x4_pd(zmm,1);

// integer: zmm is a __m512i
__m256i low  = _mm512_castsi512_si256(zmm);
__m256i high = _mm512_extracti64x4_epi64(zmm,1);

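To get the horizontal sum, compute sum = horizontal_add(low + high). With plain intrinsics, low + high is _mm256_add_ps(low, high) for floats and _mm256_add_pd(low, high) for doubles.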

static inline float horizontal_add (__m256 a) {
    __m256 t1 = _mm256_hadd_ps(a,a);
    __m256 t2 = _mm256_hadd_ps(t1,t1);
    __m128 t3 = _mm256_extractf128_ps(t2,1);
    __m128 t4 = _mm_add_ss(_mm256_castps256_ps128(t2),t3);
    return _mm_cvtss_f32(t4);
}

static inline double horizontal_add (__m256d a) {
    __m256d t1 = _mm256_hadd_pd(a,a);
    __m128d t2 = _mm256_extractf128_pd(t1,1);
    __m128d t3 = _mm_add_sd(_mm256_castpd256_pd128(t1),t2);
    return _mm_cvtsd_f64(t3);
}

I got this information and these functions from Agner Fog's Vector Class Library and the Intel Intrinsics Guide online.
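
For reference, the float case above can be put together into one piece roughly as follows. This is only a sketch, not code from the Vector Class Library: the names hsum256, hsum512 and sum16 are made up for illustration, and it assumes GCC or Clang on x86 (the target attribute and __builtin_cpu_supports are compiler extensions). The scalar fallback is there only so the sketch still runs on a CPU without AVX-512.

```cpp
#include <immintrin.h>  // AVX and AVX-512 intrinsics (x86 only)

// Hypothetical helper: sum the 8 floats of a 256-bit register,
// using the same steps as horizontal_add(__m256) above.
__attribute__((target("avx512f")))
static inline float hsum256(__m256 a) {
    __m256 t1 = _mm256_hadd_ps(a, a);          // pairwise sums within each 128-bit lane
    __m256 t2 = _mm256_hadd_ps(t1, t1);        // each lane now holds its 4-element sum
    __m128 hi = _mm256_extractf128_ps(t2, 1);  // upper lane
    __m128 s  = _mm_add_ss(_mm256_castps256_ps128(t2), hi);  // lower + upper
    return _mm_cvtss_f32(s);
}

// Hypothetical helper: load 16 floats, split the 512-bit register into
// its lower and upper halves, add them, then reduce the 256-bit result.
__attribute__((target("avx512f")))
static float hsum512(const float* p) {
    __m512 zmm  = _mm512_loadu_ps(p);
    __m256 low  = _mm512_castps512_ps256(zmm);
    __m256 high = _mm256_castpd_ps(
        _mm512_extractf64x4_pd(_mm512_castps_pd(zmm), 1));
    return hsum256(_mm256_add_ps(low, high));
}

// Hypothetical wrapper: use the AVX-512 path when the CPU supports it,
// otherwise fall back to a plain scalar loop.
float sum16(const float* p) {
    if (__builtin_cpu_supports("avx512f")) {
        return hsum512(p);
    }
    float s = 0.0f;
    for (int i = 0; i < 16; ++i) {
        s += p[i];
    }
    return s;
}
```

Calling sum16 on an array of 16 floats returns their sum. The vector path adds the elements in a different order than the scalar loop, so for general floating-point input the two results can differ in the last bits.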

