Registered Member
|
I'm a long-time Eigen 3 user and recently updated to 3.3 beta-1 to give it a try. I benchmarked Eigen with and without vectorization, mostly to verify that vectorization was working, but the results surprised me, especially given the AVX comments on the 3.3 page. The results I got suggest that vectorization is disabled or not used for the Matrix4f and Vector4f types when AVX is enabled. Enabling AVX clearly improves double-precision performance (Matrix4d*Matrix4d), but overall performance is better without AVX. I verified with SimdInstructionSetsInUse() that the expected instruction sets were enabled. Also, interestingly -- but perhaps expected? -- with AVX enabled, Matrix4f and Matrix4d are 32-byte aligned (checking with alignof()) but only 16-byte aligned without AVX; Vector4f is always 16-byte aligned.
Below are the numbers I got. The timings are the average time per operation over 100M iterations. I'm compiling with VS2015 (I plan to try this on Linux later).

With EIGEN_DONT_VECTORIZE defined:
Matrix4f*Vector4f: 49ns
Matrix4f*Matrix4f: 235ns
Matrix4d*Matrix4d: 238ns

With /arch:AVX:
Matrix4f*Vector4f: 46ns -- about the same as EIGEN_DONT_VECTORIZE
Matrix4f*Matrix4f: 252ns -- about the same as EIGEN_DONT_VECTORIZE
Matrix4d*Matrix4d: 11ns

Without /arch:AVX:
Matrix4f*Vector4f: 2ns
Matrix4f*Matrix4f: 7ns
Matrix4d*Matrix4d: 14ns -- almost 30% slower than the AVX version |
Moderator
|
Be careful with such micro-benchmarks, as the compiler might over-optimize some versions, or maybe it is just messing with inlining. For instance, a 25x factor for the matrix-vector case does not make any sense. You might also try the devel branch, as beta-1 is already quite old.
Your observations regarding alignment are right and on purpose: sizeof(Vector4f)==16, so there is no need for 32-byte alignment (which would also waste memory). |
Registered Member
|
Thanks for the reply. I modified my benchmark test to use a realistic workload from one of my applications in order to (hopefully!) prevent the compiler from optimizing away the tests. Plus I downloaded the latest Eigen devel version.
Now, the performance difference between EIGEN_DONT_VECTORIZE and the SSE/SSE2-vectorized versions is 5x-10x instead of 25x. Roughly speaking, what would be the expected improvement? However, the basic problem remains -- it seems that with /arch:AVX the single-precision tests produce the same results as EIGEN_DONT_VECTORIZE, while the double-precision tests show a ~30% improvement over the SSE/SSE2 version. I guess I'll look at the ASM output to see what the compiler is doing, and I'll test this on Linux in a few days to see if there's any difference. |
Moderator
|
OK, it looks like there is indeed an issue with fixed-size products, for which half-register instructions are not considered.
|
Moderator
|
Fixed:
https://bitbucket.org/eigen/eigen/commits/31f783860864/
Summary: Enable the use of half-packet in coeff-based product. For instance, Matrix4f*Vector4f is now vectorized again when using AVX. |
Registered Member
|
Thanks! I'll update today and test it out.
|
Registered Member
|
I tested this out and, indeed, Matrix4f*Vector4f now performs much better with AVX enabled -- about the same as with SSE/SSE2 only. Is that the expected result? However, when AVX is enabled, Matrix4f*Matrix4f still seems to perform the same as if EIGEN_DONT_VECTORIZE were defined; the SSE/SSE2-only version of Matrix4f*Matrix4f is ~9x faster. |
Moderator
|
Make sure you have updated your clone. I fixed that case 2 days ago: https://bitbucket.org/eigen/eigen/commi ... d0dd05b906
|
Registered Member
|
Yep, that worked. I updated this morning to the latest revision -- af907dececc0 -- and now the SSE/SSE2 and AVX versions of my tests all run as expected.
Thanks for your help! |