eigen3 openmp vs openmp/sse2 performance comparison • KDE Community Forums


isluser Registered Member Posts 10 Karma 0	eigen3 openmp vs openmp/sse2 performance comparison Wed Sep 15, 2010 3:15 pm Hi, I'd like to know what kind of performance gain I can expect by enabling SSE2 with Visual Studio 2008 on the development branch of Eigen 3. I'm currently using the test between n and z to do some benchmarking. For 15 iterations, I get 4 min 15 sec with only OpenMP. When I activate SSE2, I get 32 sec. It just seems too good to be true. As a more precise example, the product_symm test takes 55 secs with OpenMP and 5 secs with SSE2. I was wondering whether these numbers are plausible, or is there something else going on? Thank you,
bjacob Registered Member Posts 658 Karma 3	Re: eigen3 openmp vs openmp/sse2 performance comparison Wed Sep 15, 2010 4:27 pm
isluser wrote: For 15 iterations, I get 4 min 15 sec with only openmp. When I activate SSE2, I get 32 sec. It just seems too good to be true.
No, it's normal. You get a total of x8, which decomposes into:
    x4 because 4 floats fit in a packet
    x2 because SSE addition and multiplication can run together in 1 cycle
Join us on Eigen's IRC channel: #eigen on irc.freenode.net Have a serious interest in Eigen? Then join the mailing list!
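As a quick sanity check of the x8 figure against the timings quoted in the first post (plain arithmetic on the numbers in the thread, nothing more):

```latex
\frac{4\,\text{min}\,15\,\text{s}}{32\,\text{s}}
  = \frac{255\,\text{s}}{32\,\text{s}}
  \approx 7.97
  \approx 4 \times 2
```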
renorm Registered Member Posts 31 Karma 0	Re: eigen3 openmp vs openmp/sse2 performance comparison Sun Sep 19, 2010 3:14 am
bjacob wrote: x2 because SSE addition and multiplication can run together in 1 cycle
I am very curious how this is possible. Is it in PacketMath.h?
bjacob Registered Member Posts 658 Karma 3	Re: eigen3 openmp vs openmp/sse2 performance comparison Sun Sep 19, 2010 8:15 pm
renorm wrote: x2 because SSE addition and multiplication can run together in 1 cycle I am very curious how it is possible? Is it in PacketMath.h?
No, it's in your CPU. Recent Intel CPUs (not sure about AMD) are able to execute a (mulps,addps) pair in 1 cycle, and likewise a (mulpd,addpd) pair, EVEN if operating on the same registers. It's as if there were a combined mul-and-add instruction working in 1 cycle.
renorm Registered Member Posts 31 Karma 0	Re: eigen3 openmp vs openmp/sse2 performance comparison Sun Sep 19, 2010 10:10 pm How do you get it triggered? Follow _mm_mul_ps with _mm_add_ps?
bjacob Registered Member Posts 658 Karma 3	Re: eigen3 openmp vs openmp/sse2 performance comparison Sun Sep 19, 2010 11:01 pm Yes: see ei_pmadd() in GenericPacketMath.h:
    /** \internal \returns a * b + c (coeff-wise) */
    template<typename Packet> inline Packet
    ei_pmadd(const Packet& a, const Packet& b, const Packet& c)
    { return ei_padd(ei_pmul(a, b), c); }
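To make the mul-then-add pattern concrete, here is a minimal sketch using the SSE intrinsics renorm mentioned. It is not Eigen code: `madd_arrays` is a hypothetical helper, and the preprocessor guard with its scalar fallback is an assumption added only so the sketch still compiles on non-x86 targets.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: these macros are a reasonable (not exhaustive) way to detect SSE.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// Hypothetical helper (not part of Eigen): y[i] = a[i] * b[i] + y[i].
void madd_arrays(const float* a, const float* b, float* y, std::size_t n) {
    assert(a != nullptr && b != nullptr && y != nullptr);
    std::size_t i = 0;
#if SKETCH_HAVE_SSE
    // Process 4 floats per iteration: one packet multiply followed by one packet add.
    for (; i + 4 <= n; i += 4) {
        __m128 pa = _mm_loadu_ps(a + i);   // unaligned loads, so any pointer works
        __m128 pb = _mm_loadu_ps(b + i);
        __m128 py = _mm_loadu_ps(y + i);
        py = _mm_add_ps(_mm_mul_ps(pa, pb), py);
        _mm_storeu_ps(y + i, py);
    }
#endif
    // Scalar loop handles the remaining elements (or everything without SSE).
    for (; i < n; ++i) {
        y[i] = a[i] * b[i] + y[i];
    }
}
```

Whether the multiply and add actually overlap in one cycle depends on the CPU, as bjacob says; the code only expresses the two operations back to back.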
renorm Registered Member Posts 31 Karma 0	Re: eigen3 openmp vs openmp/sse2 performance comparison Sat Sep 25, 2010 2:56 am Thanks for explaining. The original question mentioned OpenMP. Does Eigen explicitly use it?
bjacob Registered Member Posts 658 Karma 3	Re: eigen3 openmp vs openmp/sse2 performance comparison Sat Sep 25, 2010 4:02 am Yes, Eigen 3 uses OpenMP if it's enabled. By default it's disabled, but for example with GCC you just have to pass -fopenmp.
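One way to confirm that the compiler flag took effect is to test the standard `_OPENMP` macro, which OpenMP-enabled compilers define. The sketch below is independent of Eigen; `openmp_enabled` and `max_threads` are hypothetical helper names chosen for illustration.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: true if this translation unit was compiled with OpenMP
// enabled (for example -fopenmp with GCC).
bool openmp_enabled() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: the number of threads OpenMP would use for a parallel
// region, or 1 when OpenMP is not enabled.
int max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Built without the flag, `openmp_enabled()` returns false and `max_threads()` returns 1, which matches the "disabled by default" behavior described above.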
red_cat Registered Member Posts 5 Karma 0	Re: eigen3 openmp vs openmp/sse2 performance comparison Mon Mar 21, 2011 12:37 pm Hi! I am using matrix-vector multiplication. Does this function use only one core? When multiplying two matrices, all the cores are used. System configuration: Intel i7-970, Windows 7. Compiler: VS2008, OpenMP enabled.
ggael Moderator Posts 3447 Karma 19 OS	Re: eigen3 openmp vs openmp/sse2 performance comparison Mon Mar 21, 2011 11:30 pm Currently only matrix * matrix products are multi-threaded, not matrix * vector.
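For readers who need a threaded matrix-vector product in the meantime, the rows can be split across threads by hand. This is only a sketch under stated assumptions, not how Eigen does or will do it: `matvec` is a hypothetical helper, the matrix is stored row-major in a plain `std::vector`, and the pragma is simply ignored when OpenMP is not enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Eigen): returns y = A * x, where A is a
// rows x cols matrix stored row-major in a flat vector.
std::vector<double> matvec(const std::vector<double>& A,
                           const std::vector<double>& x,
                           int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    assert(A.size() == static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
    assert(x.size() == static_cast<std::size_t>(cols));

    std::vector<double> y(static_cast<std::size_t>(rows), 0.0);

    // Each row's dot product is independent, so rows can be shared among threads.
    // A signed loop index is used because older OpenMP versions require it.
    #pragma omp parallel for
    for (int i = 0; i < rows; ++i) {
        double sum = 0.0;
        for (int j = 0; j < cols; ++j) {
            sum += A[static_cast<std::size_t>(i) * cols + j] * x[j];
        }
        y[i] = sum;
    }
    return y;
}
```

Matrix-vector products are usually limited by memory bandwidth, so the gain from extra threads may be much smaller than for matrix-matrix products.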
red_cat Registered Member Posts 5 Karma 0	Re: eigen3 openmp vs openmp/sse2 performance comparison Tue Mar 22, 2011 9:18 am Will multithreading be implemented for matrix-vector multiplication?
ggael Moderator Posts 3447 Karma 19 OS	Re: eigen3 openmp vs openmp/sse2 performance comparison Tue Mar 22, 2011 10:00 am sure, but I cannot say when...
red_cat Registered Member Posts 5 Karma 0	Re: eigen3 openmp vs openmp/sse2 performance comparison Wed Jul 20, 2011 5:43 am If I enable OpenMP, LU decomposition becomes slower. Why?
ggael Moderator Posts 3447 Karma 19 OS	Re: eigen3 openmp vs openmp/sse2 performance comparison Wed Jul 20, 2011 8:39 am Strange, because I observed the opposite. Which matrix size? Do you have multi-threading enabled? If so, call your executable with:
    $ OMP_NUM_THREADS=number_of_real_cores ./my_app
red_cat Registered Member Posts 5 Karma 0	Re: eigen3 openmp vs openmp/sse2 performance comparison Wed Jul 27, 2011 5:26 am thanks, it helped
