Registered Member
|
Hi,
I'd like to know what kind of performance gain I can expect by enabling SSE2 with Visual Studio 2008 on the development branch of Eigen 3. I'm currently using the test between n and z to do some benchmarking. For 15 iterations, I get 4 min 15 sec with only OpenMP. When I activate SSE2, I get 32 sec. It just seems too good to be true. As a more precise example, the product_symm test takes 55 sec with OpenMP and 5 sec with SSE2. I was wondering whether those numbers are plausible, or is there something else going on? Thank you, |
Registered Member
|
No, it's normal. You get a total x8 speedup, which decomposes into:
x4 because 4 floats fit in a packet
x2 because SSE addition and multiplication can run together in 1 cycle
Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list! |
Registered Member
|
I am very curious how that is possible. Is it in PacketMath.h? |
Registered Member
|
No, it's in your CPU. Recent Intel CPUs (not sure about AMD) are able to execute a (mulps,addps) pair in 1 cycle, and likewise a (mulpd,addpd) pair, EVEN if operating on the same registers. It's as if there were a combined mul-and-add instruction working in 1 cycle.
Registered Member
|
How do you get it triggered? Follow _mm_mul_ps with _mm_add_ps?
|
Registered Member
|
Yes: see ei_pmadd() in GenericPacketMath.h:
/** \internal \returns a * b + c (coeff-wise) */
template<typename Packet> inline Packet
ei_pmadd(const Packet& a, const Packet& b, const Packet& c)
{ return ei_padd(ei_pmul(a, b),c); }
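To make the "follow _mm_mul_ps with _mm_add_ps" idea concrete, here is a minimal sketch with raw SSE intrinsics. It is not Eigen's code: the helper name madd and the SKETCH_HAS_SSE macro are made up for illustration, and the function falls back to a plain scalar loop when SSE is not available.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use SSE intrinsics only when the compiler targets a CPU that has them.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical helper (not part of Eigen): returns a * b + c, coefficient-wise.
std::vector<float> madd(const std::vector<float>& a,
                        const std::vector<float>& b,
                        const std::vector<float>& c)
{
    assert(a.size() == b.size() && b.size() == c.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    // Process 4 floats per iteration: one packet multiply followed by one packet add.
    for (; i + 4 <= a.size(); i += 4) {
        __m128 pa = _mm_loadu_ps(&a[i]);
        __m128 pb = _mm_loadu_ps(&b[i]);
        __m128 pc = _mm_loadu_ps(&c[i]);
        __m128 r  = _mm_add_ps(_mm_mul_ps(pa, pb), pc);
        _mm_storeu_ps(&out[i], r);
    }
#endif
    // Scalar loop for the leftover elements (or for everything without SSE).
    for (; i < a.size(); ++i)
        out[i] = a[i] * b[i] + c[i];
    return out;
}
```

Calling madd(a, b, c) on three vectors of equal length returns a new vector; whether the CPU really overlaps the multiply and the add depends on the hardware, as explained above.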
Registered Member
|
Thanks for explaining.
The original question mentioned OpenMP. Does Eigen explicitly use it? |
Registered Member
|
Yes, Eigen 3 uses OpenMP if it's enabled. By default it's disabled, but for example with GCC you just have to pass -fopenmp.
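If you are not sure whether your build actually has OpenMP switched on, a small check like the following may help. It is only a sketch: openmp_enabled and available_threads are hypothetical helper names, and it relies on the standard _OPENMP macro, which compilers define when OpenMP is enabled (-fopenmp with GCC, /openmp with Visual Studio).

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: true if this translation unit was compiled with OpenMP.
bool openmp_enabled()
{
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: upper bound on the threads a parallel region would use.
// Without OpenMP this is simply 1.
int available_threads()
{
#ifdef _OPENMP
    int n = omp_get_max_threads(); // honours OMP_NUM_THREADS if it is set
#else
    int n = 1;
#endif
    assert(n >= 1);
    return n;
}
```

Printing both values at program start is a cheap way to confirm that the compiler flag took effect before comparing timings.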
Registered Member
|
Hi!
I am using matrix-vector multiplication. Does this function use only one core? When multiplying two matrices, all the cores are used. System configuration: Intel i7-970, Windows 7. Compiler: VS2008, OpenMP enabled. |
Moderator
|
Currently only matrix * matrix products are multi-threaded, not matrix * vector.
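In the meantime, one possible workaround is to split the rows of the product across threads yourself. The sketch below is not Eigen code: matvec_rows is a hypothetical helper working on a plain row-major array, and the pragma only takes effect when OpenMP is enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: y = A * x, with A stored row-major as rows x cols.
std::vector<double> matvec_rows(const std::vector<double>& A,
                                std::size_t rows, std::size_t cols,
                                const std::vector<double>& x)
{
    assert(A.size() == rows * cols);
    assert(x.size() == cols);
    std::vector<double> y(rows, 0.0);
    // Signed loop index: older OpenMP versions (e.g. the one in VS2008) require it.
    const long n = static_cast<long>(rows);
#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (long i = 0; i < n; ++i) {
        // Each thread computes whole rows, so no two threads write the same y[i].
        const std::size_t row = static_cast<std::size_t>(i) * cols;
        double sum = 0.0;
        for (std::size_t j = 0; j < cols; ++j)
            sum += A[row + j] * x[j];
        y[static_cast<std::size_t>(i)] = sum;
    }
    return y;
}
```

Matrix-vector products are usually limited by memory bandwidth, so the gain from extra threads may be much smaller than for matrix-matrix products; it is worth measuring on your own sizes.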
|
Registered Member
|
Will multithreading be implemented for matrix-vector multiplication?
|
Moderator
|
Sure, but I cannot say when...
|
Registered Member
|
If I enable OpenMP, LU decomposition becomes slower.
Why? |
Moderator
|
Strange, because I observed the opposite. What is the matrix size? Do you have multi-threading enabled? If so, call your executable with:
$ OMP_NUM_THREADS=number_of_real_cores ./my_app |