Registered Member
|
Hello,
I have been working on matrix multiplication over the last few months and have run some tests using OpenMP and Eigen3. The tests were run on the following machines:
Computer 1: Intel Core i7-3610QM CPU @ 2.30 GHz / 6 GB DDR3
Computer 2: Six-Core AMD Opteron(tm) Processor 2435 @ 2.60 GHz (2 processors) / 16 GB
For the OpenMP version, the following matrix-matrix multiplication algorithm was used:
The results were as follows (times in seconds):

             Computer 1    Computer 2
Sequential   232.75600     536.21400
OpenMP         2.75764       7.62024
Eigen3         3.35090       1.92970

The matrix sizes were 2700 x 2500 and 2500 x 2700. As you can see, the OpenMP version is faster than the Eigen3 version on the first computer (i7). On Computer 2 (2x Opteron), however, Eigen3 completely beats the OpenMP version, as well as every result from Computer 1. SSE2 instructions were enabled for the Eigen3 tests.
Any idea why I get these results, and why Eigen3 isn't as fast on Computer 1 as on Computer 2?
Regards,
Fábio Bento
Last edited by fbento on Sun Dec 09, 2012 8:23 pm, edited 1 time in total.
|
Registered Member
|
Which compiler did you use? Did you try it with -O3?
|
Registered Member
|
Hi, I used Visual Studio 2012 with the platform toolset of VS2010.
I haven't found a -O3 option in it, but I used /O2.
|
Moderator
|
I don't understand your results: you get a speedup factor of around 100 between the sequential and OpenMP versions. Do you have that many cores? Are you sure you are computing the right thing?
Also, the theoretical peak performance for a single core is 9.2 GFLOPS for the first machine and 10.4 GFLOPS for the second. If I convert your timings to GFLOPS, I get:

             Computer 1    Computer 2
OpenMP         13.25          4.79
Eigen3         10.88         19.18

The result for Eigen on the second computer is what is expected: 19.18 is close to the max (20..). On the other hand, the results on the i7 are rather strange. You should disable Turbo Boost when benchmarking, make sure that the number of threads used is not higher than the actual number of cores (hyper-threading), etc.
|
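For readers who want to redo the conversion from timings to GFLOPS, here is a minimal sketch. It assumes the usual count of 2·m·k·n floating-point operations for an (m x k) by (k x n) product, which is not stated in the thread; with the posted timings it gives 10.88 for Eigen3 on Computer 1 and comes within roughly 2% of the other quoted figures. The function names are hypothetical, and the 4 flops per cycle used in the usage note is an assumption that happens to match the quoted single-core peaks.

```cpp
#include <cassert>

// Floating-point operations for C = A * B with A (m x k) and B (k x n),
// assuming one multiply and one add per inner-loop step (2*m*k*n in total).
double matmul_flops(double m, double k, double n) {
    return 2.0 * m * k * n;
}

// Achieved rate in GFLOPS for a run that took `seconds`.
double achieved_gflops(double m, double k, double n, double seconds) {
    assert(seconds > 0.0);
    return matmul_flops(m, k, n) / seconds / 1e9;
}

// Theoretical single-core peak in GFLOPS. `flops_per_cycle` depends on the
// CPU and instruction set; the caller has to supply it (assumption).
double peak_gflops_per_core(double clock_ghz, double flops_per_cycle) {
    return clock_ghz * flops_per_cycle;
}
```

For example, `achieved_gflops(2700, 2500, 2700, 3.35090)` corresponds to the Eigen3 run on Computer 1, and `peak_gflops_per_core(2.3, 4.0)` gives the 9.2 GFLOPS figure if one assumes 4 flops per cycle.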
Registered Member
|
This seems to be the cause of it (threads > number of physical processors). I'm running some more tests and will reply with a more detailed answer later.
|
Registered Member
|
Thanks for your answers.
The huge difference between the sequential and the parallel versions is due to different algorithms being used. The sequential version uses the usual naïve O(N^3) algorithm without any optimizations, whilst the parallel versions are optimized, using blocks. Using the same algorithm, the sequential times are about 10 (Computer 1) and 50 (Computer 2); sorry, I should have put these values in the first post.
The difference between Eigen3 and OpenMP performance on the first and second computers seems to be due to the number of threads launched versus the number of physical processors available. We found that Eigen3's performance gets worse if the number of threads launched is larger than the number of physical processors available, and that this is not the case for OpenMP.
In the tests, the number of threads launched in both cases was equal to the total number of processors (virtual + physical). On Computer 1, Eigen3 performs worse because the total number of processors (virtual + physical, due to hyper-threading) is greater than the number of physical processors. On Computer 2, Eigen3 performs better because the total number of processors is the same as the number of physical processors. If we set the number of threads to double the number of physical processors, Eigen3's performance also degrades, and OpenMP's in fact improves a little.
|
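To make the naïve versus blocked distinction concrete, here is a rough sketch of both variants. This is not the code used in the tests above (that code is not shown in the thread); the function names, the row-major `std::vector` storage, the default block size of 64 and the `threads` parameter are all assumptions for illustration. The thread count is passed explicitly so that it can be set to the number of physical cores rather than the number of logical processors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // row-major storage (assumption)

// Naive triple loop: C (m x n) = A (m x k) * B (k x n).
Matrix multiply_naive(const Matrix& a, const Matrix& b,
                      std::size_t m, std::size_t k, std::size_t n) {
    assert(a.size() == m * k && b.size() == k * n);
    Matrix c(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t p = 0; p < k; ++p)
                sum += a[i * k + p] * b[p * n + j];
            c[i * n + j] = sum;
        }
    return c;
}

// Blocked variant: works on block x block tiles to improve cache reuse.
// Row blocks are distributed over `threads` OpenMP threads; each thread
// writes a disjoint set of rows of C, so no synchronization is needed.
// Without OpenMP enabled, the pragma is ignored and the loop runs serially.
Matrix multiply_blocked(const Matrix& a, const Matrix& b,
                        std::size_t m, std::size_t k, std::size_t n,
                        int threads, std::size_t block = 64) {
    assert(a.size() == m * k && b.size() == k * n);
    assert(threads > 0 && block > 0);
    (void)threads;  // unused when compiled without OpenMP
    Matrix c(m * n, 0.0);
    // Signed loop index, since older OpenMP implementations require it.
    const long row_blocks = static_cast<long>((m + block - 1) / block);
    #pragma omp parallel for num_threads(threads) schedule(static)
    for (long bi = 0; bi < row_blocks; ++bi) {
        const std::size_t i0 = static_cast<std::size_t>(bi) * block;
        const std::size_t i1 = std::min(i0 + block, m);
        for (std::size_t p0 = 0; p0 < k; p0 += block) {
            const std::size_t p1 = std::min(p0 + block, k);
            for (std::size_t j0 = 0; j0 < n; j0 += block) {
                const std::size_t j1 = std::min(j0 + block, n);
                for (std::size_t i = i0; i < i1; ++i)
                    for (std::size_t p = p0; p < p1; ++p) {
                        const double aip = a[i * k + p];
                        for (std::size_t j = j0; j < j1; ++j)
                            c[i * n + j] += aip * b[p * n + j];
                    }
            }
        }
    }
    return c;
}
```

On a machine like Computer 1 (4 physical cores, 8 logical processors with hyper-threading), one would try calling `multiply_blocked(a, b, m, k, n, 4)` as well as `multiply_blocked(a, b, m, k, n, 8)` and compare the timings. For Eigen3, the thread count can be limited in the same spirit with `Eigen::setNbThreads`.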