
(Simplistic) Eigen Benchmarking on Ubuntu 10.04.

kalakouentin
Registered Member
Hello,

I am trying to build a very naive benchmark of linear algebra library performance on my system.
The code I am using is here. I do not know if my results are sensible and I would like some comments and/or pointers to possible mistakes I am making.
My 4-year-old system (HP NW9440 - Intel T7400 @ 2.16GHz - 3GB RAM) has the following configuration: Ubuntu 10.04.3 - 2.6.32-34-generic; GSL version: 1.13; MKL version: 10.3; Eigen version: 2.0.12. MKL was installed manually by me and to the best of my knowledge it works fine. GSL and Eigen were installed using Synaptic.
g++ is at version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) and icpc at 12.1.0 (gcc version 4.4.3 compatibility). GCC (and g++ for that matter) were installed by Synaptic, icpc manually.
Admittedly the versions of GSL and Eigen are not the latest; I wanted to keep the configuration as "vanilla" as possible.

I initialize two 2000-by-2000 random matrices and compute their product into a third matrix: first using Eigen, then using "naive C code", and then using BLAS (through the CBLAS interface).
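
In outline, the three multiplications are timed roughly like this (just a simplified sketch, not my exact code; the real program, including the random filling of the plain arrays, is at the link above):

#include <cstdio>
#include <ctime>
#include <Eigen/Core>          // Eigen 2 dense matrices
#include <gsl/gsl_cblas.h>     // CBLAS declarations

int main() {
    const int N = 2000;

    // Eigen: two random NxN matrices, product into a third
    Eigen::MatrixXf A = Eigen::MatrixXf::Random(N, N);
    Eigen::MatrixXf B = Eigen::MatrixXf::Random(N, N);
    clock_t t = clock();
    Eigen::MatrixXf C = A * B;
    std::printf("Eigen's Time: %f\n", double(clock() - t) / CLOCKS_PER_SEC);

    // "Naive C code": plain arrays and a triple loop
    static double a[N][N], b[N][N], c[N][N];   // filled with random values in the real code
    t = clock();
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            double s = 0.0;
            for (int k = 0; k < N; ++k)
                s += a[i][k] * b[k][j];
            c[i][j] = s;
        }
    std::printf("Naive's Time: %f\n", double(clock() - t) / CLOCKS_PER_SEC);

    // BLAS: dgemm through the CBLAS interface (GSL's reference cblas or MKL, depending on how I link)
    t = clock();
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, N, N, N,
                1.0, &a[0][0], N, &b[0][0], N, 0.0, &c[0][0], N);
    std::printf("cblas' Time: %f\n", double(clock() - t) / CLOCKS_PER_SEC);
    return 0;
}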

Originally I compiled my code using g++ with the following arguments:
g++ -I /usr/include/eigen2/ my_program.cpp -o my_program.out -lgsl -lgslcblas -O3

The running times on my system are (in seconds) :

Eigen's Time: 7.780000
Naive's Time: 69.250000
GSLcblas' Time: 44.800000
(medians after 7 runs)

Then I compiled my code using icpc with the following arguments:
icpc -I /usr/include/eigen2/ my_program.cpp -o my_programI.out -lgsl -mkl=sequential -fast

Eigen's Time: 3.650000
Naive's Time: 14.510000
MKLcblas' Time: 2.240000
(medians after 7 runs)

Using the formula 2*N^3 / (execution_time * 10^6) I calculate the MegaFLOPS each implementation achieves. The theoretical peak performance of each of my cores is 8.64 GFLOPS (= 17.28/2, as listed here by Intel - I think, at least). Raw numbers in MFLOPS followed by percentage of peak utilization (a worked example follows the list):

Eigen's Performance (g++ / gslcblas): 2056.55526 (23.8027%)
Naive-C's Performance (g++ / gslcblas): 231.04693 (2.67415%)
BLAS (by GSL) Performance (g++ / gslcblas): 357.14286 (4.13360%)

Eigen's Performance (icpc / mkl): 4383.56164 (50.73566%)
Naive-C's Performance (icpc / mkl): 1102.6878 (12.76259%)
BLAS (by MKL) Performance (icpc / mkl): 7142.85714 (82.67195%)
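
As a worked example of that formula, for Eigen compiled with g++:

2 * 2000^3 / 7.78 s / 10^6 ≈ 2056.6 MFLOPS, and 2.0566 GFLOPS / 8.64 GFLOPS ≈ 23.8% of one core's theoretical peak.

The other rows are computed the same way from the median times above.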

Are my results sensible? (I know this is a very subjective question, given that someone has to comment on the performance of a machine they don't have physical access to.)
Is it normal for GSL BLAS to be so... bad? Am I somehow hindering its performance?
Is it normal that Eigen's performance varies so much between the two compilers?
Is it normal that Eigen2 (yes, I know it's an older version, while my MKL is a recent one) is so much slower than MKL? (I have seen the benchmark page, but I have compiled these specifically for my system.)
Is it normal that my plain C code is so much slower when using g++?
Do I calculate my code's performance "sensibly"?
If you spot any obvious mistake in the way I am conducting my matrix-matrix multiplication, please tell me so I can re-run my benchmarks and get more sensible results. (I tried to keep an almost pure-C syntax.)

Thank you! ;D
mattd
Registered Member
For GCC, use the -march=native -mtune=native flags // more info (incl. -Ofast) in my posts here: viewtopic.php?f=74&t=96825&p=203599#p203599

If your target is 32-bit x86, don't forget to also include the appropriate SSE options (for the x86-64 compiler SSE is enabled by default).
According to Wiki -- http://en.wikipedia.org/wiki/T7400#.22M ... C_65_nm.29 -- your CPU supports up to and including SSSE3.
So, also use -mfpmath=sse -mssse3
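
For example, just adding those flags to your original g++ line (same file names as in your first post):

g++ -I /usr/include/eigen2/ my_program.cpp -o my_program.out -lgsl -lgslcblas -O3 -march=native -mtune=native -mfpmath=sse -mssse3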
kalakouentin
Registered Member
Thank you for your suggestions, they were very helpful.
I should have used -march=native in the first place; I have used it extensively in the past, but today I totally forgot it... My bad!

The flag -march=native made a big difference to Eigen's performance but none to GSL's BLAS. Eigen's (g++-compiled) running time is now actually faster than MKL's (on icpc). Eigen's median execution time: 2.06s.

All the other flags (disappointingly) brought no additional performance gains (nor individual gains when used on their own). The obvious exception was -mfpmath=sse -mssse3, which had the same positive effect as -march=native.

I am really impressed by Eigen's performance here, being faster than MKL. Bravo!
lowsfer
Registered Member
It seems your code contains a mistake. You compared the performance of Eigen::MatrixXf, which is single-precision float, with the double-precision cblas function. This is not a fair test.
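
The fix is to use the same precision on both sides, e.g. double everywhere. Roughly like this (just a sketch, not your exact code):

// Same precision (double) for Eigen and for the CBLAS call, so the comparison is fair
Eigen::MatrixXd A = Eigen::MatrixXd::Random(N, N);
Eigen::MatrixXd B = Eigen::MatrixXd::Random(N, N);
Eigen::MatrixXd C = A * B;                       // Eigen, double precision
Eigen::MatrixXd D(N, N);
cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, N, N, N,
            1.0, A.data(), N, B.data(), N,
            0.0, D.data(), N);                   // CBLAS dgemm, double precision, column-major like Eigen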

I have written a test similar to yours but got the opposite result, so I looked into your code and found this. After making it double-precision, Eigen (I use Eigen v3 because it's faster than v2) is a little bit slower than MKL but very close (Eigen: 1.75s; MKL cblas: 1.71s, on an i7 820M, test done single-threaded with the CPU frequency limited to the minimum of 1.2GHz using cpufreq-set, to avoid inaccuracy induced by Intel Turbo Boost and frequency scaling). Though a little bit slower, it's impressive that a compiler-optimised library can be almost as fast as a hand-tuned library like MKL or GotoBLAS! I didn't test ATLAS, which is also compiler-optimised, but ATLAS should be slower than Eigen, given that Eigen's performance is so close to MKL's.
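
(For reference, the frequency pinning was done with cpufreq-set; the exact invocation depends on your governor, but something along the lines of: cpufreq-set -c 0 -d 1.2GHz -u 1.2GHz, repeated for each core.)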

I also tested Armadillo. The performance was almost the same as MKL cblas. This is reasonable because my Armadillo build uses MKL as its BLAS backend.

Another impressive result is that g++ outperformed Intel icpc here!
g++ -DNDEBUG -march=native -O3
Eigen: 1.75s
MKL (cblas or Armadillo): 1.71s
icpc -DNDEBUG -xHost -O3
Eigen: 2.13s
MKL (cblas or Armadillo): 1.74s
PGO didn't improve performance compared to the above CXXFLAGS.

Sorry to bump an old thread, but I just want to correct this error so other people will not be misled if they are led here by Google.

