This forum has been archived. All content is frozen. Please use KDE Discuss instead.

performance of eigen

PNSH (Registered Member, Posts: 2, Karma: 0)

performance of eigen

Sun Dec 13, 2009 5:43 am
According to the benchmark, Eigen is much faster than ATLAS on matrix-matrix products. However, I tested on some PCs (Core Duo / Core 2 Quad) with Eigen 2.0.10 and ATLAS 3.8, single-threaded.

The results showed that a (1500,1500) * (1500,1500) product took roughly the same time in Eigen and ATLAS. Actually, ATLAS was a little faster.

I also compared element access on uBLAS and Eigen dynamic matrices. They are almost the same.

Did I miss any optimizations?

Here are my sample code and Makefile.
Code:
#include <boost/numeric/ublas/matrix.hpp>
#include <eigen2/Eigen/Core>
#include <eigen2/Eigen/Array>

#include <iostream>
#include <ctime>

namespace ublas = boost::numeric::ublas;

using std::cout;
using std::endl;
USING_PART_OF_NAMESPACE_EIGEN

#define N 1000

int main()
{
    ublas::matrix<double, ublas::column_major, ublas::unbounded_array<double> > bm(N, N);
    MatrixXd em(N, N);
    double startt, endt;

    // 100 passes of element-wise writes through ublas::matrix::operator()
    startt = (double) clock() / CLOCKS_PER_SEC;
    for (int p = 0; p < 100; p++) {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                bm(i, j) = i + j;
            }
        }
    }
    endt = (double) clock() / CLOCKS_PER_SEC;
    cout << endt - startt << endl;

    // The same writes through Eigen's MatrixXd::operator()
    startt = (double) clock() / CLOCKS_PER_SEC;
    for (int p = 0; p < 100; p++) {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                em(i, j) = i + j;
            }
        }
    }
    endt = (double) clock() / CLOCKS_PER_SEC;
    cout << endt - startt << endl;
}

Code:
#include <eigen2/Eigen/Core>
#include <eigen2/Eigen/Array>

#include <iostream>
#include <ctime>

extern "C" {
#include <cblas.h>
}

// import most common Eigen types
using std::cout;
using std::endl;
USING_PART_OF_NAMESPACE_EIGEN

#define N 1500

int main(int, char *[])
{
    MatrixXd m4(N, N);
    MatrixXd m5(N, N);
    m4.setRandom();
    m5.setRandom();
    MatrixXd m6(N, N);
    double startt, endt;

    // ATLAS dgemm; MatrixXd stores column-major, so pass CblasColMajor
    startt = (double) clock() / CLOCKS_PER_SEC;
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, N, N, N,
                1.0, m4.data(), N, m5.data(), N, 0.0, m6.data(), N);
    endt = (double) clock() / CLOCKS_PER_SEC;
    cout << endt - startt << endl;

    // The same product in Eigen
    startt = (double) clock() / CLOCKS_PER_SEC;
    m6 = m4 * m5;
    endt = (double) clock() / CLOCKS_PER_SEC;
    cout << endt - startt << endl;
}


Code:
DEFINC = /usr/include/

LIBS   = -L /opt/mpich2/lib/
INC   = -I $(DEFINC)geodesic -I $(DEFINC)OGRE -I $(DEFINC)CEGUI -I $(DEFINC)libxml2 -I $(DEFINC)rlog -I $(DEFINC)log4cpp -I /opt/mpich2/include/

LINK   = -larpack -ldmumps -lmumps_common  -lscalapack -llapack -lblacs -lblacsf77 -lblacs -lmpich -lptf77blas -lptcblas_atlas -latlas -lumfpack -lamd -lsuperlu -lOpenMeshCore -lOpenMeshTools -lOgreMain -lCEGUIBase -lCEGUIOgreRenderer -lOIS -lANN -lxml2 -liniparser -lrlog -llog4cxx -lcv -lcvaux -lhighgui -lml -pthread  -lmetis -lpord -lesmumps -lfax_scotch -lsymbol -ldof -lorder -lgraph_scotch -lscotch -lscotcherr -lcommon
CC   = mpicc
CXX   = mpic++
CXXFLAGS   = -O3 #= -DRLOG_COMPONENT
CXXFLAGS   +=  -msse2
CXXFLAGS   += -DEIGEN_VECTORIZE
CXXFLAGS   += -DEIGEN_NO_DEBUG
CFLAGS   = $(CXXFLAGS)

OBJ   = test.o
SRC   = test.cpp


all : test

test : $(OBJ)
   $(CXX) $< -o $@ $(LIBS) $(LINK)


.cpp.o :
   $(CXX) $(INC) $(CXXFLAGS) -c $<


clean :
   rm -rf ./*.o test

bjacob (Registered Member, Posts: 658, Karma: 3)

Re: performance of eigen

Sun Dec 13, 2009 6:39 am
According to our benchmark (matrix-matrix product) we're 20% faster. If you only get roughly the same speed, that could be:
* either because your ATLAS is better tuned for your CPU cache size than Eigen is by default. With Eigen, you can currently control that via #defines; grep for CACHE in Core/util/Macros.h.
* or a difference between your setup and Gael's when he made the benchmark. That benchmark was on x86-64, with GCC 4.4.


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
PNSH (Registered Member, Posts: 2, Karma: 0)

Re: performance of eigen

Sun Dec 13, 2009 3:25 pm
Thanks for your explanation.

Following the suggestion, I enlarged the CACHE macro, but it did not help. I experimented on both i686 and x86_64 with GCC 4.4.

Perhaps the reason is that ATLAS was better tuned and fits my machines better.
bjacob (Registered Member, Posts: 658, Karma: 3)

Re: performance of eigen

Sun Dec 13, 2009 6:12 pm
Also, the benchmark was with the development branch, which now is much faster than 2.0, and already at that time (March 2009) was a bit faster than 2.0.

The CACHE macro is currently named: EIGEN_TUNE_FOR_CPU_CACHE_SIZE. Its default value is 4*256*256.
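A minimal sketch of how such a tuning macro is typically overridden (the 512 KB value is only an illustrative assumption; pick the size of your CPU's cache):

```cpp
// Must come before the first Eigen include; value is in bytes.
// The default mentioned above is 4*256*256 = 256 KB.
#define EIGEN_TUNE_FOR_CPU_CACHE_SIZE (512 * 1024)
#include <eigen2/Eigen/Core>
```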


Seb (Registered Member, Posts: 99, Karma: 0)

Re: performance of eigen

Wed Jan 13, 2010 8:42 am
Regarding element-wise access, you should take a look at the Eigen functions coeff() and coeffRef(): they skip the bounds-checking asserts and are faster.
gaga666 (Registered Member, Posts: 4, Karma: 0)

Re: performance of eigen

Tue Jan 19, 2010 5:22 pm
Btw, what about performance in comparison with the MTL library? I can't decide which one to use in a performance-critical application with small (< 20x20) matrix computations.
bjacob (Registered Member, Posts: 658, Karma: 3)

Re: performance of eigen

Tue Jan 19, 2010 5:33 pm
MTL4 and other libraries were among our old benchmark:
http://eigen.tuxfamily.org/index.php?ti ... August2008
Later benchmarks only show the high-performance libraries.

But this benchmark tests dynamic sizes only. If your sizes are not only small but known at compile time, a different benchmark is needed. Eigen does very well there too, as this is one of our primary areas of interest, but I don't know about MTL.


gaga666 (Registered Member, Posts: 4, Karma: 0)

Re: performance of eigen

Tue Jan 19, 2010 6:07 pm
bjacob, thank you for your reply. Is it possible (I'm really sorry, but I haven't read the documentation yet) to use a statically allocated, big enough matrix, and use its sub-matrix in computations without reallocation and copying? That would be a big performance gain and, what is even more important, usable in real-time applications.
E.g. something like (I saw such syntax somewhere):
Code:
Matrix A(20,20), B(5,1);
Matrix C = map(A,0,0,5,5);
Matrix D = C*B;
bjacob (Registered Member, Posts: 658, Karma: 3)

Re: performance of eigen

Tue Jan 19, 2010 6:17 pm
Yes, we have a Map mechanism just as you describe; see class Map. But in your case we have something even better: if you don't know the exact size of a matrix at compile time, but know that it will never be bigger than 20x20 and want to avoid dynamic memory allocation, you can do:
Code:
using namespace Eigen;
// start with size 5x5 inside of a statically allocated 20x20 array
Matrix<float,Dynamic,Dynamic,0,20,20> my_matrix(5,5);
// now resize for fun
my_matrix.resize(12,13);


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
gaga666 (Registered Member, Posts: 4, Karma: 0)

Re: performance of eigen

Tue Jan 19, 2010 6:22 pm
Wow, that's really great! This feature is very important, because dynamic memory allocation at runtime is unacceptable in real-time systems. Thank you very much, bjacob, for your replies and for the great job.
ggael (Moderator, Posts: 3447, Karma: 19)

Re: performance of eigen

Tue Jan 19, 2010 9:27 pm
Regarding Eigen vs MTL4: for your use case, Eigen has a significant advantage, which is explicit vectorization.
gaga666 (Registered Member, Posts: 4, Karma: 0)

Re: performance of eigen

Wed Jan 20, 2010 12:48 pm
ggael, that's not an advantage in my case, because I'm writing for an i486 @ 120 MHz using gcc 2.95 ;). But I can't compile the library yet; some errors always appear. Anyway, I'm still trying, because Eigen looks really great and perfectly suits my purpose.
bjacob (Registered Member, Posts: 658, Karma: 3)

Re: performance of eigen

Wed Jan 20, 2010 3:15 pm
We don't claim to support gcc 2.95.

The oldest gcc that we support is gcc 3.3 in Eigen 2.0. It might become gcc 3.4 in the default branch, if gcc 3.3 poses too many problems.

Notice that with such an old GCC, even if you could compile, the resulting code would still be of very poor quality: slow and bloated. If you want high performance, use GCC >= 4.2. It is perfectly able to generate code for your 486; you don't have to run it _on_ the 486.



