
Performance question

pollyp

Performance question

Thu Sep 30, 2010 8:39 pm
Hi,

I had some Matlab code that I was trying to speed up, so I rewrote it in C++/Eigen. To my surprise, it's slower ... a lot slower. :(

Here's a test program I wrote, using Eigen 2.0.15. All it does is matrix multiplication two hundred times:

#include <sys/time.h>
#include <stdio.h>
#include <map>
#include <string>
#include <iostream>
#include <iomanip>
#include <math.h>
#include "Eigen/Core"
#include "Eigen/Eigen"

USING_PART_OF_NAMESPACE_EIGEN
using namespace std;

struct timeval begin_detail_time, end_detail_time;
int myTest( MatrixXd A, MatrixXd B, MatrixXd C[200] );
void print_time_diff( char *label , struct timeval begin, struct timeval end );

int main( int argc, char ** argv )
{

        MatrixXd A(200,2500), B(2500,1);
        MatrixXd ret[200];
        A.fill(.789);
        B.fill(.789);
#ifdef EIGEN_VECTORIZE
        fprintf(stderr,"eigen vectorize is ENABLED\n");
#else
        fprintf(stderr,"eigen vectorize is DISABLED\n");
#endif

        gettimeofday(&begin_detail_time,NULL);
        myTest( A, B, ret );
        gettimeofday(&end_detail_time,NULL);
        print_time_diff( "main: ", begin_detail_time, end_detail_time );

}

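int myTest( MatrixXd A, MatrixXd B, MatrixXd C[200] )
{
        // Compute the same product 200 times, storing each result.
        for ( int i = 0 ; i < 200 ; i++ )
        {
                C[i] = A * B;
        }
        return 0; // the function is declared int, so it must return a value
}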

void print_time_diff( char *label , struct timeval begin, struct timeval end )
{
        double t1 = begin.tv_sec+(begin.tv_usec/1000000.0);
        double t2 = end.tv_sec+(end.tv_usec/1000000.0);
        printf("%s: elapsed time: %.6lf secs\n", label, t2-t1 );
}


On my Ubuntu 9.10 system (dual Xeon 5130, 2 GHz, SSE2 supported), it takes around 1.18 seconds to run:

./a.out
eigen vectorize is ENABLED
main: : elapsed time: 1.184492 secs
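
A side note on the measurement itself: gettimeofday reads the wall clock, which can be adjusted while the program runs. Below is a minimal sketch of the same timing with a monotonic clock, assuming a C++11 compiler. The helper name time_seconds is hypothetical and not part of the original program.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the original post): runs `work` once and
// returns the elapsed time in seconds, measured on a monotonic clock.
template <typename Work>
double time_seconds(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

With the test program above, it could be called as `double secs = time_seconds([&] { myTest( A, B, ret ); });`.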


Here's how I compiled it:

g++ -msse2 -O2 -I ../eigen/eigen -L/usr/lib/sse2 test6.cc


And here's the version information for g++:

 g++ --version
g++ (Ubuntu 4.4.1-4ubuntu9) 4.4.1


If I run this code in MATLAB R2010a (using the -singleCompThread option to keep the comparison fair):

disp( [ 'using this number of threads: ', num2str(maxNumCompThreads) ] ) ;
A  = ones( 200, 2500 );
B  = ones( 2500, 1 );
A = A .* .789;
B = B .* .789;
C = zeros( 200, 1, 200 );


tic
for i = 1:200
        C(:,:,i) = A * B;
end
rtime = toc;
disp( [ 'time = ' , num2str(rtime) ] );


I get this:

using this number of threads: 1
time = 0.13801
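
For scale, the two timings can be turned into arithmetic rates. Counting one multiply and one add per matrix entry, each 200 x 2500 product costs about 1,000,000 floating-point operations, so 200 products cost about 2e8. That works out to roughly 0.17 GFLOP/s for the 1.184492 s C++ run and roughly 1.45 GFLOP/s for the 0.13801 s MATLAB run. The function names below are illustrative only:

```cpp
#include <cassert>
#include <cmath>

// Floating-point operations for `repeats` dense products of a rows x cols
// matrix with a cols x 1 vector: one multiply and one add per matrix entry.
double total_flops(long rows, long cols, long repeats)
{
    return 2.0 * rows * cols * repeats;
}

// Achieved rate in GFLOP/s for a measured run time in seconds.
double gflops(long rows, long cols, long repeats, double seconds)
{
    return total_flops(rows, cols, repeats) / seconds / 1e9;
}
```

For example, `gflops(200, 2500, 200, 1.184492)` gives the C++ figure and `gflops(200, 2500, 200, 0.13801)` the MATLAB one.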


I'm stumped! If I'm doing something wrong here, I don't know what it is. What am I missing? What can I try to track down the problem?

Thanks in advance,

Polly
MarkusS

Re: Performance question

Wed Oct 06, 2010 4:53 pm
Something must be wrong with your compiler switches. On my MacBook (2.4 GHz Core 2 Duo) your C++ code takes 0.28 sec to run:

test$ g++ -Wall -O2 -I/usr/local/include/eigen3 eigenBench.cpp -o eigenBench
eigenBench.cpp: In function ‘int main(int, char**)’:
eigenBench.cpp:55: warning: deprecated conversion from string constant to ‘char*’
test$ ./eigenBench
eigen vectorize is ENABLED
main: : elapsed time: 0.282082 secs

Adding -msse2 or -msse3 doesn't change the result at all:

test$ g++ -Wall -O2 -msse2 -I/usr/local/include/eigen3 eigenBench.cpp -o eigenBench
eigenBench.cpp: In function ‘int main(int, char**)’:
eigenBench.cpp:55: warning: deprecated conversion from string constant to ‘char*’
test$ ./eigenBench
eigen vectorize is ENABLED
main: : elapsed time: 0.281478 secs
mocart03:test$ g++ -Wall -O2 -msse3 -I/usr/local/include/eigen3 eigenBench.cpp -o eigenBench
eigenBench.cpp: In function ‘int main(int, char**)’:
eigenBench.cpp:55: warning: deprecated conversion from string constant to ‘char*’
test$ ./eigenBench
eigen vectorize is ENABLED
main: : elapsed time: 0.283794 secs


g++ version:

test$ g++ -v
Using built-in specs.
Target: i686-apple-darwin10
Configured with: /var/tmp/gcc/gcc-5664~38/src/configure --disable-checking --enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin10 --program-prefix=i686-apple-darwin10- --host=x86_64-apple-darwin10 --target=i686-apple-darwin10 --with-gxx-include-dir=/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Apple Inc. build 5664)
ggael

Re: Performance question

Thu Oct 07, 2010 5:40 pm
Move to Eigen3, which is 3 times faster on that example?

Now, on that example Eigen2 and Eigen3 should be equally fast, and at least as fast as MATLAB, since we have the fastest matrix-vector products ... ;)
ggael

Re: Performance question

Thu Oct 07, 2010 6:17 pm
Oops, actually the problem is that you are using MatrixXd to represent vectors, while you should use VectorXd. After this change it is much faster (0.12s on my 2GHz Core2), and you can even mark that your product does not alias:

Eigen2 : C = (A*B).lazy();

Eigen3 : C.noalias() = A * B;

(=> 0.11s)

