
Slow operator*() in symmetric sparse matrices

Seb (Registered Member):
Hi,

I am using Eigen 3.1.1 and found that the product of a sparse matrix with a dense vector is considerably slower when using a self-adjoint view of a triangular matrix than when using the full sparse matrix.

test code (pseudo-code):
Code:
    // assumes <Eigen/Sparse>, <iostream>, and boost::posix_time are available
    SparseMatrix<double> A2 = ...; // some function building the full matrix
    SparseMatrix<double> A = A2.triangularView<Lower>(); // keep only the lower triangle
    MatrixXd rhs = MatrixXd::Random(A.rows(), 1);
    const unsigned int cycles = 100;
    boost::posix_time::ptime t1 = boost::posix_time::microsec_clock::universal_time();
    for (unsigned int i = 0; i != cycles; ++i)
        rhs = (A.selfadjointView<Eigen::Lower>() * rhs).eval();
    boost::posix_time::ptime t2 = boost::posix_time::microsec_clock::universal_time();
    std::cout << "Time of symmetric product: " << (t2 - t1).total_milliseconds() << "ms." << std::endl;
    t1 = boost::posix_time::microsec_clock::universal_time();
    for (unsigned int i = 0; i != cycles; ++i)
        rhs = (A2 * rhs).eval();
    t2 = boost::posix_time::microsec_clock::universal_time();
    std::cout << "Time of standard  product: " << (t2 - t1).total_milliseconds() << "ms." << std::endl;

The matrix A is the connectivity matrix of a graph...

output:
Dimensions: (symmetric): (439476*439476), nonzeros: 2160889, storage: 0.00111882%.
Dimensions: (full ): (439476*439476), nonzeros: 3882302, storage: 0.00201011%.
Time of symmetric product: 13031ms.
Time of standard product: 1965ms.

Nearly a factor of 7 slower...

Can anyone confirm this result, or does anyone have ideas on how to work around this issue?
I expected the product to be faster when using the lower triangular matrix (fewer iterations over non-zero elements), or at least as fast as the 'full' version.

Best regards
Sebastian
ggael (Moderator):
hm, the last time I benchmarked these operations I got similar performance. Are you compiling with optimization enabled?

also, what's your compiler? maybe it's just a function which is not properly inlined.
Seb (Registered Member):
I use gcc 4.2.1 on Mac OS Lion:
- gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)

The parameters of g++ are:
-Wall -Wextra -Werror -Wno-long-long -std=c++98
-O2
-arch x86_64 -Xarch_x86_64 -mmacosx-version-min=10.5
-DQT_NO_DEBUG -DQT_SHARED

Best regards
Sebastian
ggael (Moderator):
ok, I can indeed reproduce a 10x difference in speed, I'll investigate
ggael (Moderator):
oh, with a more recent version of gcc, there is no slowdown, so yes that's probably just an inlining issue
ggael (Moderator):
the workaround is to disable assertions with -DNDEBUG
Seb (Registered Member):
Ah. Thanks for investigating this. The -DNDEBUG solution should work perfectly for me. I will check this tomorrow.

PS: GCC 4.2.1 on Mac is really a pity!
Seb (Registered Member):
Ok. It works perfectly now.

One comment, though. With the compiler configuration noted above, I get a compile error in Eigen when using -DNDEBUG, presumably because alpha ends up unused once assertions are disabled and -Werror promotes the unused-parameter warning to an error.

I changed the line SparseSelfAdjointView.h:210
Code:
    template<typename Dest> void scaleAndAddTo(Dest& dest, Scalar alpha) const


to
Code:
    template<typename Dest> void scaleAndAddTo(Dest& dest, Scalar
                                           #ifndef NDEBUG
                                               alpha
                                           #endif
                                               ) const

to compile.
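
As an aside, a more conventional fix for the same warning is a void-cast of the parameter, which keeps the signature intact. Here is a minimal standalone sketch of that idiom; the function body and names are made up for illustration and are not Eigen's actual code:
Code:
    #include <cassert>

    // Hypothetical stand-in for the pattern in SparseSelfAdjointView.h:
    // 'alpha' is only consumed by the assertion, which compiles away under
    // -DNDEBUG, so we void-cast it to keep -Wunused-parameter (and thus
    // -Werror) quiet in both build modes.
    template<typename Scalar>
    void scaleAndAddTo(Scalar& dest, Scalar alpha)
    {
        assert(alpha == Scalar(1) && "scaling is not implemented");
        (void)alpha;   // no-op; suppresses the unused-parameter warning
        dest += dest;  // placeholder for the real accumulation
    }

    int main()
    {
        double d = 1.0;
        scaleAndAddTo(d, 1.0);
        return 0;
    }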

Thank you very much!
Sebastian
Seb (Registered Member):
Hello Gael,

I would like to reopen this topic, because I still do not understand the performance of the two product operators.

Here I use another compiler, that is
Code:
gcc version 4.6.2 (SUSE Linux)

on OpenSuse
Code:
Linux version 3.1.10-1.16-desktop (geeko@buildhost)


The result is now:
Code:
2012-12-14 10:32:49 |    INFO | thread   0 | Dimensions: (symmetric): (439476*439476), nonzeros: 2160889, storage: 0.00111882%.
2012-12-14 10:32:49 |    INFO | thread   0 | Dimensions: (full     ): (439476*439476), nonzeros: 3882302, storage: 0.00201011%.
2012-12-14 10:32:49 |    INFO | thread   0 | Time of symmetric product: 643ms.
2012-12-14 10:32:50 |    INFO | thread   0 | Time of standard  product: 621ms.


Hence, both times are similar. But theoretically, I expect a speedup of approx. 2x for the symmetric product.

Compiler flags:
Code:
g++ -c -pipe -Wall -Wextra -Werror -Wno-long-long -std=c++98 -fopenmp -O2 -O3 -Wall -W -D_REENTRANT -DNDEBUG -DQT_NO_DEBUG
naptrel (Registered Member):
Seb wrote: Hence, both times are similar. But theoretically, I expect a speedup of approx. 2x for the symmetric product.

Hi Seb,

Consider the following asymmetric vs. symmetric matrix-vector products:
Code:
(a b) * (e) = (a*e + b*f)
(c d)   (f)   (c*e + d*f)

(a b) * (e) = (a*e + b*f)
(b d)   (f)   (b*e + d*f)

I don't think you can exploit the matrix symmetry to reduce the number of multiplications: both require 4 distinct multiplications, so I think similar results are the best you can expect?

Nathaniel
Seb (Registered Member):
Hello Nathaniel,

for dense matrices I agree with you. For sparse matrices, however, the situation is different and depends on the sparse matrix storage and the optimizations used (I know neither in detail). Besides the number of products, the number of memory accesses to non-zero components must be compared. The symmetric storage of sparse matrices may reduce the latter significantly (one needs to iterate through approximately half as many non-zero elements).

Indeed, the authors of the article http://bebop.cs.berkeley.edu/pubs/csd-03-1297.pdf report speed-ups even greater than 2x for symmetric matrices. In the current Eigen implementation, the non-symmetric product even takes slightly less time (though the difference is negligible). Hence, either Eigen uses super-optimized code for the non-symmetric product, or the symmetric one is not optimized (at least not for the compiler I use).
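
To make the counting argument concrete, here is a minimal sketch of a symmetric sparse matrix-vector product over a plain CSR structure that stores only the lower triangle (all names and the storage layout are made up for illustration; this is not Eigen's implementation). Each stored off-diagonal entry is applied twice, so only about half as many non-zeros need to be read compared with a full-storage product:
Code:
    #include <cstddef>
    #include <vector>

    // y = A*x for symmetric A, CSR storage of the lower triangle only
    // (column indices satisfy j <= i within each row i).
    void symSpmv(std::size_t n,
                 const std::vector<std::size_t>& rowPtr,  // size n+1
                 const std::vector<std::size_t>& colIdx,
                 const std::vector<double>& val,
                 const std::vector<double>& x,
                 std::vector<double>& y)
    {
        y.assign(n, 0.0);
        for (std::size_t i = 0; i < n; ++i)
        {
            for (std::size_t k = rowPtr[i]; k < rowPtr[i + 1]; ++k)
            {
                const std::size_t j = colIdx[k];
                const double a = val[k];
                y[i] += a * x[j];      // lower-triangle contribution
                if (j != i)
                    y[j] += a * x[i];  // mirrored upper-triangle contribution
            }
        }
    }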

Best regards
Sebastian
naptrel (Registered Member):
Hello Seb,

Thank you for responding so politely to my patronising post, which assumed you were as much a novice as me. I've skim-read the referenced paper and am amazed that such performance benefits can be achieved; quite astonishing. Can the same performance gains be expected in a multi-core/multithreaded environment?

Nathaniel.
ggael (Moderator):
what is currently limiting sparse*vector products is random memory access to the dense output vector (for column-major storage) or to the dense input vector (for row-major storage). So taking the symmetry into account does not help here, and is even worse because it increases the randomness of the accesses.
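
For comparison, here is the access pattern of a plain column-major (CSC-style) sparse mat-vec, again sketched with made-up names rather than Eigen internals: the input vector is read sequentially, but the output is written at whatever row indices the stored non-zeros dictate. The symmetric kernel sketched earlier adds a scattered read of x on top of this scattered write, which matches Gael's point:
Code:
    #include <cstddef>
    #include <vector>

    // y = A*x with A in column-major (CSC-style) storage. The write to
    // y[rowIdx[k]] jumps around memory; that scattered access is the
    // bottleneck described above.
    void cscSpmv(std::size_t rows, std::size_t cols,
                 const std::vector<std::size_t>& colPtr,  // size cols+1
                 const std::vector<std::size_t>& rowIdx,
                 const std::vector<double>& val,
                 const std::vector<double>& x,
                 std::vector<double>& y)
    {
        y.assign(rows, 0.0);
        for (std::size_t j = 0; j < cols; ++j)
        {
            const double xj = x[j];               // sequential read of the input
            for (std::size_t k = colPtr[j]; k < colPtr[j + 1]; ++k)
                y[rowIdx[k]] += val[k] * xj;      // scattered write to the output
        }
    }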
Seb (Registered Member):
Thanks for your reply Gael,

is there any situation where Eigen reaches the theoretical performance announced in the aforementioned article? Perhaps using row-major storage, or an UpperTriangular vs. a LowerTriangular layout?

Regarding the original topic, I have a new test result on GCC 4.3.4 where the symmetric product takes 2.5x longer (again using -DNDEBUG). Could you possibly check this? I use the latest version of Eigen.

Best regards
Sebastian

