Registered Member
Hi,
I am using Eigen 3.1.1 and found that the product of a sparse matrix with a dense vector is considerably slower when using a self-adjoint view of the triangular form than when using the full sparse matrix. Test code (pseudo code):
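A minimal sketch of the kind of benchmark meant here (variable names, the fill step, and the timing code are my own; the original test code is not shown):

```cpp
#include <Eigen/Sparse>
#include <Eigen/Dense>
#include <ctime>
#include <iostream>

int main()
{
    typedef Eigen::SparseMatrix<double> SpMat;

    const int n = 439476;
    SpMat Asym(n, n);   // only the lower triangle stored (symmetric storage)
    SpMat Afull(n, n);  // both triangles stored (full matrix)
    // ... fill Asym and Afull from the graph connectivity data ...

    Eigen::VectorXd x = Eigen::VectorXd::Random(n), y(n);

    std::clock_t t0 = std::clock();
    for (int i = 0; i < 100; ++i)
        y = Asym.selfadjointView<Eigen::Lower>() * x;   // symmetric product
    std::clock_t t1 = std::clock();
    for (int i = 0; i < 100; ++i)
        y = Afull * x;                                  // standard product
    std::clock_t t2 = std::clock();

    std::cout << "symmetric: " << 1000.0 * (t1 - t0) / CLOCKS_PER_SEC << "ms\n"
              << "standard : " << 1000.0 * (t2 - t1) / CLOCKS_PER_SEC << "ms\n";
    return 0;
}
```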
The matrix A is the connectivity matrix of a graph... Output:

Dimensions (symmetric): (439476*439476), nonzeros: 2160889, storage: 0.00111882%
Dimensions (full)     : (439476*439476), nonzeros: 3882302, storage: 0.00201011%
Time of symmetric product: 13031ms
Time of standard product : 1965ms

That is nearly a factor of 7x slower... Can anyone confirm this result, or does anyone have an idea on how to circumvent this issue? I expected the symmetric version to be faster when using only the lower triangular matrix (fewer iterations over non-zero elements), or at least as fast as the 'full' version.

Best regards
Sebastian
Moderator
Hm, the last time I benchmarked these operations I got similar performance. Are you compiling with optimizations enabled?
Also, what's your compiler? Maybe it's just a function that is not properly inlined.
Registered Member
I use gcc 4.2.1 on Mac OS Lion:
- gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)

The parameters of g++ are:

-Wall -Wextra -Werror -Wno-long-long -std=c++98 -O2 -arch x86_64 -Xarch_x86_64 -mmacosx-version-min=10.5 -DQT_NO_DEBUG -DQT_SHARED

Best regards
Sebastian
Moderator
OK, I can indeed reproduce a ~10x difference in speed; I'll investigate.
Moderator
oh, with a more recent version of gcc, there is no slowdown, so yes that's probably just an inlining issue
Moderator
The workaround is to disable assertions with -DNDEBUG.
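A side note of mine (not from the original post): NDEBUG can either be passed on the compiler command line or be defined before any Eigen header is included; in Eigen it implies EIGEN_NO_DEBUG and thus turns off the run-time eigen_assert checks.

```cpp
// Disable assertions for this translation unit only; as far as Eigen's
// eigen_assert checks are concerned, this has the same effect as -DNDEBUG.
#define NDEBUG
#include <Eigen/Sparse>
```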
Registered Member
Ah. Thanks for investigating this. The solution with -DNDEBUG should work perfectly for me. I will check this tomorrow.
PS: GCC 4.2.1 on Mac is really a pity!
Registered Member
Ok. It works perfectly now.
One comment, though: with the compiler configuration noted above, I get a compile error in Eigen when using -DNDEBUG. I changed line SparseSelfAdjointView.h:210 to make it compile.

Thank you very much!
Sebastian
Registered Member
Hello Gael,
I would like to reopen this topic, because I still do not understand the performance of the two product operators. Here I use another compiler, namely
on openSUSE.
The result is now:
Hence, both times are similar. But theoretically I would expect a speedup of approx. 2x for the symmetric product. Compiler flags:
Registered Member
Hi Seb,
Consider the following asymmetric vs. symmetric matrix * vector products:
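For example, with a 2x2 matrix (an illustrative sketch of the point):

\[
\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} x_1\\ x_2 \end{pmatrix}
=
\begin{pmatrix} a_{11}x_1 + a_{12}x_2\\ a_{21}x_1 + a_{22}x_2 \end{pmatrix},
\qquad
\begin{pmatrix} a_{11} & a_{12}\\ a_{12} & a_{22} \end{pmatrix}
\begin{pmatrix} x_1\\ x_2 \end{pmatrix}
=
\begin{pmatrix} a_{11}x_1 + a_{12}x_2\\ a_{12}x_1 + a_{22}x_2 \end{pmatrix}.
\]

Both right-hand sides need four scalar multiplications, even though the symmetric matrix has only three distinct entries.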
I don't think you can exploit the matrix symmetry to reduce the number of multiplications: both require 4 distinct multiplications, so I think similar results are the best you can expect?
Nathaniel
Registered Member
Hello Nathaniel,
for dense matrices I agree with you. For sparse matrices, however, the situation is different and depends on the sparse matrix storage and the optimizations used (both of which I do not know). Besides the number of products, the number of memory accesses to non-zero components must be compared. The symmetric storage of sparse matrices may reduce the latter significantly (one needs to iterate through approximately half as many non-zero elements). Indeed, the authors of the article http://bebop.cs.berkeley.edu/pubs/csd-03-1297.pdf report a speed-up of even more than 2x for symmetric matrices. In the current implementation in Eigen, the non-symmetric product even takes slightly less time (the time difference is negligible, though). Hence, either Eigen uses super-optimized code for the non-symmetric product, or the symmetric one is not optimized (at least not for the compiler I use).

Best regards
Sebastian
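To make the argument concrete, here is a minimal sketch (a plain triplet-based kernel of my own, not Eigen's actual implementation) of how a symmetric product can reuse each stored non-zero of the lower triangle twice:

```cpp
#include <vector>
#include <cstddef>

// One stored non-zero of the lower triangle (row >= col).
struct Entry { int row, col; double val; };

// y = A * x for a symmetric A given only by its lower-triangular entries.
// Each stored entry is read from memory once but contributes to two rows,
// so roughly half as many non-zeros are traversed compared to full storage.
void symspmv(const std::vector<Entry>& lower, const std::vector<double>& x,
             std::vector<double>& y)
{
    for (std::size_t i = 0; i < y.size(); ++i) y[i] = 0.0;
    for (std::size_t k = 0; k < lower.size(); ++k) {
        const Entry& e = lower[k];
        y[e.row] += e.val * x[e.col];       // contribution of a_ij
        if (e.row != e.col)
            y[e.col] += e.val * x[e.row];   // mirrored contribution of a_ji
    }
}
```

The flip side is that the mirrored update scatters into both x and y, which is exactly the memory-access issue raised later in the thread.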
Registered Member
Hello Seb,
Thank you for responding so politely to my patronising post, which assumed you were as much a novice as I am. I've skim-read the referenced paper and am amazed that such performance benefits can be achieved -- quite astonishing. Can the same performance gains be expected in a multi-core/threaded environment? Nathaniel.
Moderator
What is currently limiting sparse*vector products is the random memory accesses to the dense output vector (for column-major storage) or the dense input vector (for row-major storage). So taking the symmetry into account does not help here, and is even worse because it increases the randomness of the accesses.
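A rough sketch of the access pattern described above, for a column-major (CSC-style) layout (my own illustration, not Eigen's actual kernel):

```cpp
// y += A * x with A stored column by column (CSC-like arrays);
// outer[j]..outer[j+1] indexes the stored entries of column j.
void spmv_csc(int cols, const int* outer, const int* inner, const double* vals,
              const double* x, double* y)
{
    for (int j = 0; j < cols; ++j) {
        const double xj = x[j];              // sequential reads of the input
        for (int k = outer[j]; k < outer[j + 1]; ++k)
            y[inner[k]] += vals[k] * xj;     // scattered writes into the output
    }
}
// A symmetric version must additionally do y[j] += vals[k] * x[inner[k]],
// i.e. scattered reads of x on top of the scattered writes into y, which
// makes the memory access pattern even more irregular.
```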
Registered Member
Thanks for your reply Gael,
is there any situation where Eigen reaches the theoretical performance announced in the aforementioned article? Perhaps using row-major storage, or an UpperTriangular vs. LowerTriangular layout?

Regarding the original topic: I have a new test result on GCC 4.3.4 where the symmetric product takes 2.5x longer (again using -DNDEBUG). Could you possibly check this? I use the latest version of Eigen.

Best regards
Sebastian