This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Poor performance solving sparse linear system of equations

Tags: None
(comma "," separated)
rmoraes
Registered Member
Posts
28
Karma
0
Hello everyone,

I am trying to solve a linear system A.x=b where A is a 2x2 block hepta-diagonal (sparse) matrix with the following dimensions:

Number of Columns = Number of Rows = 37106
Number of Non-Zeros: 491116

First I tried to solve the system using a direct solver (after all, it is not a huge system...), the SparseLU, as follows

Code: Select all
            Eigen::SparseMatrix<double> A(n,n);
            Eigen::SparseMatrix<double> rhs(n,1);
            A.reserve(Eigen::VectorXi::Constant(n, 15));
            // Matrix A and rhs are filled using .coeffRef(...) method. No performance issues here...
            A.makeCompressed();
            rhs.makeCompressed();
            Eigen::SparseLU<Eigen::SparseMatrix<double>> solver;
            solver.analyzePattern(A);
            solver.factorize(A);
            solver.solve(rhs);


The SparseLU solver is taking more than 10s to solve this system.

If I profile my code I see in the "Hot Path" that the method which is consuming most of the time is

Eigen::internal::sparselu_gemm<double,int>(int,int,int,double const *,int,double const *,int,double *,int)

Even if I try an iterative solver, e.g. ConjugateGradient or BiCGStab, the performance is not better. For instance, when I try the BiCGSTAB solver (with default preconditioner, but I also tried others) using the above code with the appropriate changes, it takes me more or less the same time as the SparseLU and the "Hot Path" shows me that the method which is consuming most of the time now is

Eigen::internal::sparse_time_dense_product_impl<class Eigen::SparseMatrix<double,0,int>,class Eigen::Matrix<double,-1,1,0,-1,1>,class Eigen::Matrix<double,-1,1,0,-1,1>,0,1>::run(class Eigen::SparseMatrix<double,0,int> const &,class Eigen::Matrix<double,-1,1,0,-1,1> const &,class Eigen::Matrix<double,-1,1,0,-1,1> &,double const &)

I am compiling my code using VS2010 and full optimization (/Ox).

The Eigen version I am using is the one provided by ggal in another post: https://bitbucket.org/eigen/eigen/get/3.2.zip.

Any help is very much appreciated!

Thanks!
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS
10s for such a small problem is indeed a bit too much. For the rhs, you should rather use a dense VectorXd. There are only disadvantage in using a sparse matrix for it., though in your case the perf. should still be about the same.

To see what's going on, could you share your sparse matrix and the *dense* rhs using the instructions there: http://eigen.tuxfamily.org/dox-devel/gr ... tml#title7
rmoraes
Registered Member
Posts
28
Karma
0
Hi ggael,

Thanks for once again looking into one of my issues with Eigen.

I couldn't exactly understand what you meant by "share" since I could not find a way to upload a file here in the forum. So I sent you the file via e-mail.

And indeed, as you pointed out, changing to a dense vector did not have a noticeable effect on the performance of the code.
tienhung
Registered Member
Posts
29
Karma
0
Based on my experience, the performance of sparse direct solvers strongly depends on the structure of the lhs matrix. Have you check whether it is symmetric or, even better, positive definite? Maybe your sparse matrices are Ill-conditioned?
In case the .mtx files are big, you can upload them somewhere (dropbox, googledriver, ...) and put the links here.
rmoraes
Registered Member
Posts
28
Karma
0
Indeed, the matrix structure plays a major role in the performance of direct solvers. However, also based on my experience, and regardless the structure of the system I am now solving, the tests I performed with Eigen solvers, either direct or iterative, indicate that the solvers are performing considerably slower than others I have had access in the past. Actually, my experience is with thirdparty softwares that are able to solve the very same problem in a much faster fashion. And I believe ggael is also with the same impression that Eigen should perform better.

Although the .mtx files are not huge, I couldn't find a way to upload them directly to this post - what would be the ideal solution (Please let me know if there is a way!). However I liked your idea of having the files shared via a virtual drive. Follows the link to GoogleDrive:

https://drive.google.com/file/d/0ByElDQ ... sp=sharing
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS
SparseLU seems to be the best choice for your pb, here it takes about 2.7s on a laptop i7@2.6GHz using either clang or gcc. Given the structure of your pb, a multi-frontal LU as in UMFPACK will be faster than SparseLU (0.7s). You can use UMFPACK through Eigen::UmfPackLU<> solver, but you need to compile and link to suitesparse....
rmoraes
Registered Member
Posts
28
Karma
0
After the times encountered running the linear system I provided, I ran my code again, but now with some timers around some specific steps of the solution process. Then I was able to realize that the factorization step is the one consuming most of the time: 10s!

I am running my code on a laptop i7@2.7 GHz. I am now wondering if the poor performance I am facing is somehow related to the OS/dev environment. I am on Windows 7, VS2010. My compilation options are:

/I"../../thirdparty" /Zi /nologo /W3 /WX- /Ox /Oi /Ot /Oy- /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm- /EHa /GS /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Fp"Release\Executable.pch" /Fa"Release\" /Fo"Release\" /Fd"Release\vc100.pdb" /Gd /analyze- /errorReport:queue

The times reported for the other solvers are very encouraging, however I'd like to stick with the option of only using a light version of Eigen, at least at this point in time. So it would be ideal if I was able to reproduce the same SparseLU times posted above.
rmoraes
Registered Member
Posts
28
Karma
0
After reading more carefully the documentation, I tried to scale the matrix (in the case my system matrix was badly scaled), like described in the following link:

http://eigen.tuxfamily.org/dox-devel/cl ... rseLU.html

But no improvement whatsoever...
tienhung
Registered Member
Posts
29
Karma
0
Could you share me your lhs and rhs matrices? I can do a bench. I think 10s is too slow...
rmoraes
Registered Member
Posts
28
Karma
0
Hi,

a google drive link with the matrices has been posted a couple of posts above.

Thanks!
rmoraes
Registered Member
Posts
28
Karma
0
After removing some dependencies to x86 libraries I was able to change my Target Machine to x64. Now the SparseLU factorization takes 5s, but still twice slower than the times reported by ggael.

I already changed my compilation options as suggested in the Eigen webpage (http://eigen.tuxfamily.org/index.php?title=Developer%27s_Corner#Using_Visual_Studio).

I wonder if there is still something to be improved regarding my compilation options...
rmoraes
Registered Member
Posts
28
Karma
0
After trying some compilation flags alternatives, I am now wondering if the slow performance is, instead, associated with my hardware configuration.

Follows the laptop hardware I am running my code:

Processor: Inter(R) Core(TM) i7-2960XM CPU @ 2.7GHz 2.7GHz.
RAM: 32 GB.
L3 Cache Size: 8 MB.

Could you guys please also share your hardware configuration?

Last edited by rmoraes on Mon Oct 26, 2015 3:09 pm, edited 1 time in total.
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS
Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
ram: 16GB
L3 cache: 6MB

MSVC is known to be pretty bad at generating optimized code compared to clang or gcc.
tienhung
Registered Member
Posts
29
Karma
0
Here is the result I got with my first comp: CPU Xeon E5-2609 v2@2.50GHz, 16Gb RAM:
SparseLU: 5.5s
SuperLU: 5.3s
UmfPackLU: 1.0s

I also tested with my old station (Xeon W3520, 12 Gb RAM) and UmfPack is still the best choice with around 1.1s. The time performance should be much better with i7 CPUs and GCC compiler.
rmoraes
Registered Member
Posts
28
Karma
0
Now I tried to solve the system in a newer hardware (I7-5600U 2.6 GHz), compiling with Intel (Intel Parallel Studio Composer 2015 XE Update 5), but it still takes 4s to solve the system using SparseLU solver. This is more or less in accordance with what tienhung reported previously.

I wonder if ggael has some other settings for running this test system I should be aware of...

I am now considering to give UmfPack a try, but I cannot find any instructions on how to compile sparsesuite on Windows...

Could someone please provide me with the instructions?

Thanks!


Bookmarks



Who is online

Registered users: bartoloni, Bing [Bot], Evergrowing, Google [Bot], ourcraft