Registered Member
|
Hello everyone,
I am trying to solve a linear system A.x = b, where A is a 2x2 block hepta-diagonal (sparse) matrix with the following dimensions: number of rows = number of columns = 37106; number of non-zeros = 491116. First I tried to solve the system using a direct solver (after all, it is not a huge system...), the SparseLU, as follows:
The SparseLU solver is taking more than 10s to solve this system. If I profile my code, the "Hot Path" shows that the method consuming most of the time is Eigen::internal::sparselu_gemm<double,int>(int,int,int,double const *,int,double const *,int,double *,int).

Even if I try an iterative solver, e.g. ConjugateGradient or BiCGStab, the performance is no better. For instance, when I try the BiCGSTAB solver (with the default preconditioner, but I also tried others) using the above code with the appropriate changes, it takes more or less the same time as SparseLU, and the "Hot Path" now shows that the method consuming most of the time is Eigen::internal::sparse_time_dense_product_impl<class Eigen::SparseMatrix<double,0,int>,class Eigen::Matrix<double,-1,1,0,-1,1>,class Eigen::Matrix<double,-1,1,0,-1,1>,0,1>::run(class Eigen::SparseMatrix<double,0,int> const &,class Eigen::Matrix<double,-1,1,0,-1,1> const &,class Eigen::Matrix<double,-1,1,0,-1,1> &,double const &).

I am compiling my code with VS2010 and full optimization (/Ox). The Eigen version I am using is the one provided by ggael in another post: https://bitbucket.org/eigen/eigen/get/3.2.zip. Any help is very much appreciated! Thanks! |
Moderator
|
10s for such a small problem is indeed a bit too much. For the rhs, you should rather use a dense VectorXd: there are only disadvantages to using a sparse matrix for it, though in your case the performance should still be about the same.
To see what's going on, could you share your sparse matrix and the *dense* rhs using the instructions there: http://eigen.tuxfamily.org/dox-devel/gr ... tml#title7 |
Registered Member
|
Hi ggael,
Thanks for once again looking into one of my issues with Eigen. I couldn't exactly understand what you meant by "share" since I could not find a way to upload a file here in the forum. So I sent you the file via e-mail. And indeed, as you pointed out, changing to a dense vector did not have a noticeable effect on the performance of the code. |
Registered Member
|
Based on my experience, the performance of sparse direct solvers strongly depends on the structure of the lhs matrix. Have you checked whether it is symmetric or, even better, positive definite? Maybe your sparse matrices are ill-conditioned?
In case the .mtx files are big, you can upload them somewhere (Dropbox, Google Drive, ...) and put the links here. |
Registered Member
|
Indeed, the matrix structure plays a major role in the performance of direct solvers. However, also based on my experience, and regardless of the structure of the system I am now solving, the tests I performed with Eigen solvers, either direct or iterative, indicate that they are performing considerably slower than others I have had access to in the past. In fact, my experience is with third-party software that is able to solve the very same problem much faster. And I believe ggael has the same impression that Eigen should perform better.
Although the .mtx files are not huge, I couldn't find a way to upload them directly to this post - what would be the ideal solution? (Please let me know if there is a way!) However, I liked your idea of sharing the files via a virtual drive. Here is the link to Google Drive: https://drive.google.com/file/d/0ByElDQ ... sp=sharing |
Moderator
|
SparseLU seems to be the best choice for your problem; here it takes about 2.7s on a laptop i7@2.6GHz using either clang or gcc. Given the structure of your problem, a multi-frontal LU as in UMFPACK will be faster than SparseLU (0.7s). You can use UMFPACK through the Eigen::UmfPackLU<> solver, but you need to compile and link against SuiteSparse...
|
Registered Member
|
Following the times reported for the linear system I provided, I ran my code again, now with timers around specific steps of the solution process. I was then able to see that the factorization step is the one consuming most of the time: 10s!
I am running my code on a laptop i7@2.7 GHz. I am now wondering if the poor performance I am facing is somehow related to the OS/dev environment. I am on Windows 7, VS2010. My compilation options are:
/I"../../thirdparty" /Zi /nologo /W3 /WX- /Ox /Oi /Ot /Oy- /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm- /EHa /GS /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Fp"Release\Executable.pch" /Fa"Release\" /Fo"Release\" /Fd"Release\vc100.pdb" /Gd /analyze- /errorReport:queue
The times reported for the other solvers are very encouraging; however, I'd like to stick with using only a lightweight, Eigen-only setup, at least at this point in time. So it would be ideal if I were able to reproduce the SparseLU times posted above. |
Registered Member
|
After reading the documentation more carefully, I tried to scale the matrix (in case my system matrix was badly scaled), as described in the following link:
http://eigen.tuxfamily.org/dox-devel/cl ... rseLU.html But no improvement whatsoever... |
Registered Member
|
Could you share your lhs matrix and rhs vector? I can run a benchmark. I think 10s is too slow...
|
Registered Member
|
Hi,
A Google Drive link with the matrices has been posted a couple of posts above. Thanks! |
Registered Member
|
After removing some dependencies on x86 libraries, I was able to change my target machine to x64. Now the SparseLU factorization takes 5s, but that is still about twice as slow as the times reported by ggael.
I have already changed my compilation options as suggested on the Eigen web page (http://eigen.tuxfamily.org/index.php?title=Developer%27s_Corner#Using_Visual_Studio). I wonder if there is still something to be improved in my compilation options... |
Registered Member
|
After trying some alternative compilation flags, I am now wondering if the slow performance is instead associated with my hardware configuration.
Here is the laptop hardware I am running my code on: Processor: Intel(R) Core(TM) i7-2960XM CPU @ 2.7GHz. RAM: 32 GB. L3 cache size: 8 MB. Could you guys please also share your hardware configurations?
Last edited by rmoraes on Mon Oct 26, 2015 3:09 pm, edited 1 time in total.
|
Moderator
|
Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
RAM: 16GB, L3 cache: 6MB. MSVC is known to be pretty bad at generating optimized code compared to clang or gcc. |
Registered Member
|
Here are the results I got on my first machine (CPU Xeon E5-2609 v2 @ 2.50GHz, 16 GB RAM):
SparseLU: 5.5s, SuperLU: 5.3s, UmfPackLU: 1.0s. I also tested on my old workstation (Xeon W3520, 12 GB RAM), and UmfPack is still the best choice at around 1.1s. Performance should be much better with i7 CPUs and the GCC compiler. |
Registered Member
|
Now I have tried to solve the system on newer hardware (i7-5600U @ 2.6 GHz), compiling with the Intel compiler (Intel Parallel Studio Composer 2015 XE Update 5), but it still takes 4s to solve the system with the SparseLU solver. This is more or less in line with what tienhung reported previously.
I wonder if ggael has some other settings for running this test system that I should be aware of... I am now considering giving UmfPack a try, but I cannot find any instructions on how to compile SuiteSparse on Windows... Could someone please provide instructions? Thanks! |