Registered Member
Hi!
My project (a scientific simulation) makes heavy use of Eigen because of its nice API, its fairly extensive sparse module and also its speed. Regarding speed, though, I guess I have not fully optimized everything yet. Because this step needs some know-how, I'm asking you more experienced people for some tips. The Eigen functions I use:
My questions regarding optimization: In general: how can I optimize my program without spending too much effort on it and without committing too much to one specific PC architecture? The simulation should run okay on all of today's common "office" PCs (Windows 7, Intel i5/i7, 8+ GB RAM, 32- or 64-bit Windows...).
- Which BLAS should I use? Does Eigen use its own BLAS even for the SuperLU backend, and is it fast, or do I have to link a more optimized BLAS to get full SuperLU speed? Do I have to compile a specifically optimized BLAS .lib for every slightly different PC architecture I want to run the program on?
- What about a faster solver for the LDLT decomposition? Is the speed gain worth the effort of getting, for example, CHOLMOD to work properly on Windows?
- Any common performance pitfalls I could have fallen for? E.g. slower operations on column-major matrices than on row-major ones, like I read here somewhere...
Greetings
medic
Moderator
What do your sparse matrices represent? How many non-zeros per row/column?
Is Eigen's SimplicialLDLT faster than SuperLU for SPD matrices? If so, how much faster? Regarding your questions: Eigen can only exploit multithreading for dense matrix products. SuperLU uses whatever BLAS library you link against; in other words, it depends on how you compile SuperLU and how you link your application. It is recommended to use an optimized BLAS such as Eigen's own BLAS, GotoBLAS or MKL. You can compile Eigen's BLAS by configuring a build directory with CMake and then compiling the blas target/solution.
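If you want to compare the two back-ends on the same data, a minimal sketch looks like this (A and b are placeholders for your matrix and right-hand side; the SuperLU part of course needs a SuperLU build and the BLAS it was linked against):

#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>
#include <Eigen/SuperLUSupport>   // requires SuperLU and its BLAS

// Solve A*x = b once with each back-end; A and b are placeholders for your data.
void compareSolvers(const Eigen::SparseMatrix<double>& A, const Eigen::VectorXd& b)
{
  Eigen::SimplicialLDLT<Eigen::SparseMatrix<double> > ldlt(A);  // built-in, no external library
  Eigen::VectorXd x1 = ldlt.solve(b);

  Eigen::SuperLU<Eigen::SparseMatrix<double> > slu(A);          // uses whatever BLAS SuperLU was built with
  Eigen::VectorXd x2 = slu.solve(b);
}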
Registered Member
The sparse matrices represent an electrical network where each node is connected to at most 6 direct neighbours and has at least two connections, so the number of non-zeros per row/column lies within this range. The matrices are either SPD or symmetric indefinite but nearly SPD (I have to change one or two columns if I apply a voltage instead of a current to the network).
I compiled SuperLU with the reference BLAS and linked my application to Eigen's BLAS implementation, which I had already compiled using CMake. So if I understand you correctly: SuperLU itself should be compiled against your BLAS implementation (or some other tuned BLAS) to get high performance, and only linking the application with it won't help. Secondly, multithreading Eigen will bring no performance gain for my mostly sparse case. Currently SimplicialLDLT is about 3 times faster than SuperLU (symmetric mode) on the SPD matrices. Above 50k x 50k the advantage of SimplicialLDLT begins to narrow. I will run some performance tests before and after using the EigenBlas.lib with SuperLU. Are you interested in the results? If so, is there a standardized way to do these tests?
Greetings, medic
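PS: for the timing I was only planning to use something simple like the following (just a sketch with std::chrono; mat and rhs stand for my system matrix and right-hand side):

#include <chrono>
#include <iostream>
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// Rough timing of factorization and solve; mat and rhs are placeholders for my data.
void timeLDLT(const Eigen::SparseMatrix<double>& mat, const Eigen::VectorXd& rhs)
{
  auto t0 = std::chrono::high_resolution_clock::now();
  Eigen::SimplicialLDLT<Eigen::SparseMatrix<double> > solver(mat);  // analyze + factorize
  auto t1 = std::chrono::high_resolution_clock::now();
  Eigen::VectorXd x = solver.solve(rhs);
  auto t2 = std::chrono::high_resolution_clock::now();

  std::cout << "factorize: " << std::chrono::duration<double>(t1 - t0).count() << " s, "
            << "solve: "     << std::chrono::duration<double>(t2 - t1).count() << " s\n";
}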
Moderator
Ok, your matrices seem to be too sparse to take advantage of supernodal approaches or multi-threading. For circuit simulation there is a specialized solver in SuiteSparse:
http://www.cise.ufl.edu/research/sparse/klu/
but I don't know how painful it is to install and use (on Windows, probably very painful!). If you can share one of your typical matrices, we could easily bench all the solvers supported by Eigen:

#include <unsupported/Eigen/SparseExtra>
saveMarket(mat, "filename.mtx");

and zip it (the Market format is an uncompressed ASCII file format!).
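On our side, loading the matrix back and feeding it to the solvers would then look roughly like this (the right-hand side is just a dummy here):

#include <unsupported/Eigen/SparseExtra>
#include <Eigen/SparseCholesky>

int main()
{
  Eigen::SparseMatrix<double> A;
  Eigen::loadMarket(A, "filename.mtx");                  // the file written by saveMarket
  Eigen::VectorXd b = Eigen::VectorXd::Ones(A.rows());   // dummy right-hand side
  Eigen::SimplicialLDLT<Eigen::SparseMatrix<double> > ldlt(A);
  Eigen::VectorXd x = ldlt.solve(b);
  return 0;
}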
Registered Member
Yes, I tried to install KLU a few months ago and didn't manage to run it properly on Windows. In fact, SuperLU is the only sparse backend solver that I managed to install and run with my Visual Studio compiled application. Still, I tried SuiteSparse in Matlab and ran some tests. KLU and CHOLMOD (for the SPD case) are really fast and outperformed UMFPACK and the other built-in solvers in Matlab, but SuperLU with Eigen in my C++ app wasn't much slower (or was even a little faster). So I accepted this easy solution at first and went with SuperLU and your built-in LDLT solver.
But now it turns out that I have to solve the network a lot more often due to nonlinear elements (-> Newton-Raphson with a sparse Jacobian). So a faster decomposition of the Jacobian in every iteration will be crucial for the overall performance (see the sketch at the end of this post). I will mail you my two typical matrices (current and voltage charging) in the next week or so, because I am currently changing my code to implement the nonlinear elements. I will link this thread then, so that you know what it is about. Maybe in the meantime I could try to install the new backend solver PaStiX, or ask our project partners whether they have an Intel MKL licence. Would those two perhaps be better suited for my case?
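Roughly, the inner loop will look like the sketch below (jacobian() and residual() stand for my own assembly routines). Since the sparsity pattern of the Jacobian never changes, my plan is to call analyzePattern() once and only factorize() inside the loop:

#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// Placeholders for my own assembly routines:
Eigen::SparseMatrix<double> jacobian(const Eigen::VectorXd& x);
Eigen::VectorXd residual(const Eigen::VectorXd& x);

// Sketch of the Newton-Raphson iteration on the node voltages x.
void newtonSolve(Eigen::VectorXd& x)
{
  Eigen::SimplicialLDLT<Eigen::SparseMatrix<double> > solver;
  Eigen::SparseMatrix<double> J = jacobian(x);
  solver.analyzePattern(J);                 // sparsity pattern is fixed: analyze only once
  for (int it = 0; it < 50; ++it)           // 50 = arbitrary iteration limit
  {
    J = jacobian(x);
    solver.factorize(J);                    // numerical factorization, done every iteration
    Eigen::VectorXd dx = solver.solve(-residual(x));
    x += dx;
    if (dx.norm() < 1e-10)                  // placeholder convergence test
      break;
  }
}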
Moderator
Ok, if you already tried KLU and CHOLMOD through Matlab and got performance similar to SuperLU, then I don't think you will find a faster solution. This is because your matrices are very sparse. PaStiX is good for tougher problems and will very likely be rather slow on your matrices. If you don't need high precision, you might try the conjugate gradient and BiCGSTAB solvers, though I doubt you'll get any speedup.
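If you want to give them a quick try, a minimal sketch (the tolerance is just an example value, A and b are placeholders for your data):

#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>
#include <iostream>

// Quick test of BiCGSTAB with a relaxed tolerance; A and b are placeholders.
void tryBiCGSTAB(const Eigen::SparseMatrix<double>& A, const Eigen::VectorXd& b)
{
  Eigen::BiCGSTAB<Eigen::SparseMatrix<double> > solver;
  solver.setTolerance(1e-6);          // relax this if you do not need high precision
  solver.compute(A);
  Eigen::VectorXd x = solver.solve(b);
  std::cout << "iterations: " << solver.iterations()
            << ", estimated error: " << solver.error() << "\n";
}

For the SPD case, Eigen::ConjugateGradient from the same header is used in exactly the same way.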