Registered Member
|
Hello everyone!
I'm developing a fluid simulation code using Eigen for solving a linear system by means of the BiCGStab matrix solver. The basic flowchart is: Time loop{ SparseMatrixBuilder(); BiCGStab(); PrintToFile(); }End time loop It runs perfectly and pass the Valgrind test without any memory errors or leaks, but... It's very slow, even working with small matrices (for instances 300x300 elements). So i made a profilling with "gprof" and almost the 50% of the full simulation time is spent on this function: "void Eigen::SparseMatrix<double, 0, int>::reserveInnerVectors<Eigen::SparseMatrix<double, 0, int>::SingletonVector>(Eigen::SparseMatrix<double, 0, int>::SingletonVector const&)" It seems that the program's allocating memory for the sparse matrix in EVERY time step, but i declared the sparse matrix before the temporal loop, not inside. Any idea? Thanks in advance! |
Moderator
|
First make sure you're benchmarking with compiler optimizations ON and -DNDEBUG. Then I guess the issue is in SparseMatrixBuilder() where you probably insert elements one at a time without properly reserving memory. Is the structure of the matrix the same for each time-step? If so you could simply initialize the matrix structure once and update its value by looping over the non-zeros with an InnerIterator and use it.valueRef() = new_value. Otherwise, either properly reserve enough space for each column or use a basic vector of triplets.
|
Registered Member
|
Hi Gael,
I'm usign all these flags: -O3 -DNDEBUG -m64 -Ofast -flto -march=native -funroll-loops -ffast-math so i think all the compiling optimizers are activated. About the second issue, the structure of the matrix is the same for every time step. I think i'm initializating the matrix just once, before the matrix builder. Then i just "clear" all the elements with sMtxA.setZero(); before inserting the new ones with sMtxA.insert(ic,jc)=1.0; This is a simplified version of the function SparseMatrixBuilder:
In the main file i have: typedef Eigen::SparseMatrix<double> spMatrix; spMatrix sMtxA(Nc,Nc); // System sparse matrix VectorXd b(Nc); // RHS vector |
Moderator
|
Remove the setZero(), and replace the calls to insert() by coeffRef(). For the first run you should call reserve() with a VectorXi containing the number of non-zeros for each column, that is the valence of each vertex + 1
|
Registered Member
|
Ok, i've removed setZero() and replaced all the calls to insert() by coeffRef(). Now the program is 4x faster (Thanks!) but it isn't enough.
I mean, i programmed the same code with the same BiCGStab solver in Fortran some time ago (now i'm moving to C) and it is 5 times faster than the new one with Eigen-BiCGStab solver, performing the same calculation, so i assume that i'm doing anything else wrong with the sparse stuff. About your second idea, now i'm building the sparse matrix for the first time, then reserving memory by means of: sMtxA.reserve(Eigen::VectorXi::Constant(sMtxA.outerSize(),5)); and then comes the temporal loop (without any memory allocation). Unfortunately, this does not improve the simulation time. |
Moderator
|
Which step is taking most of the time? (among SparseMatrixBuilder(), BiCGStab(), and PrintToFile())
|
Registered Member
|
I've just check that BiCGStab() is taken almost all of the time... This is the BiCGStab function:
|
Moderator
|
ok, and what do you mean by "same BiCGStab solver in Fortran" ? do you mean same BICGStab implementation or just with a BICGStab algorithm? BICGStab has many parameters, and the main ones are the precision the preconditioner. Compared to your previous version, does it take the same amount of iterations ? Do you get same level of accuracy? Do you use the same kind of preconditioner? What about multi-threading?
|
Registered Member
|
With "same BiCGStab solver in Fortran" i mean just a BiCGStab algorithm. I took from Numerical Recipes book. Compared to this version, Eigen does very few iterations (just 6 or 7). I set the tolerance in both programs to 1E-05.
And now the most strange thing. For the sake of simplicity, i've written a simple code to build a sparse matrix and solve the system by means of both solvers. With small matrices (up to 1000 x 1000) there is no difference in calculation time. For large matrices (20000 x 20000), Eigen is by far the fastest. I'll keep testing. Thanks for the support. |
Registered Member
|
Hi Gael,
I performed some changes in my code and now is quite faster (for instance, i added ILUT preconditioner). Now i'm wondering about your multithreading recomendation, but i read in some posts that it is only implemented for matrix multiplication. Do you think Eigen BiCGStab will speed up with multithreading? |
Registered users: Baidu [Spider], Bing [Bot], Google [Bot]