Optimization problems • KDE Community Forums

This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Board index

Optimization problems

Page 1 of 1 (10 posts)

Tags:

javifp Registered Member Posts 6 Karma 0	Optimization problems Fri Feb 14, 2014 4:23 pm Hello everyone! I'm developing a fluid simulation code using Eigen for solving a linear system by means of the BiCGStab matrix solver. The basic flowchart is: Time loop{ SparseMatrixBuilder(); BiCGStab(); PrintToFile(); }End time loop It runs perfectly and pass the Valgrind test without any memory errors or leaks, but... It's very slow, even working with small matrices (for instances 300x300 elements). So i made a profilling with "gprof" and almost the 50% of the full simulation time is spent on this function: "void Eigen::SparseMatrix<double, 0, int>::reserveInnerVectors<Eigen::SparseMatrix<double, 0, int>::SingletonVector>(Eigen::SparseMatrix<double, 0, int>::SingletonVector const&)" It seems that the program's allocating memory for the sparse matrix in EVERY time step, but i declared the sparse matrix before the temporal loop, not inside. Any idea? Thanks in advance!
ggael Moderator Posts 3447 Karma 19 OS	Re: Optimization problems Sat Feb 15, 2014 8:42 am First make sure you're benchmarking with compiler optimizations ON and -DNDEBUG. Then I guess the issue is in SparseMatrixBuilder() where you probably insert elements one at a time without properly reserving memory. Is the structure of the matrix the same for each time-step? If so you could simply initialize the matrix structure once and update its value by looping over the non-zeros with an InnerIterator and use it.valueRef() = new_value. Otherwise, either properly reserve enough space for each column or use a basic vector of triplets.
javifp Registered Member Posts 6 Karma 0	Re: Optimization problems Sat Feb 15, 2014 9:49 am Hi Gael, I'm usign all these flags: -O3 -DNDEBUG -m64 -Ofast -flto -march=native -funroll-loops -ffast-math so i think all the compiling optimizers are activated. About the second issue, the structure of the matrix is the same for every time step. I think i'm initializating the matrix just once, before the matrix builder. Then i just "clear" all the elements with sMtxA.setZero(); before inserting the new ones with sMtxA.insert(ic,jc)=1.0; This is a simplified version of the function SparseMatrixBuilder: Code: Select all int DWSparseBuilder(type_Mesh mesh, type_StateStructure n, type_StateStructure m, Eigen::SparseMatrix<double>& sMtxA, Eigen::VectorXd& b) { // Variable initialization for(ic=0 ; ic<Nc ; ic++) vector_C_aux[ic]=0.0; term_A1=0.0; term_Asum=0.0; term_C1=0.0; term_Csum=0.0; term_Cbound=0.0; sMtxA.setZero(); // Clear sparse matrix b = VectorXd::Zero(Nc); // Clear vector for(ic=0;ic<Nc;ic++){ // DOMAIN sweep // Clear variables term_Asum=0.0; term_Csum=0.0; term_Cbound=0.0; minngb=1E6; for(iw=0;iw<Nw;iw++) { minngb=min(mesh->cell[ic].face[iw].ngb,minngb); } // DBC or SIB check: if the cell is DBC or SIB, go to the end of DOMAIN sweep (no changes) MaxVal=intMaxval(mesh->Nw,mesh->cell[ic].btc); if((minngb<=-1)&&(MaxVal>=2)) goto skip; else{ for(iw=0;iw<Nw;iw++){ // FACE Sweep // Boundary Check: check boundary type and assign conditions // BOUNDARIES if(mesh->cell[ic].face[iw].ngb==-1){ if(mesh->cell[ic].btc[iw]==1){ // Closed wall term_A1=0.0; term_C1=0.0; term_Cbound=0.0; } } // INNER CELLS else{ term_Cbound=0.0; alphaCOMPUTATION(ic,iw,m,mesh,&alphaM); // Computation of alpha averaged coefficients on the face w at iteration m // Build summands of coefficients a_i (matrix diagonal) term_A1=alphaMmesh->cell[ic].face[iw].A/mesh->cell[ic].face[iw].d; // Build summands of coefficients c_i (free vector) term_C1=alphaMmesh->cell[ic].face[iw].A/mesh->cell[ic].face[iw].dmesh->cell[ic].face[iw].dz; // Build coefficients b_w jc=mesh->cell[ic].face[iw].ngb; for(i=0;i<Nw;i++) { minngb=min(mesh->cell[jc].face[i].ngb,minngb); } MaxVal=intMaxval(Nw,mesh->cell[jc].btc); if((minngb<=-1)&&(MaxVal>=2)){ vector_C_aux[ic]=vector_C_aux[ic]-alphaMmesh->cell[ic].face[iw].A/mesh->cell[ic].face[iw].dm->cell[jc].h; sMtxA.insert(ic,jc)=0.0; } else{ sMtxA.insert(ic,jc)=(thetadt/mesh->cell[ic].ValphaMmesh->cell[ic].face[iw].A/mesh->cell[ic].face[iw].d); } } // END Boundary Check term_Asum=term_Asum+term_A1; // Summ of coefficient a_i summands term_Csum=term_Csum+term_C1+term_Cbound; // Summ of coefficient c_i summands and boundary condition extra term } // END FACE sweep // Computation of a_i and assignment in matrix diagonal sMtxA.insert(ic,ic)=1.0-thetadt/mesh->cell[ic].Vterm_Asum; // Computation of c_i b(ic)=n->cell[ic].h+thetadt/mesh->cell[ic].V*(term_Csum+vector_C_aux[ic]); } // END DBC or SIB check skip: ; } // END DOMAIN sweep return(1); } In the main file i have: typedef Eigen::SparseMatrix<double> spMatrix; spMatrix sMtxA(Nc,Nc); // System sparse matrix VectorXd b(Nc); // RHS vector
ggael Moderator Posts 3447 Karma 19 OS	Re: Optimization problems Sat Feb 15, 2014 1:41 pm Remove the setZero(), and replace the calls to insert() by coeffRef(). For the first run you should call reserve() with a VectorXi containing the number of non-zeros for each column, that is the valence of each vertex + 1
javifp Registered Member Posts 6 Karma 0	Re: Optimization problems Sat Feb 15, 2014 3:25 pm Ok, i've removed setZero() and replaced all the calls to insert() by coeffRef(). Now the program is 4x faster (Thanks!) but it isn't enough. I mean, i programmed the same code with the same BiCGStab solver in Fortran some time ago (now i'm moving to C) and it is 5 times faster than the new one with Eigen-BiCGStab solver, performing the same calculation, so i assume that i'm doing anything else wrong with the sparse stuff. About your second idea, now i'm building the sparse matrix for the first time, then reserving memory by means of: sMtxA.reserve(Eigen::VectorXi::Constant(sMtxA.outerSize(),5)); and then comes the temporal loop (without any memory allocation). Unfortunately, this does not improve the simulation time.
ggael Moderator Posts 3447 Karma 19 OS	Re: Optimization problems Mon Feb 17, 2014 9:13 am Which step is taking most of the time? (among SparseMatrixBuilder(), BiCGStab(), and PrintToFile())
javifp Registered Member Posts 6 Karma 0	Re: Optimization problems Mon Feb 17, 2014 10:34 am I've just check that BiCGStab() is taken almost all of the time... This is the BiCGStab function: Code: Select all int sparseBiCGStab(int n, Eigen::SparseMatrix<double> sMtxA, Eigen::VectorXd b, Eigen::VectorXd X, Eigen::VectorXd Xx, Eigen::VectorXd Xo){ int iterBCG; Eigen::BiCGSTAB<SparseMatrix<double> > BCGST; // BCGST.setMaxIterations(100000); BCGST.setTolerance(1.0E-5); BCGST.compute(sMtxA); X = BCGST.solveWithGuess(b,Xo); // Solve with initial guess Xo iterBCG=BCGST.iterations(); if(showInfo==2){ printf("%s",KYEL); std::cout << "* Solver error: " << BCGST.error() << std::endl; printf("%s",KNRM); } return(iterBCG); }
ggael Moderator Posts 3447 Karma 19 OS	Re: Optimization problems Mon Feb 17, 2014 12:57 pm ok, and what do you mean by "same BiCGStab solver in Fortran" ? do you mean same BICGStab implementation or just with a BICGStab algorithm? BICGStab has many parameters, and the main ones are the precision the preconditioner. Compared to your previous version, does it take the same amount of iterations ? Do you get same level of accuracy? Do you use the same kind of preconditioner? What about multi-threading?
javifp Registered Member Posts 6 Karma 0	Re: Optimization problems Tue Feb 18, 2014 10:32 am With "same BiCGStab solver in Fortran" i mean just a BiCGStab algorithm. I took from Numerical Recipes book. Compared to this version, Eigen does very few iterations (just 6 or 7). I set the tolerance in both programs to 1E-05. And now the most strange thing. For the sake of simplicity, i've written a simple code to build a sparse matrix and solve the system by means of both solvers. With small matrices (up to 1000 x 1000) there is no difference in calculation time. For large matrices (20000 x 20000), Eigen is by far the fastest. I'll keep testing. Thanks for the support.
javifp Registered Member Posts 6 Karma 0	Re: Optimization problems Tue Feb 25, 2014 10:20 am Hi Gael, I performed some changes in my code and now is quite faster (for instance, i added ILUT preconditioner). Now i'm wondering about your multithreading recomendation, but i read in some posts that it is only implemented for matrix multiplication. Do you think Eigen BiCGStab will speed up with multithreading?

Page 1 of 1 (10 posts)

Bookmarks

Who is online

Registered users: Baidu [Spider], Bing [Bot], Google [Bot]