This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Optimization problems

Tags: None
(comma "," separated)
javifp
Registered Member
Posts
6
Karma
0

Optimization problems

Fri Feb 14, 2014 4:23 pm
Hello everyone!

I'm developing a fluid simulation code using Eigen for solving a linear system by means of the BiCGStab matrix solver. The basic flowchart is:

Time loop{
SparseMatrixBuilder();
BiCGStab();
PrintToFile();
}End time loop


It runs perfectly and pass the Valgrind test without any memory errors or leaks, but... It's very slow, even working with small matrices (for instances 300x300 elements). So i made a profilling with "gprof" and almost the 50% of the full simulation time is spent on this function:

"void Eigen::SparseMatrix<double, 0, int>::reserveInnerVectors<Eigen::SparseMatrix<double, 0, int>::SingletonVector>(Eigen::SparseMatrix<double, 0, int>::SingletonVector const&)"

It seems that the program's allocating memory for the sparse matrix in EVERY time step, but i declared the sparse matrix before the temporal loop, not inside.

Any idea?

Thanks in advance!
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS

Re: Optimization problems

Sat Feb 15, 2014 8:42 am
First make sure you're benchmarking with compiler optimizations ON and -DNDEBUG. Then I guess the issue is in SparseMatrixBuilder() where you probably insert elements one at a time without properly reserving memory. Is the structure of the matrix the same for each time-step? If so you could simply initialize the matrix structure once and update its value by looping over the non-zeros with an InnerIterator and use it.valueRef() = new_value. Otherwise, either properly reserve enough space for each column or use a basic vector of triplets.
javifp
Registered Member
Posts
6
Karma
0

Re: Optimization problems

Sat Feb 15, 2014 9:49 am
Hi Gael,

I'm usign all these flags:
-O3 -DNDEBUG -m64 -Ofast -flto -march=native -funroll-loops -ffast-math

so i think all the compiling optimizers are activated.

About the second issue, the structure of the matrix is the same for every time step. I think i'm initializating the matrix just once, before the matrix builder. Then i just "clear" all the elements with sMtxA.setZero(); before inserting the new ones with sMtxA.insert(ic,jc)=1.0; This is a simplified version of the function SparseMatrixBuilder:

Code: Select all
int DWSparseBuilder(type_Mesh *mesh,
           type_StateStructure *n, type_StateStructure *m,
           Eigen::SparseMatrix<double>& sMtxA, Eigen::VectorXd& b) {

// Variable initialization
   for(ic=0 ; ic<Nc ; ic++) vector_C_aux[ic]=0.0;
   term_A1=0.0; term_Asum=0.0; term_C1=0.0; term_Csum=0.0; term_Cbound=0.0;

   sMtxA.setZero();               // Clear sparse matrix
   b = VectorXd::Zero(Nc);            // Clear vector

for(ic=0;ic<Nc;ic++){   // DOMAIN sweep

// Clear variables
   term_Asum=0.0; term_Csum=0.0; term_Cbound=0.0;
   minngb=1E6;

   for(iw=0;iw<Nw;iw++) {
      minngb=min(mesh->cell[ic].face[iw].ngb,minngb); }

// DBC or SIB check: if the cell is DBC or SIB, go to the end of DOMAIN sweep (no changes)
   MaxVal=intMaxval(mesh->Nw,mesh->cell[ic].btc);
   if((minngb<=-1)&&(MaxVal>=2))
      goto skip;
   else{

   for(iw=0;iw<Nw;iw++){   // FACE Sweep

// Boundary Check: check boundary type and assign conditions
// BOUNDARIES
      if(mesh->cell[ic].face[iw].ngb==-1){
         if(mesh->cell[ic].btc[iw]==1){            // Closed wall
         term_A1=0.0; term_C1=0.0; term_Cbound=0.0; } }
// INNER CELLS
      else{
         term_Cbound=0.0;

         alphaCOMPUTATION(ic,iw,m,mesh,&alphaM);      // Computation of alpha averaged coefficients on the face w at iteration m

// Build summands of coefficients a_i (matrix diagonal)
         term_A1=alphaM*mesh->cell[ic].face[iw].A/mesh->cell[ic].face[iw].d;

// Build summands of coefficients c_i (free vector)
         term_C1=alphaM*mesh->cell[ic].face[iw].A/mesh->cell[ic].face[iw].d*mesh->cell[ic].face[iw].dz;

// Build coefficients b_w
         jc=mesh->cell[ic].face[iw].ngb;
         for(i=0;i<Nw;i++) {
            minngb=min(mesh->cell[jc].face[i].ngb,minngb); }
         MaxVal=intMaxval(Nw,mesh->cell[jc].btc);
         if((minngb<=-1)&&(MaxVal>=2)){
            vector_C_aux[ic]=vector_C_aux[ic]-alphaM*mesh->cell[ic].face[iw].A/mesh->cell[ic].face[iw].d*m->cell[jc].h;
            sMtxA.insert(ic,jc)=0.0;
         }
         else{
            sMtxA.insert(ic,jc)=(theta*dt/mesh->cell[ic].V*alphaM*mesh->cell[ic].face[iw].A/mesh->cell[ic].face[iw].d);
         }
      }   // END Boundary Check
      term_Asum=term_Asum+term_A1;               // Summ of coefficient a_i summands
      term_Csum=term_Csum+term_C1+term_Cbound;         // Summ of coefficient c_i summands and boundary condition extra term
      
   }   // END FACE sweep
// Computation of a_i and assignment in matrix diagonal
   sMtxA.insert(ic,ic)=1.0-theta*dt/mesh->cell[ic].V*term_Asum;
// Computation of c_i 
   b(ic)=n->cell[ic].h+theta*dt/mesh->cell[ic].V*(term_Csum+vector_C_aux[ic]);
   }   // END DBC or SIB check
skip: ;

} // END DOMAIN sweep

return(1);
}


In the main file i have:

typedef Eigen::SparseMatrix<double> spMatrix;

spMatrix sMtxA(Nc,Nc); // System sparse matrix
VectorXd b(Nc); // RHS vector
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS

Re: Optimization problems

Sat Feb 15, 2014 1:41 pm
Remove the setZero(), and replace the calls to insert() by coeffRef(). For the first run you should call reserve() with a VectorXi containing the number of non-zeros for each column, that is the valence of each vertex + 1
javifp
Registered Member
Posts
6
Karma
0

Re: Optimization problems

Sat Feb 15, 2014 3:25 pm
Ok, i've removed setZero() and replaced all the calls to insert() by coeffRef(). Now the program is 4x faster (Thanks!) but it isn't enough.

I mean, i programmed the same code with the same BiCGStab solver in Fortran some time ago (now i'm moving to C) and it is 5 times faster than the new one with Eigen-BiCGStab solver, performing the same calculation, so i assume that i'm doing anything else wrong with the sparse stuff.

About your second idea, now i'm building the sparse matrix for the first time, then reserving memory by means of:
sMtxA.reserve(Eigen::VectorXi::Constant(sMtxA.outerSize(),5)); and then comes the temporal loop (without any memory allocation). Unfortunately, this does not improve the simulation time.
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS

Re: Optimization problems

Mon Feb 17, 2014 9:13 am
Which step is taking most of the time? (among SparseMatrixBuilder(), BiCGStab(), and PrintToFile())
javifp
Registered Member
Posts
6
Karma
0

Re: Optimization problems

Mon Feb 17, 2014 10:34 am
I've just check that BiCGStab() is taken almost all of the time... This is the BiCGStab function:

Code: Select all
int sparseBiCGStab(int n,
          Eigen::SparseMatrix<double> sMtxA, Eigen::VectorXd b,
          Eigen::VectorXd *X, Eigen::VectorXd *Xx,
          Eigen::VectorXd Xo){

   int iterBCG;

   Eigen::BiCGSTAB<SparseMatrix<double> > BCGST;
   
//   BCGST.setMaxIterations(100000);
   BCGST.setTolerance(1.0E-5);
   
   BCGST.compute(sMtxA);
   *X = BCGST.solveWithGuess(b,Xo);   // Solve with initial guess Xo
   iterBCG=BCGST.iterations();
   
   if(showInfo==2){
      printf("%s",KYEL); std::cout << "** Solver error: " << BCGST.error() << std::endl; printf("%s",KNRM); }
   return(iterBCG);
}
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS

Re: Optimization problems

Mon Feb 17, 2014 12:57 pm
ok, and what do you mean by "same BiCGStab solver in Fortran" ? do you mean same BICGStab implementation or just with a BICGStab algorithm? BICGStab has many parameters, and the main ones are the precision the preconditioner. Compared to your previous version, does it take the same amount of iterations ? Do you get same level of accuracy? Do you use the same kind of preconditioner? What about multi-threading?
javifp
Registered Member
Posts
6
Karma
0

Re: Optimization problems

Tue Feb 18, 2014 10:32 am
With "same BiCGStab solver in Fortran" i mean just a BiCGStab algorithm. I took from Numerical Recipes book. Compared to this version, Eigen does very few iterations (just 6 or 7). I set the tolerance in both programs to 1E-05.

And now the most strange thing. For the sake of simplicity, i've written a simple code to build a sparse matrix and solve the system by means of both solvers. With small matrices (up to 1000 x 1000) there is no difference in calculation time. For large matrices (20000 x 20000), Eigen is by far the fastest.

I'll keep testing. Thanks for the support.
javifp
Registered Member
Posts
6
Karma
0

Re: Optimization problems

Tue Feb 25, 2014 10:20 am
Hi Gael,
I performed some changes in my code and now is quite faster (for instance, i added ILUT preconditioner). Now i'm wondering about your multithreading recomendation, but i read in some posts that it is only implemented for matrix multiplication. Do you think Eigen BiCGStab will speed up with multithreading?


Bookmarks



Who is online

Registered users: Baidu [Spider], Bing [Bot], Google [Bot]