This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Preconditioned CG_MATLAB vs CG_Eigen_MEX

Tags: None
(comma "," separated)
chenjiang
Registered Member
Posts
2
Karma
0
Hi, recently I need to use my slow mixed Matlab/C++ code to run my simulation about a 5 million elements fluid-structure interactions.
The linear equation solver was MATLAB bulit-in pcg solver which I found it is only using one thread.
Hence, I try to solve the pressure poisson equation using CPP-MEX to wrap the pcg of Eigen_dev branch.

My test codes in Matlab is
Code: Select all
 
fprintf('MSG: Test of Eigen_pcg begin!!\n');
f_nnode = size(f_node_coord,1);
load('rhs_foreigentest.mat');
b = rhs;
x0 = zeros(f_nnode,1);
   
[f_H,b] = sol_CBS_impose_pbc(f_H,b,f_pbc_dof,f_pbc_val);
tic
[x1] = pcg(f_H, b,1e-5,1000 );
toc
   
b = rhs;
tic
[x] = c_sol_pcg_eigen(f_H_ir, f_H_jc, f_H_val, b, x0, 1000, 1e-5,int32(f_pbc_dof), f_pbc_val);
toc
   
b = rhs;
tic
[x2] = c_sol_pcg_eigen_OMP...
(f_H_ir, f_H_jc, f_H_val, b, x0, 1000, 1e-5,int32(f_pbc_dof), f_pbc_val);
toc


Sparse matrix f_H is [29292x29292], nnz of H is 389062.

Mex build command in Windows7 without OPENMP:
Code: Select all
mex -I".\eigen_dev" c_sol_pcg_eigen_OMP.cpp

Mex build command in Windows7 with OPENMP:
Code: Select all
mex -I".\eigen_dev" c_sol_pcg_eigen_OMP.cpp COMPLFAGS="/openmp $COMPFLAGS"

Mex build command in Linux (opensuse 13.12) with OPENMP:
Code: Select all
mex c_sol_pcg_eigen_OMP.cpp -I".\eigen_dev"  CXXPLFAGS="\$CXXFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp"


My CPP Mex warpper for PCG_Eigen is like below,
Code: Select all
/*==========================================================

This subroutine is written in C for invoking in Matlab 2009a.
Before using it, compile this C code using proper C compiler.

[x] = c_sol_pcg_eigen(A_ir, A_jc, A_val, b, x0, maxiter, tol, pbc_dof, pbc_val)
Input:
(1) : A_ir, row-id vector for CSR Sparse matrix format.
(2) : A_jc, col-id vector for CSR Sparse matrix format.
(3) : A_val, value vector for CSR Sparse matrix format.
(4) : b, right-hands term.
(5) : x0, initial guess of unknowns variables.
(6) : maxiter, max number of iterations.
(7) : tol, tolerance of residuals to check if converged.
(8) : pbc_dof,
(9) : pbc_val.
Output:
(1) : x


Author: Chen Jiang  2016-04-20-15.42 jiangchen2007@gmail.coom
*========================================================*/
#include "mex.h"
#include "math.h"
#include "matrix.h"
#include <Eigen/Sparse>
#include <Eigen/Dense>
#include <vector>
#include <algorithm>
#include <iostream>

using namespace Eigen;
using namespace std;

typedef Eigen::Triplet<double> T;


// the gateway function
void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    //input vars
    int *A_ir;
    int *A_jc;
    double *A_val;
    double *b;
    double *x0;
    int maxiter;
    double tol;
    int *pbc_dof;
    double *pbc_val;

    //output vars
    double *x;


    //temp vars
    int nrows;
    int nnz;
    std::vector<T> tripletList;
    int npbc_dof;

    //-----------------
    // GetData
    //-----------------
    if(nrhs != 9){
        mexErrMsgIdAndTxt("MEX:c_sol_pcg_eigen:rhs",
                "This function takes too much input arguments.");
    }

    A_ir = (int*)mxGetData(prhs[0]);
    A_jc = (int*)mxGetData(prhs[1]);
    A_val = mxGetPr(prhs[2]);
    b = mxGetPr(prhs[3]);
    x0 = mxGetPr(prhs[4]);
    maxiter = mxGetScalar(prhs[5]);
    tol = mxGetScalar(prhs[6]);
    pbc_dof = (int*) mxGetData(prhs[7]);
    pbc_val = mxGetPr(prhs[8]);

    nrows = mxGetM(prhs[3]);
    npbc_dof = mxGetM(prhs[7]);
   nnz = mxGetM(prhs[0]);

    plhs[0] = mxCreateDoubleMatrix(nrows,1,mxREAL);
   x = mxGetPr(plhs[0]);

    //-----------------
    //calculations
    //-----------------

    //covert A_ir, A_jc, A_val to Eigen Sparse matrix
    tripletList.reserve(nnz);
    for(int i=0; i<nnz; i++){
        tripletList.push_back(T(A_ir[i]-1, A_jc[i]-1, A_val[i]));
    }

    SparseMatrix<double> A_eigen(nrows, nrows);
    A_eigen.setFromTriplets(tripletList.begin(), tripletList.end());

    Map<VectorXd> b_eigen(b, nrows);

    //apply pbc
    for(int idof = 0; idof < npbc_dof; idof++){
        int dof = pbc_dof[idof]-1;
        A_eigen.coeffRef(dof, dof) = 1.0e8;
        b_eigen(dof) = 1.0e8*pbc_val[idof];
    }
    //CG solver of Eigen
    VectorXd x_eigen;
    Map<VectorXd> x0_eigen(x0, nrows);
    ConjugateGradient<SparseMatrix<double>, Lower|Upper, IncompleteCholesky<double> > cg;
   //ConjugateGradient<SparseMatrix<double>, Lower|Upper > cg;
    //cg.preconditioner().setDroptol(0.001);
    cg.setTolerance(tol);
    cg.setMaxIterations(maxiter);
    cg.compute(A_eigen);
    x_eigen = cg.solveWithGuess(b_eigen, x0_eigen);
   //x_eigen = cg.solve(b_eigen);

    //
   for(int i=0; i<nrows; i++){
      x[i] = x_eigen(i);
   }
}


At Windows7, the ellapsed times for different solver are:
Matlab built-in pcg function: 2.39255 sec
Serial Cpp-mex pcg of Eigen: 0.56422 sec
OpenMP Cpp-mex pcg of Eigen: 0.518612 sec (2 threads only)

The full version of my post is on my blog:
https://www.zybuluo.com/BravoWA/note/356279

Wish this post is useful. And Can anyone provide better strategy to convert Eigen VectorXd to MATLAB MxArray in MEX code more efficient?
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS
Thanks for sharing your code. You can replace "x_eigen = ..." by "VectorXd::Map(x,nrows) = ..." to avoid the copy.
pauloribeiro
Registered Member
Posts
1
Karma
0
Hi there,

I'm describing my experience and also have some questions on this thread.

The following is my experience in building MEX files for Eigen's conjugate gradient method. The performance for the single thread version is very interesting, even beating Matlab results with the following option (no preconditioning):

Code: Select all
 ConjugateGradient<SparseMatrix<double>, Lower|Upper > cg;


Problem: small benefit is being given by the "openmp" option, which results in inferior or similar performance. Resource monitor displays activities in other processors. The setup and problem description are given bellow:

1. Setup

Intel i7 9750h (hexacore, hyperthreading enabled), 16Gb RAM, Windows 10

2. Problem description

Finite element method
Sparse matrix, dimensions = 1e6 x 1e6
Nonzeros = 13e6
Symmetric

3. C++ code for openmp

Based on chenjiang's thread, with:

#include <omp.h>
Eigen::initParallel();
Eigen::setNbThreads(nt); %% where different values for nt where applied for every build

and complete code given by:

Code: Select all
#include "mex.h"
#include "math.h"
#include "matrix.h"
#include <Eigen/Sparse>
#include <Eigen/Dense>
#include <vector>
#include <algorithm>
#include <iostream>
#include <omp.h>
using namespace Eigen;
using namespace std;
typedef Eigen::Triplet<double> T;
// the gateway function
void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    Eigen::initParallel();
    Eigen::setNbThreads(2);
    //input vars
    int *A_ir;     
    int *A_jc;     
    double *A_val; 
    double *b;     
    double *x0;     
    int maxiter;   
    double tol;     
    //output vars
    double *x;
    //temp vars
    int nrows;
    int nnz;
    std::vector<T> tripletList;
    //-----------------
    // GetData
    //-----------------
    if(nrhs != 7){
        mexErrMsgIdAndTxt("MEX:c_sol_pcg_eigen:rhs",
                "This function takes too much input arguments.");
    }
    A_ir = (int*)mxGetData(prhs[0]);
    A_jc = (int*)mxGetData(prhs[1]);
    A_val = mxGetPr(prhs[2]);
    b = mxGetPr(prhs[3]);
    x0 = mxGetPr(prhs[4]);
    maxiter = mxGetScalar(prhs[5]);
    tol = mxGetScalar(prhs[6]);
    nrows = mxGetM(prhs[3]);
    nnz = mxGetM(prhs[0]);
    plhs[0] = mxCreateDoubleMatrix(nrows,1,mxREAL);
    x = mxGetPr(plhs[0]);
    //-----------------
    //calculations
    //-----------------
    //covert A_ir, A_jc, A_val to Eigen Sparse matrix
    tripletList.reserve(nnz);
    for(int i=0; i<nnz; i++){
        tripletList.push_back(T(A_ir[i]-1, A_jc[i]-1, A_val[i]));
    }
    SparseMatrix<double> A_eigen(nrows, nrows);
    A_eigen.setFromTriplets(tripletList.begin(), tripletList.end());
    Map<VectorXd> b_eigen(b, nrows);
       
    //CG solver of Eigen
    VectorXd x_eigen;
    Map<VectorXd> x0_eigen(x0, nrows);
    ConjugateGradient<SparseMatrix<double>, Lower|Upper > cg;
    cg.setTolerance(tol);
    cg.setMaxIterations(maxiter);
    cg.compute(A_eigen);
    VectorXd::Map(x,nrows) = cg.solveWithGuess(b_eigen, x0_eigen);
   
}


4. MEX compile options

Here two different compilers are used:

a) Microsoft Visual C++ 2019, with the following commands:

mex -v -IC:\...\pathtoeigen CXXFLAGS="/openmp $CXXFLAGS" pcg_eigen.cpp [BUILD OK]

with pcg solution time:

NbThreads(2) >>> elapsed time = 219.20s
NbThreads(4) >>> elapsed time = 218.16s

* reference time for single thread = 208.96s

mex -v -IC:\...\pathtoeigen COMPFLAGS="$COMPFLAGS /openmp" pcg_eigen.cpp [BUILD OK]

with pcg solution time:

NbThreads(2) >>> elapsed time = 199.48s
NbThreads(4) >>> elapsed time = 198.27s
NbThreads(6) >>> elapsed time = 203.56s

* reference time for single thread = 208.96s

b) MinGW64, with the following commands:

mex -v -IC:\...\pathtoeigen CXXFLAGS="-fopenmp $CXXFLAGS" pcg_eigen.cpp [RESULTS IN ERROR]
or
mex -v -IC:\...\pathtoeigen CXXFLAGS="-fopenmp $CXXFLAGS" -LDFLAGS='$LDFLAGS -fopenmp' pcg_eigen.cpp [RESULTS IN ERROR]
undefined reference to `GOMP_loop_ull_dynamic_start'
undefined reference to `GOMP_loop_ull_dynamic_next'
....

another option:
mex -v -IC:\...\pathtoeigen COMPFLAGS="$COMPFLAGS -fopenmp" pcg_eigen.cpp [BUILD OK]

with pcg solution time:

NbThreads(2) >>> elapsed time = 213.81s
NbThreads(4) >>> elapsed time = 206.93s
NbThreads(6) >>> elapsed time = 206.70s
NbThreads(10) >>> elapsed time = 203.74s

* reference time for single thread = 208.96s

5. Final remarks

I wonder if the benefit of openmp is valid in Windows 10 builds. I've also tried different versions of Eigen (3.3.5, 3.3.4, 3.3.0), and the performance of openmp is still negligible. Perhaps Linux + Matlab + GCC will provide more efficient results?

Please share your ideas and comments.
Thanks!

Paulo


Bookmarks



Who is online

Registered users: bartoloni, Bing [Bot], Google [Bot], Yahoo [Bot]