Registered Member
Hi, recently I needed to run a fluid–structure interaction simulation with about 5 million elements using my slow mixed MATLAB/C++ code.
The linear equation solver was MATLAB's built-in pcg, which I found uses only one thread. Hence, I tried solving the pressure Poisson equation with a C++ MEX wrapper around the pcg solver from Eigen's dev branch. My MATLAB test code is below.
The sparse matrix f_H is 29292x29292, and nnz(H) is 389062. MEX build command on Windows 7 without OpenMP:
MEX build command on Windows 7 with OpenMP:
MEX build command on Linux (openSUSE 13.12) with OpenMP:
My C++ MEX wrapper for Eigen's PCG is like below:
On Windows 7, the elapsed times for the different solvers are:
- MATLAB built-in pcg: 2.39255 s
- Serial C++ MEX PCG (Eigen): 0.56422 s
- OpenMP C++ MEX PCG (Eigen): 0.518612 s (2 threads only)
The full version of my post is on my blog: https://www.zybuluo.com/BravoWA/note/356279. I hope this post is useful. Also, can anyone suggest a more efficient way to convert an Eigen VectorXd to a MATLAB mxArray in MEX code?
Moderator
Thanks for sharing your code. You can replace "x_eigen = ..." by "VectorXd::Map(x,nrows) = ..." to avoid the copy.
Registered Member
Hi there,
I'd like to describe my experience and also ask some questions in this thread. Below is my experience building MEX files for Eigen's conjugate gradient solver. The single-threaded performance is very interesting, even beating MATLAB's results with the following options (no preconditioning):
Problem: the "openmp" option brings little benefit, giving inferior or merely similar performance, even though the resource monitor shows activity on other cores. The setup and problem description are given below:

1. Setup
Intel i7-9750H (six cores, hyper-threading enabled), 16 GB RAM, Windows 10

2. Problem description
Finite element method. Sparse symmetric matrix, dimensions = 1e6 x 1e6, nonzeros = 13e6.

3. C++ code for OpenMP
Based on chenjiang's thread, with:
#include <omp.h>
Eigen::initParallel();
Eigen::setNbThreads(nt); // different values of nt were applied for each build
Complete code given by:
4. MEX compile options
Two different compilers were used:

a) Microsoft Visual C++ 2019, with the following commands:
mex -v -IC:\...\pathtoeigen CXXFLAGS="/openmp $CXXFLAGS" pcg_eigen.cpp [BUILD OK]
PCG solution times:
NbThreads(2): elapsed time = 219.20 s
NbThreads(4): elapsed time = 218.16 s
(reference single-thread time = 208.96 s)

mex -v -IC:\...\pathtoeigen COMPFLAGS="$COMPFLAGS /openmp" pcg_eigen.cpp [BUILD OK]
PCG solution times:
NbThreads(2): elapsed time = 199.48 s
NbThreads(4): elapsed time = 198.27 s
NbThreads(6): elapsed time = 203.56 s
(reference single-thread time = 208.96 s)

b) MinGW64, with the following commands:
mex -v -IC:\...\pathtoeigen CXXFLAGS="-fopenmp $CXXFLAGS" pcg_eigen.cpp [RESULTS IN ERROR]
or
mex -v -IC:\...\pathtoeigen CXXFLAGS="-fopenmp $CXXFLAGS" LDFLAGS='$LDFLAGS -fopenmp' pcg_eigen.cpp [RESULTS IN ERROR]
undefined reference to `GOMP_loop_ull_dynamic_start'
undefined reference to `GOMP_loop_ull_dynamic_next'
...
Another option:
mex -v -IC:\...\pathtoeigen COMPFLAGS="$COMPFLAGS -fopenmp" pcg_eigen.cpp [BUILD OK]
PCG solution times:
NbThreads(2): elapsed time = 213.81 s
NbThreads(4): elapsed time = 206.93 s
NbThreads(6): elapsed time = 206.70 s
NbThreads(10): elapsed time = 203.74 s
(reference single-thread time = 208.96 s)

5. Final remarks
I wonder whether OpenMP brings any real benefit in Windows 10 builds. I have also tried different versions of Eigen (3.3.5, 3.3.4, 3.3.0), and the OpenMP speedup is still negligible. Perhaps Linux + MATLAB + GCC would give more efficient results? Please share your ideas and comments. Thanks! Paulo