
(Eigen 3.2.8) Performance sparse linear solver

bstaber
Hello Eigen community,

First of all, thanks for this C++ library, I really enjoy it. I wrote a C++ nonlinear finite element solver using Eigen for all the linear algebra routines. At each iteration of the Newton-Raphson algorithm, I have to solve a large sparse linear system Ax = b, where A is a 33756 x 33756 sparse matrix (1.139467536e+09 entries in total) with 1.19472e+06 nonzero entries. Matrix A is not necessarily symmetric due to boundary conditions.
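For scale, the figures quoted above can be checked with a small sketch. The helper names (`dense_entries`, `fill_ratio`, `csc_bytes`) are hypothetical and not part of Eigen, and the memory estimate assumes compressed column storage with `double` values and `int` indices, which is Eigen's default for `SparseMatrix<double>`.

```cpp
#include <cstdint>

// Total number of entries an n x n matrix would have if stored densely.
constexpr std::uint64_t dense_entries(std::uint64_t n) {
    return n * n;
}

// Fraction of entries that are actually stored (nonzeros / total entries).
constexpr double fill_ratio(std::uint64_t n, std::uint64_t nnz) {
    return static_cast<double>(nnz) / static_cast<double>(dense_entries(n));
}

// Rough size in bytes of compressed column storage (assumption: double values,
// int indices): nnz values + nnz row indices + (n + 1) column pointers.
constexpr std::uint64_t csc_bytes(std::uint64_t n, std::uint64_t nnz) {
    return nnz * sizeof(double) + nnz * sizeof(int) + (n + 1) * sizeof(int);
}
```

Calling `fill_ratio(33756, 1194720)` gives a fill of roughly 0.1 %, and `csc_bytes` puts A itself in the range of a few tens of megabytes at most, so the cost of the solve comes from the factorization (and its fill-in), not from storing A.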
I tried a few solvers and got the best performance with UmfpackLU, which takes approximately 20 seconds, while MATLAB takes only 5 seconds. I was told that this difference may be because MATLAB's solver "mldivide" (or "\") is multithreaded. Therefore I tried the -fopenmp flag with the following compilation command:

Code:
g++ main.cpp readMesh.cpp readMesh.hpp -o myprogram -O3 -DNDEBUG -Wall -g -I/usr/local/include -I/usr/local/include/suitesparse -lumfpack -lamd -llapack -lblas -fopenmp


and ran the program as follows:

Code:
export OMP_NUM_THREADS=n; ./myprogram [input files...]


I tried it on three different computers (with 2, 32 and 4 as the default OMP_NUM_THREADS, respectively). On the first two, it didn't change anything: there is no multithreading at all. On the third one, I observed that 4 threads are opened when the linear systems are solved. However, I get better performance (~9 s instead of ~20 s) by setting n = 1 instead of n = 2, 3 or 4!
How can we explain that?
Why can I multithread my program on the third computer but not on the other two? (I checked the number of threads with the top -H command.)
And finally, why do I get better performance by setting n = 1 instead of leaving the default value (n = 4) on the third machine?
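One way to investigate questions like these is to compare wall-clock time with CPU time around the solve. The following is a minimal sketch using only the standard library; `Timing` and `time_call` are hypothetical names, and it assumes Linux, where `std::clock()` accumulates CPU time over all threads of the process.

```cpp
#include <chrono>
#include <ctime>

// Wall-clock and CPU time spent in one call, both in seconds.
struct Timing {
    double wall_seconds;
    double cpu_seconds;
};

// Runs f() once and measures both clocks around it.
// Assumption: on Linux, std::clock() sums CPU time over all threads,
// so cpu_seconds / wall_seconds approximates the number of busy threads.
template <typename F>
Timing time_call(F&& f) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    f();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(wall_end - wall_start).count(),
            static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC};
}
```

Wrapping the factorization and solve calls in a lambda passed to `time_call` would show whether extra threads are doing useful work: a ratio near 1 means the solve is effectively serial, while a ratio near 4 combined with a longer wall time suggests the threads are competing rather than helping.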

I can export the matrix A and vector b (https://eigen.tuxfamily.org/dox-devel/group__TopicSparseSystems.html) if necessary.

Thanks for your help,
B.
ggael
If you're using UMFPACK for solving, then enabling OpenMP on the compiler side won't change anything. You have to enable multithreading when configuring/compiling SuiteSparse itself.
bstaber
Thanks for your quick answer. Unfortunately I don't know how to do that (><). I'll ask Google!

Thanks again,
B.
bstaber
Hello,

I managed to enable multithreading for SuiteSparse, but I still have one little issue. I noticed that only four threads are opened, and I couldn't find how to specify the number of threads I'd like UMFPACK to use. I tried

Code:
export OMP_NUM_THREADS=n


or

Code:
export OPENBLAS_NUM_THREADS=n


but it doesn't work at all. When I use the following commands

Code:
int n;
n = Eigen::nbThreads();
std::cout << "Number of threads: " << n << "\n";


it returns the right "n" that I exported, but only four threads are opened :(

Any ideas?

Thanks a lot.

Brian.
tienhung
The max number of threads depends on your CPU. What is your CPU model?
bstaber
Hello,

This is what I get when typing lscpu:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 44
Stepping: 2
CPU MHz: 3458.089
BogoMIPS: 6915.95
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0,2,4,6,8,10
NUMA node1 CPU(s): 1,3,5,7,9,11

It seems like I could open at least 12 threads, unless I'm mistaken.
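The lscpu output above works out to 2 sockets x 6 cores x 1 thread per core = 12 logical CPUs. As a cross-check from inside the program, here is a minimal sketch that reports what the hardware offers and what the environment requests. The helper names `env_thread_count` and `hardware_threads` are hypothetical; the sketch only reads the variables and says nothing about whether a given library (UMFPACK, BLAS) actually honors them.

```cpp
#include <cstdlib>
#include <thread>

// Returns the value of an environment variable such as OMP_NUM_THREADS as an
// int, or -1 if it is unset or not a single positive integer (a list such as
// "4,2" is treated as unparsable here).
int env_thread_count(const char* name) {
    const char* value = std::getenv(name);
    if (value == nullptr) {
        return -1;
    }
    char* end = nullptr;
    const long parsed = std::strtol(value, &end, 10);
    if (end == value || *end != '\0' || parsed <= 0) {
        return -1;
    }
    return static_cast<int>(parsed);
}

// Number of hardware threads reported by the standard library;
// the standard allows 0 when the value cannot be determined.
unsigned hardware_threads() {
    return std::thread::hardware_concurrency();
}
```

Printing `hardware_threads()`, `env_thread_count("OMP_NUM_THREADS")` and `env_thread_count("OPENBLAS_NUM_THREADS")` next to the thread count seen in top -H helps separate three different things: what the machine has, what was requested, and what the solver actually used.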
steveshi
Did you modify the source code of SuiteSparse or your nonlinear iteration code with OpenMP directives?
If not, it can't be improved.
Most of the time, a linear solver already implements its algorithm with multithreading.
MATLAB is much more efficient thanks to the MKL library.
ggael
If UMFPACK does not use OpenMP, then playing with OPENBLAS_NUM_THREADS is pointless. Again, refer to the UMFPACK documentation.