Eigen in an OpenMP parallel region

Board index

Page 1 of 1 (6 posts)

Tags:

torutakahashi Registered Member Posts 4 Karma 0 OS	Eigen in an OpenMP parallel region Thu Nov 16, 2017 9:43 am Hi, I am trying to enable the parallel functionality of the Eigen (version 3.3.4) in a nested OpenMP parallel region. Here is the pseudocode together with the numerical result: ============================== // Consider the LU decomposition of a n-by-n matrix, say M, where n=5000 (then, M requires 288MB of memory). MatrixXd M = MatrixXd::Random(n, n); // First, I performed the LU decomposition of the matrix M with a single thread. omp_set_num_threads(1); LU_with_Eigen(M); // Note that the matrix M is copied to this function, and thus not destroyed. The result was 2.37 second on my PC with 8 physical cores. // Next, I used 4 threads. omp_set_num_threads(4); LU_with_Eigen(M); The result was 1.21 second. The scalability is not so good, but we can see that the Eigen actually parallelised the decomposition. // Then, I tested if the Eigen was parallelised or not in an OpenMP parallel region. omp_set_nested(true): // Enable the nesting of OpenMP. omp_set_num_threads(2); // Use two threads (say TH0 and TH1) in the next parallel region. #pragma omp parallel { // Let each thread use 4 threads; the total number of threads is 8 and less than the number of CPU cores. omp_set_num_threads(4); LU_with_Eigen(M); // Unfortunately the result was 2.91 sec for TH0 and 2.92 secfor TH1, although I had expected to obtain the same timing result as the previous one, that is, 1.21 sec. } ============================== In this test, I used the compiler of g++ 6.3.1 (on Fedora 24). The amount of RAM is 8GB, which is enough to store the underlying matrix M as well as its copy. I am assuming that the Eigen can be parallelised in an OpenMP parallel reigion. Is this correct? If correct, am I doing something wrong? Best, Toru Takahashi
torutakahashi Registered Member Posts 4 Karma 0 OS	Re: Eigen in an OpenMP parallel region Thu Nov 16, 2017 9:53 am Hi, I forgot to mention two things: (1) The compile options are "-O3 -mtune=native -march=native -fopenmp". (2) The function LU_by_Eigen() is implemented as void LU_with_Eigen(MatrixXd M) { timerType *t; createTimer(&t); startTimer(t); PartialPivLU<MatrixXd> LU = M.lu(); stopTimer(t); printTimer(t); freeTimer(t); } where "timer" is my utility for timing. Toru
torutakahashi Registered Member Posts 4 Karma 0 OS	Re: Eigen in an OpenMP parallel region Thu Nov 16, 2017 3:26 pm Hi, When I replaced "LU_with_Eigen()" with "MatMatProduct_with_Eigen()", which computes the square of a given matrix, the behaviour of Eigen did not change --- Eigen was not parallelised in the OpenMP parallel region. For this regard, I have found the reason in the source code "Eigen/src/Core/products/Parallelizer.h", that is, // if multi-threading is explicitely disabled, not useful, or if we already are in a parallel session, // then abort multi-threading // FIXME omp_get_num_threads()>1 only works for openmp, what if the user does not use openmp? if((!Condition) \|\| (threads==1) \|\| (omp_get_num_threads()>1)) return func(0,rows, 0,cols); Clearly, the value of omp_get_num_threads() is greater than 1 in an OpenMP parallel region. In fact, when I assigned one thread to the OpenMP parallel region of my code, the Eigen was parallelised with the number of threads specified by omp_set_num_threads() inside the parallel region. I am wondering why Eigen assumes "omp_get_num_threads()>1" to parallelize the mat-mat product and, perhaps, LU decomposition. Best, Toru Takahashi
ggael Moderator Posts 3447 Karma 19 OS	Re: Eigen in an OpenMP parallel region Thu Nov 16, 2017 8:23 pm That's on purpose, because if the caller code is already multithreaded then we assume that all other cores are fully used, and thus multithreading the matrix-matrix product would be counter productive (it can even be orders of magnitude slower). Of course if you have 8 cores, and a parallel section with 4 threads with only one doing matrix-matrix products (or LU), you might still want Eigen to use 5 threads. We need to find a simple mechanism for that.
torutakahashi Registered Member Posts 4 Karma 0 OS	Re: Eigen in an OpenMP parallel region Thu Nov 16, 2017 11:59 pm Hi Gael, Thank you for your reply. I understand that, on purpose, the current Eigen cannot parallelise matrix operations in a parallel region. I guess this is because the managing cache sizes is not obvious.... I hope this nested-parallelisation is included in the Eigen's todo list. Thank you, Toru
Tschew Registered Member Posts 1 Karma 0	Re: Eigen in an OpenMP parallel region Thu Jul 04, 2019 11:30 am Dear ggael, has this been addressed by any chance? We have a use case in which having nested threading would be extremely useful. Cheers, Bartek