
Getting speedup in a multithreaded application

pedromoreira
Registered Member
Posts
1
Karma
0
Greetings,

We developed an application that uses Eigen and we are now attempting to parallelize it.
There are cases where multiple threads use Eigen to perform operations that are independent of each other, so we expect a speedup close to linear in the number of cores. Yet we are unable to achieve such a speedup.

Here is an MWE:

Code:

#include <iostream>
#include <omp.h>
#include <Eigen/Dense>
#include <Eigen/Core>

using namespace Eigen;
using namespace std;

#define SIZE 30
#define ITERATIONS 100000

int main()
{
  omp_set_num_threads(24);
  Eigen::setNbThreads(1);
  Eigen::initParallel();

  MatrixXd a = MatrixXd::Random(SIZE,SIZE);

  double total = 0.0;  // double, so the reduction does not truncate r.sum()

#pragma omp parallel for reduction(+ : total)
  for(unsigned int i = 0; i < ITERATIONS; ++i) {
        MatrixXd b = MatrixXd::Random(SIZE,SIZE);
        MatrixXd r = b * a * b;
        total += r.sum();
  }

  cout << total << endl;
}


We ran this program on a machine with the following specs:
- Two Intel Xeon E5-2630 v2 @ 2.60GHz processors (each with 6 cores and 12 threads)
- 256GB of memory

...and got the following results:
Serial: 60s
Parallel (24 threads): 28s

Although the operations are independent, running on 24 threads gives a speedup of only about 2 (60 s / 28 s ≈ 2.1). With other programs that do not use Eigen, we get speedups of 12, as expected.

The documentation ( http://eigen.tuxfamily.org/dox/TopicMultiThreading.html ) doesn't appear to have more details on the matter.

We would appreciate it if someone could shed some light on this and possibly suggest a solution.

Best regards,
-Pedro Moreira
Tal
Registered Member
Posts
30
Karma
0
I'm only a beginner Eigen user, but I think Eigen's Random function uses the standard C random function.
The C random function tends to use a locking mechanism (I think to keep its output deterministic).

Therefore, in my opinion, you didn't get optimal performance because the random function locks.

A possible solution is to use a custom random generator that does not lock.
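
A minimal sketch of the non-locking generator idea, assuming each thread can own its own engine. The names thread_local_uniform and random_buffer are hypothetical, and a plain std::vector<double> stands in for an Eigen matrix so the example needs only the standard library:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not part of Eigen): returns a uniform value in
// [-1, 1) from an engine owned by the calling thread, so threads never
// contend on shared generator state.
inline double thread_local_uniform()
{
    // One engine and one distribution per thread, created on first use.
    thread_local std::mt19937 engine{std::random_device{}()};
    thread_local std::uniform_real_distribution<double> dist(-1.0, 1.0);
    return dist(engine);
}

// Fills a rows*cols buffer with random values. The buffer is an assumed
// stand-in for an Eigen matrix of the same shape.
inline std::vector<double> random_buffer(std::size_t rows, std::size_t cols)
{
    std::vector<double> m(rows * cols);
    for (double& x : m)
        x = thread_local_uniform();
    return m;
}
```

With Eigen, the same function could probably be passed to MatrixXd::NullaryExpr(SIZE, SIZE, ...) in place of MatrixXd::Random(SIZE, SIZE). Whether that removes the bottleneck in the MWE has not been measured here.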


twithaar
Registered Member
Posts
23
Karma
0
Also note that you have many memory allocations going on.
Reusing the allocations for b and r via the OpenMP firstprivate clause could help a lot.
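
A rough sketch of that suggestion, under stated assumptions: plain std::vector<double> buffers stand in for the Eigen matrices, multiply_into and sum_of_products are hypothetical helpers, and the random matrix is replaced by a deterministic one so the result is reproducible. The point is only that b, tmp and r are allocated once and each thread gets its own copy through firstprivate:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out = x * y for n-by-n row-major matrices. out must already hold n*n
// elements and must not alias x or y. Nothing is allocated here.
inline void multiply_into(const std::vector<double>& x,
                          const std::vector<double>& y,
                          std::vector<double>& out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                acc += x[i * n + k] * y[k * n + j];
            out[i * n + j] = acc;
        }
}

// Sums the entries of b_i * a * b_i over all iterations, where b_i is a
// deterministic stand-in for the random matrix: every entry is 1/(i+1).
inline double sum_of_products(const std::vector<double>& a, std::size_t n,
                              int iterations)
{
    double total = 0.0;
    // Allocated once, before the loop.
    std::vector<double> b(n * n), tmp(n * n), r(n * n);

    // firstprivate copy-constructs b, tmp and r once per thread, so the
    // loop body reuses that storage instead of allocating per iteration.
#pragma omp parallel for firstprivate(b, tmp, r) reduction(+ : total)
    for (int i = 0; i < iterations; ++i) {
        for (double& x : b)
            x = 1.0 / (i + 1);
        multiply_into(b, a, tmp, n);  // tmp = b * a
        multiply_into(tmp, b, r, n);  // r = tmp * b
        for (double x : r)
            total += x;
    }
    return total;
}
```

In the Eigen version the analogous change would be to declare the MatrixXd objects before the loop and assign into them inside it. Compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.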
tienhung
Registered Member
Posts
29
Karma
0
My system: Xeon E5-2609 v2 @ 2.50 GHz (4 cores, 4 threads), 16 GB RAM

Code:
#include <cstdio>    // getchar
#include <iostream>
#include "Eigen/Dense"
#include "Eigen/Core"
#include "bench/BenchTimer.h"

#define SIZE 30
#define ITERATIONS 100000

int main()
{
   Eigen::MatrixXd a = Eigen::MatrixXd::Random(SIZE, SIZE);
   
   Eigen::BenchTimer timer;
   timer.start();
   auto total = 0;  // deduces int, so the sum is truncated on every accumulation

#pragma omp parallel for reduction(+ : total)
   for (int i = 0; i < ITERATIONS; ++i)
   {
      Eigen::MatrixXd b = Eigen::MatrixXd::Random(SIZE, SIZE);
      Eigen::MatrixXd r = b * a * b;
      total += r.sum();
   }

   timer.stop();   
   std::cout << "total = " << total << "; parallel time = " << timer.value() << std::endl;

   timer.start();
   total = 0;
   for (int i = 0; i < ITERATIONS; ++i)
   {
      Eigen::MatrixXd b = Eigen::MatrixXd::Random(SIZE, SIZE);
      Eigen::MatrixXd r = b * a * b;
      total += r.sum();
   }

   timer.stop();   
   std::cout << "total = " << total << "; serial time = " << timer.value() << std::endl;
   
   getchar();
   return 0;
}


And here is what I get:
total = 97421; parallel time = 1.02181 (the number of threads could be only 3 here, I didn't check it)
total = 98817; serial time = 3.45789

I don't think you need to enable Eigen's internal multi-threading when using OpenMP. I wonder how it could take 30 s on your system.
ggael
Moderator
Posts
3447
Karma
19
Your machine has only 12 physical cores, so tell OpenMP to use only 12 threads. The reason is that matrix-matrix products are highly optimized and occupy 100% of the ALU, so using hyperthreading is counterproductive.
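
A small sketch of capping the thread count at the number of physical cores. The helpers physical_cores and sum_first_n are hypothetical, and the loop body is a trivial placeholder rather than the matrix product from the MWE; the thread count is passed through the OpenMP num_threads clause:

```cpp
#include <cassert>

// Hypothetical helper: physical core count derived from the number of
// logical CPUs and the hardware threads per core (2 with hyperthreading).
inline int physical_cores(int logical_cpus, int threads_per_core)
{
    return logical_cpus / threads_per_core;
}

// Runs a reduction loop on an explicitly chosen number of threads.
inline long sum_first_n(int n, int threads)
{
    long total = 0;
#pragma omp parallel for num_threads(threads) reduction(+ : total)
    for (int i = 1; i <= n; ++i)
        total += i;
    return total;
}
```

For the machine described in the first post this gives physical_cores(24, 2) == 12. In the MWE itself the equivalent change would be calling omp_set_num_threads(12) instead of omp_set_num_threads(24), or setting the OMP_NUM_THREADS environment variable to 12.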

