Simple vector operations and OpenMP

pauljurczak
Fri Nov 09, 2012 12:21 pm
Hi,

I'm thinking about using Eigen just for its convenient vector notation, but I would like to parallelize the code to take advantage of a multi-core CPU. Is there a way to do this without sacrificing concise notation like the following (the code below is approximate, as I haven't used Eigen yet):
Code:
ArrayXf a(1000000), b(1000000);  // dynamic size: a fixed-size array this large would exceed Eigen's stack allocation limit
.........
b = a.abs();

or do I have to break it up:
Code:
#pragma omp parallel for
for(int i=0; i<a.size(); i+=N)  // N: chunk length, assumed to divide a.size() evenly
    b.segment(i,N) = a.segment(i,N).abs();

Thanks in advance for any comments,
Paul

ggael
It's generally not a good idea to parallelize at such a low level. In my experience, you only start to see gains for vectors larger than about 300000 elements, so we decided not to bother parallelizing coefficient-wise ops. A considerably better approach is to parallelize the whole algorithm.
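
To illustrate what parallelizing the whole algorithm can look like, here is a minimal sketch. It assumes the workload is a batch of independent vectors, and it uses plain std::vector as a stand-in for Eigen arrays so that it compiles without the library. The names process_item and process_batch are hypothetical. Each thread runs the complete sequential algorithm on one item instead of splitting every coefficient-wise operation across threads.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-item algorithm: two sequential coefficient-wise passes
// over one vector. Returns the sum of absolute deviations from the mean.
float process_item(const std::vector<float>& x)
{
    assert(!x.empty());
    float mean = 0.0f;
    for (float v : x) mean += v;
    mean /= static_cast<float>(x.size());

    float acc = 0.0f;
    for (float v : x) acc += std::fabs(v - mean);
    return acc;
}

// Coarse-grained parallelism: one loop iteration per independent item.
// Without -fopenmp the pragma is ignored and the loop simply runs serially.
std::vector<float> process_batch(const std::vector<std::vector<float>>& items)
{
    std::vector<float> results(items.size());
    const int n = static_cast<int>(items.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        results[i] = process_item(items[i]);  // each thread writes a distinct slot
    return results;
}
```

With this structure the thread start-up cost is paid once per batch rather than once per vector operation, which is the usual argument for parallelizing at the outer level.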
pauljurczak
This was just a simple example. To make it more compelling, substitute abs() with an arbitrarily compute-intensive function. But even this simple case scales very nicely on a multi-core system. On my 4-core CPU, the non-OpenMP version takes 1.88 ms and the OpenMP version with 4 threads takes 0.55 ms, a speedup of about 3.4x.

How difficult would it be to modify Eigen so that it generates "omp parallel for" code in cases like this, or in cases with a more complex right-hand side expression?
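
For anyone who wants to reproduce this kind of measurement, here is a rough timing sketch. It is not the code used for the numbers above. It assumes plain std::vector buffers instead of Eigen arrays, and time_ms, abs_serial and abs_parallel are hypothetical helper names.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Times a single call of f in milliseconds using a monotonic clock.
template <typename F>
double time_ms(F&& f)
{
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Serial reference version: b[i] = |a[i]|.
void abs_serial(const std::vector<float>& a, std::vector<float>& b)
{
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) b[i] = std::fabs(a[i]);
}

// Same loop with an OpenMP work-sharing pragma (ignored without -fopenmp).
void abs_parallel(const std::vector<float>& a, std::vector<float>& b)
{
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) b[i] = std::fabs(a[i]);
}
```

A call such as time_ms([&]{ abs_parallel(a, b); }) returns the elapsed time of one run. Repeating the call several times and keeping the minimum usually gives a more stable figure than a single run.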

ggael
The simplest approach is probably to implement a free function like:
Code:
#include <algorithm>  // std::min
#include <cassert>
#include <omp.h>      // omp_get_max_threads
#include <Eigen/Core>
using namespace Eigen;

template<typename Dest, typename Src>
void assignMT(DenseBase<Dest>& dest, const DenseBase<Src>& src)
{
  assert(dest.rows()==src.rows() && dest.cols()==src.cols());
  if(dest.size()>10000)
  {
    int nb_threads = omp_get_max_threads();
    if(Dest::IsVectorAtCompileTime)
    {
      const int size = static_cast<int>(dest.size());
      // round up so that the chunks also cover the remainder
      int chunk_size = (size+nb_threads-1)/nb_threads;
      #pragma omp parallel for
      for(int i=0; i<nb_threads; ++i)
      {
          int start = i*chunk_size;
          int actual_chunk_size = std::min(chunk_size, size-start);
          if(actual_chunk_size>0)  // trailing chunks can be empty
            dest.segment(start,actual_chunk_size) = src.segment(start,actual_chunk_size);
      }
    }
    else if(Dest::Flags&RowMajorBit)
    {
       // handle the row-major matrix case
       ...
    }
    else
    {
      // handle the col-major matrix case
      ...
    }
  }
  else
   dest = src;  // small sizes: plain single-threaded assignment
}
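
One detail to watch when splitting a vector into per-thread chunks as in assignMT: the chunk size has to be rounded up, otherwise the remainder is never assigned. For example, 10 elements over 4 threads with truncating division gives a chunk size of 2 and covers only 8 elements. Here is a small sketch of the index arithmetic on its own, with chunk_bounds as a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Returns {start, length} of chunk i when n elements are split into
// nb_chunks contiguous pieces. Trailing chunks may have length 0.
std::pair<int, int> chunk_bounds(int n, int nb_chunks, int i)
{
    assert(n >= 0 && nb_chunks > 0 && i >= 0 && i < nb_chunks);
    const int chunk_size = (n + nb_chunks - 1) / nb_chunks;  // ceiling division
    const int start = std::min(i * chunk_size, n);           // clamp to the end
    const int length = std::min(chunk_size, n - start);
    return {start, length};
}
```

With n = 10 and 4 chunks this yields lengths 3, 3, 3 and 1, so every element is covered exactly once.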


Please get back to us if you take this further...
pauljurczak
Thank you for your answer. I started using Eigen - mainly for short vector operations. I'm planning to do some comparative benchmarking against Intel Cilk Plus array notation and maybe homegrown short vector templates. I will post to the forum when I have some results.