
Eigen and threads

noir.sender
Registered Member

Eigen and threads

Sat Apr 04, 2009 10:32 pm
Hi all,

I have some questions and thoughts regarding Eigen, threads, and ways to optimize program runtime.
Feel free to correct me on any point.
We know from Eigen's documentation that Eigen's operations are vectorized.
E.g., when using floats and SSE2 instructions, Eigen can generate code that is essentially 4x faster
(in the ideal case, when the size of the vector is a multiple of 4), because SSE2 provides instructions that operate on a group of 4 floats at a time.

E.g.:

MatrixXf mat(1,16);

the default C-like loop is:

for (int i = 0; i < mat.cols(); ++i)
{
    // do something here
    mat(i) = /* some per-coefficient computation */;
}


The code generated by Eigen has the following structure:

for (int i = 0; i < mat.cols(); i += 4)
{
    // do something here
    mat.packet(i) = /* some per-packet computation */;
    // packet(i) refers to the group of 4 floats starting at coefficient i
}


Now, using e.g. OpenMP threads, which I find to be the easiest way to apply threading and which is also integrated in GCC, the first example could simply be written:

#pragma omp parallel for
for (int i = 0; i < mat.cols(); ++i)
{
    // do something here
    mat(i) = /* some per-coefficient computation */;
}

The above is the simplest case, where you do not change the state of the vector mat: you just do simple calculations that are completely independent of each other (quite a common operation).
If you have a couple of cores at your disposal and quite large matrices, you might get really impressive performance gains (of course, do not expect an exact multiple).
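
To make this concrete, here is a minimal, self-contained sketch of the elementwise case (the per-coefficient computation is a made-up example; build with e.g. g++ -O2 -fopenmp):

#include <Eigen/Core>
#include <cmath>
using namespace Eigen;

int main()
{
    MatrixXf mat(1, 1 << 20); // a large row vector
    const int n = mat.cols();

    // Each coefficient is computed independently of the others,
    // so the iterations can safely be distributed across threads.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        mat(i) = std::sin(0.001f * i); // arbitrary independent work

    return 0;
}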

Now, the most interesting part: for the case of packets, I can imagine one could write:

#pragma omp parallel for
for (int i = 0; i < mat.cols(); i += 4)
{
    // do something here
    mat.packet(i) = /* some per-packet computation */;
}

Hopefully the generated code then operates on groups of 4 floats within each processor/thread. If you have access to some 16-core beast and huge data, you might get a great improvement.
I have not tested the above yet, but it seems quite feasible.
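
Since packet() is part of Eigen's low-level internals rather than the everyday public API, one way to experiment with the idea today is to combine OpenMP with SSE intrinsics on the raw buffer. A minimal sketch, assuming the size is a multiple of 4 and the buffer is 16-byte aligned (Eigen aligns its dynamic allocations), with scaling by 2 as a stand-in computation:

#include <Eigen/Core>
#include <emmintrin.h> // SSE2 intrinsics
using namespace Eigen;

void scale_by_two(MatrixXf& mat)
{
    float* data = mat.data();
    const int n = mat.size();
    const __m128 two = _mm_set1_ps(2.0f);

    // Each iteration handles one independent packet of 4 floats,
    // so the packets can be split across threads.
    #pragma omp parallel for
    for (int i = 0; i < n; i += 4)
    {
        __m128 p = _mm_load_ps(data + i);           // aligned load of 4 floats
        _mm_store_ps(data + i, _mm_mul_ps(p, two)); // multiply and store back
    }
}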

So, could such a mechanism be integrated into Eigen? Any operation could benefit from it: dot products, matrix multiplications, and many more complex expressions built up from simple matrix-vector operations. Just add some simple OpenMP directives in the backbone of Eigen and we would get automatic, transparent parallelization. Of course, for small matrices the threading overhead might outweigh the gain, but that could be eliminated with compile-time checks.
A simple search in Src/Core reveals many for loops that could be parallelized (see the sketch of a size guard below). In addition, more complex OpenMP directives could be used that would also deliver performance gains.
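
As a sketch of what such a guard could look like (the threshold constant is made up for illustration; OpenMP's if clause disables the parallel region at runtime, and for fixed-size matrices the same decision could be made at compile time):

#include <Eigen/Core>
using namespace Eigen;

// Made-up tuning constant: below this size, threading overhead dominates.
const int PARALLEL_THRESHOLD = 10000;

void double_all(MatrixXf& mat)
{
    const int n = mat.size();

    // The if clause skips thread creation entirely for small matrices.
    #pragma omp parallel for if (n >= PARALLEL_THRESHOLD)
    for (int i = 0; i < n; ++i)
        mat(i) *= 2.0f;
}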

That's it... What do you say?

g.

noir
bjacob
Registered Member

RE: Eigen and threads

Wed Apr 08, 2009 1:19 am
We investigated OpenMP-ifying Eigen's loops a while ago, but it's nontrivial to do well. One big question to solve is how to ensure that Eigen's own parallelization (if we were to do that) plays well with the user application's own parallelization. OpenMP 2 didn't help with that; perhaps 3 is better. Then there are all the questions of how to reduce thread interdependency while keeping memory efficiency and balanced loads, etc. It's really nontrivial. Now, if someone who knows OpenMP well thinks he's able to do that well, that's very welcome! Anyway, development issues are discussed on the mailing list, not the forum.
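
As a minimal illustration of the interplay problem (plain OpenMP, not Eigen code): if a library unconditionally opens a parallel region, an application that already calls it from inside its own parallel region ends up oversubscribing threads. omp_in_parallel() is one possible runtime guard:

#include <omp.h>

// Hypothetical library loop, shown only to illustrate the nesting hazard.
void library_loop(float* data, int n)
{
    // Only parallelize if the caller is not already inside a parallel
    // region; otherwise run serially in the caller's thread.
    #pragma omp parallel for if (!omp_in_parallel())
    for (int i = 0; i < n; ++i)
        data[i] *= 2.0f;
}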


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
noir.sender
Registered Member

RE: Eigen and threads

Wed Apr 08, 2009 7:56 pm
OK, I see the point.

Thanks for the reply, at least!!
I had thought no one would answer, in contrast with the quick responses over on the mailing list :-)
Keep up the good work!

noir

