This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Preventing dynamic allocation?

Tags: None
(comma "," separated)
drewm1980
Registered Member
Posts
13
Karma
0

Preventing dynamic allocation?

Thu May 13, 2010 6:43 pm
My Eigen implementation of an iterative numerical optimization routine is slower that my OpenCV implementation by a factor of about 2. I suspect dynamic allocation of temporary variables may be the culprit.

In OpenCV we create all of the storage for the temporary variables outside of the time critical sections of our code.

In Eigen, expressions are getting nested and evaluated lazily, so it is not at all clear when and why temporary variables are getting allocated.

Using valgrind to measure the number of allocations and manually varying the number of times each loop is taken, I have found that:

6 allocations are happening in each iteration of my outer loop
3 allocations are happening in each iteration of my inner loop

Here are some ideas that I have for dealing with this:

* Manually create temporary variables for every non-trivial subexpression, and call .eval() on every assignment. For better or worse, this should remove all performance effects of the delayed evaluation.

* Manually trigger the delayed evaluation at every assignment using .eval(), comment out all of the lines with assignments, and incrementally uncomment them while monitoring the number of allocations. When one is found that is triggering dynamic memory allocation, break it into subexpressions, and explicitly allocate the new temporary variables in a non-critical section. With this technique, at least some of the expressions will be able to eliminate temporary variables, if not all. This would require compiling and running the code several times for each nontrivial line of code.

* bjacob (Benoit) suggested that I could make improvements in Memory.h to at least make monitoring of the allocations a little more convenient.

Please share if you have any other ideas. I can afford to spend a couple more work days hacking around to try to match the performance of our OpenCV code; I'll post again if I figure out something clever.

Cheers,
Drew

P.S. I'm using a dev version of opencv, and I believe it is calling the stock debian BLAS.
Hauke
Registered Member
Posts
109
Karma
3
OS

Re: Preventing dynamic allocation?

Fri May 14, 2010 8:44 am
Just out of curiosity. Could you post your code?

- Hauke
drewm1980
Registered Member
Posts
13
Karma
0

Re: Preventing dynamic allocation?

Fri May 14, 2010 4:50 pm
I can't share the whole code since it's for a publication we're preparing, but I can share the lines that I suspect could be causing allocation of temporary variables. Talking to benoit on irc, it sounds like delayed evaluation only happens within a single line of code/assignment, so analyzing single lines in isolation might be sufficient.

* Capitalized variables are matrices, A is very tall, G is square
* Lower case are column vectors
* s* are scalars
* sizes are dynamic, some are mapped.

lhs = b - A*x;

lhs = A.transpose()*(b - c); // This one is really poorly behaved. In eigen release version, it causes a segfault unless I add a .eval(), and in dev version it crashes if I add a .noalias() on the lhs.

lhs = (c - s*(b + G*a)));

lhs += s*(y - A*x - e);

Thanks!
Drew
User avatar
bjacob
Registered Member
Posts
658
Karma
3

Re: Preventing dynamic allocation?

Sat May 15, 2010 12:30 am
drewm1980 wrote:I can't share the whole code since it's for a publication we're preparing, but I can share the lines that I suspect could be causing allocation of temporary variables. Talking to benoit on irc, it sounds like delayed evaluation only happens within a single line of code/assignment, so analyzing single lines in isolation might be sufficient.


Of course! Matrices are just that, matrices. No hidden place to store expression for later evaluation. so when you write matrix = expression; after the semicolon you're sure that the expression has evaluated into the matrix. Even if you wrote lvalue_expr = rvalue_xpr, that would still be the case.


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
User avatar
bjacob
Registered Member
Posts
658
Karma
3

Re: Preventing dynamic allocation?

Sat May 15, 2010 12:32 am
drewm1980 wrote:lhs = A.transpose()*(b - c); // This one is really poorly behaved. In eigen release version, it causes a segfault unless I add a .eval(), and in dev version it crashes if I add a .noalias() on the lhs.
Drew


This is hard to believe: we have tons of test covering that many times.

Can you:
1) make sure that NDEBUG is not defined (in MSVC, use Debug mode)
2) compile with full debug options (e.g. gcc -g3)
3) post backtrace here


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
Hauke
Registered Member
Posts
109
Karma
3
OS

Re: Preventing dynamic allocation?

Sat May 15, 2010 8:46 am
Hi again,

I did a small test and can't confirm much unexpected . There are exactly 400 temporaries (100 iterations * 4 temporaries, one per multiplication) allocated in the loop given in the following code and there is no transpose based crash. In release mode the timer is blow 2 msecs and this is measured after compiling with MSVC 2010 based on the development branch.

Code: Select all
static int nb_temporaries = 0;
#define EIGEN_DEBUG_MATRIX_CTOR { if(size!=0) nb_temporaries++; }

#include <Eigen/Core>
#include <Bench/BenchTimer.h>

#include <iostream>

using namespace Eigen;

static const int size = 500;

EIGEN_DONT_INLINE VectorXd run_test()
{
  VectorXd a = VectorXd::Random(size,1);
  VectorXd b = VectorXd::Random(size,1);
  VectorXd c = VectorXd::Random(size,1);
  VectorXd e = VectorXd::Random(size,1);
  VectorXd x = VectorXd::Random(size,1);
  VectorXd y = VectorXd::Random(size,1);

  MatrixXd A = MatrixXd::Random(size,size);
  MatrixXd G = MatrixXd::Random(size,size);

  const double s = ei_random(1.0, 100.0);

  VectorXd lhs(size,1);
  VectorXd noopt(size,1);

  nb_temporaries = 0;

  BenchTimer timer;
  timer.start();
  for (int i=0; i<100; ++i)
  {
    lhs.noalias() = b - A*x; // 1 temporary
    //lhs = b; lhs.noalias() -= A*x; // 0 temporaries
    noopt += lhs;

    lhs.noalias() = A.transpose() * (b-c); // 1 temporary
    noopt += lhs;

    lhs.noalias() = c - s*(b + G*a); // 1 temporary
    noopt += lhs;

    lhs += s*(y - A*x - e); // 1 temporary
    noopt += lhs;

    timer.stop();
  }
  std::cout << "time: " << timer.best()*1000.0 << " ms" << std::endl;
  std::cout << "ctor: " << nb_temporaries << std::endl;

  return lhs;
}

int main(void)
{
  std::cout << run_test().segment(1,3) << std::endl;
}


For future reference, we can only offer good help, when we get simple self-contained little example. Given all the nifty functions provided in Eigen this is not really hard. The code above shows such an example and it will not cost much time to implement something like that. And of course, beforehand I was not asking for your whole application. :)

In case you need more help, please let us know and we will answer when we find the time.

Regards,
Hauke
drewm1980
Registered Member
Posts
13
Karma
0

Re: Preventing dynamic allocation?

Sat May 15, 2010 8:24 pm
Hauke wrote:Hi again,

I did a small test and can't confirm much unexpected . There are exactly 400 temporaries (100 iterations * 4 temporaries, one per multiplication) allocated in the loop given in the following code and there is no transpose based crash.

...

For future reference, we can only offer good help, when we get simple self-contained little example.

Hauke


Thanks bjacob and Hauke! I'll spend some more time trying to replicate the crasher in a self-contained example. If the above example doesn't crash, perhaps the crash has something to do with the fact that some of the matrices are wrapped around data in numpy arrays, or because the code is getting compiled into a python extension by boost-python.

Cheers,
Drew
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS

Re: Preventing dynamic allocation?

Tue Jun 01, 2010 9:16 pm
I read some of your eigen objects are Map<> objects. There was a bug for Map<> and matrix products preventing some optimizations. I've just solved this issue, so it might be worth update your local copy and try again...
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS

Re: Preventing dynamic allocation?

Tue Jun 01, 2010 9:27 pm
some tips to reduce temporaries and improve performances:

lhs = b - A*x;
-> (lhs = b).noalias() -= A*x;

lhs = A.transpose()*(b - c);
-> lhs.noalias() = A.transpose()*(b - c);

lhs = (c - s*(b + G*a)));
-> lhs = c - s*b;
lhs.noalias() -= s*G*a; // this scalar product is free

lhs += s*(y - A*x - e);
-> lhs += s*(y-e);
lhs.noalias() -= s*A*x;


Bookmarks



Who is online

Registered users: Bing [Bot], Google [Bot], Sogou [Bot]