Registered Member
|
My Eigen implementation of an iterative numerical optimization routine is slower than my OpenCV implementation by a factor of about 2. I suspect dynamic allocation of temporary variables may be the culprit.
In OpenCV we create all of the storage for the temporary variables outside of the time-critical sections of our code. In Eigen, expressions are getting nested and evaluated lazily, so it is not at all clear when and why temporary variables are getting allocated. Using valgrind to measure the number of allocations and manually varying the number of times each loop is taken, I have found that:

* 6 allocations are happening in each iteration of my outer loop
* 3 allocations are happening in each iteration of my inner loop

Here are some ideas that I have for dealing with this:

* Manually create temporary variables for every non-trivial subexpression, and call .eval() on every assignment. For better or worse, this should remove all performance effects of the delayed evaluation.
* Manually trigger the delayed evaluation at every assignment using .eval(), comment out all of the lines with assignments, and incrementally uncomment them while monitoring the number of allocations. When one is found that is triggering dynamic memory allocation, break it into subexpressions, and explicitly allocate the new temporary variables in a non-critical section. With this technique, at least some of the expressions will be able to eliminate temporary variables, if not all. This would require compiling and running the code several times for each nontrivial line of code.
* bjacob (Benoit) suggested that I could make improvements in Memory.h to at least make monitoring of the allocations a little more convenient.

Please share if you have any other ideas. I can afford to spend a couple more work days hacking around to try to match the performance of our OpenCV code; I'll post again if I figure out something clever.

Cheers, Drew

P.S. I'm using a dev version of OpenCV, and I believe it is calling the stock Debian BLAS. |
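To make the "allocate temporaries outside the time-critical section" idea concrete, here is a minimal sketch that uses only the C++ standard library. It is a stand-in, not Eigen code: `CountingAllocator`, `g_allocations`, and both function names are hypothetical, and `std::vector` plays the role of a dynamically sized vector. It counts how many heap allocations a loop performs when its scratch buffer is created inside the loop versus once beforehand.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Tally of allocations made through CountingAllocator (hypothetical helper).
static std::size_t g_allocations = 0;

// Minimal C++11 allocator that counts every call to allocate().
template <typename T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using Vec = std::vector<double, CountingAllocator<double>>;

// Scratch buffer created inside the loop: at least one allocation per pass.
std::size_t allocations_with_inner_temporary(std::size_t n, int iterations) {
    Vec x(n, 1.0);
    g_allocations = 0;  // count only what happens inside the loop
    for (int it = 0; it < iterations; ++it) {
        Vec tmp(n);  // fresh heap allocation on every iteration
        for (std::size_t i = 0; i < n; ++i) tmp[i] = 2.0 * x[i];
        for (std::size_t i = 0; i < n; ++i) x[i] = 0.5 * tmp[i];
    }
    return g_allocations;
}

// Scratch buffer hoisted out of the loop: the loop itself allocates nothing.
std::size_t allocations_with_hoisted_temporary(std::size_t n, int iterations) {
    Vec x(n, 1.0);
    Vec tmp(n);         // allocated once, before the critical section
    g_allocations = 0;  // count only what happens inside the loop
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < n; ++i) tmp[i] = 2.0 * x[i];
        for (std::size_t i = 0; i < n; ++i) x[i] = 0.5 * tmp[i];
    }
    return g_allocations;
}
```

Calling both functions with the same arguments and comparing the two counts gives the same kind of per-iteration figure as the valgrind measurements above, without leaving the program. Some standard library debug modes add bookkeeping allocations per container, so the first count may be a multiple of the iteration count.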
Registered Member
|
Just out of curiosity. Could you post your code?
- Hauke |
Registered Member
|
I can't share the whole code since it's for a publication we're preparing, but I can share the lines that I suspect could be causing allocation of temporary variables. Talking to Benoit on IRC, it sounds like delayed evaluation only happens within a single line of code/assignment, so analyzing single lines in isolation might be sufficient.

* Capitalized variables are matrices; A is very tall, G is square
* Lower-case variables are column vectors
* s* are scalars
* Sizes are dynamic, and some objects are mapped.

lhs = b - A*x;
lhs = A.transpose()*(b - c); // This one is really poorly behaved. In the Eigen release version, it causes a segfault unless I add a .eval(), and in the dev version it crashes if I add a .noalias() on the lhs.
lhs = (c - s*(b + G*a));
lhs += s*(y - A*x - e);

Thanks! Drew |
Registered Member
|
Of course! Matrices are just that, matrices. There is no hidden place to store an expression for later evaluation, so when you write matrix = expression; you can be sure that, after the semicolon, the expression has been evaluated into the matrix. Even if you wrote lvalue_expr = rvalue_expr, that would still be the case.
Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list! |
Registered Member
|
This is hard to believe: we have tons of tests covering that, many times over. Can you:
1) make sure that NDEBUG is not defined (in MSVC, use Debug mode)
2) compile with full debug options (e.g. gcc -g3)
3) post the backtrace here
Registered Member
|
Hi again,
I did a small test and can't confirm anything unexpected. There are exactly 400 temporaries (100 iterations * 4 temporaries, one per multiplication) allocated in the loop given in the following code, and there is no transpose-based crash. In release mode the timer reads below 2 ms; this was measured after compiling with MSVC 2010 against the development branch.
For future reference, we can only offer good help when we get a simple, self-contained little example. Given all the nifty functions provided in Eigen, this is not really hard. The code above shows such an example, and it will not cost much time to implement something like that. And of course, I was not asking for your whole application beforehand. In case you need more help, please let us know and we will answer when we find the time.

Regards, Hauke |
Registered Member
|
Thanks bjacob and Hauke! I'll spend some more time trying to replicate the crasher in a self-contained example. If the above example doesn't crash, perhaps the crash has something to do with the fact that some of the matrices are wrapped around data in numpy arrays, or because the code is getting compiled into a python extension by boost-python. Cheers, Drew |
Moderator
|
I read that some of your Eigen objects are Map<> objects. There was a bug with Map<> and matrix products that prevented some optimizations. I've just fixed this issue, so it might be worth updating your local copy and trying again...
|
Moderator
|
Some tips to reduce temporaries and improve performance:

lhs = b - A*x;
  -> (lhs = b).noalias() -= A*x;

lhs = A.transpose()*(b - c);
  -> lhs.noalias() = A.transpose()*(b - c);

lhs = (c - s*(b + G*a));
  -> lhs = c - s*b; lhs.noalias() -= s*G*a; // this scalar product is free

lhs += s*(y - A*x - e);
  -> lhs += s*(y-e); lhs.noalias() -= s*A*x; |