Registered Member
Hello,
I have a strange problem with Eigen 3 (I tried both beta1 and beta2) on my Mac (OS X 10.6.6 with gcc 4.2.1). Even though I use .noalias(), a simple matrix multiplication still causes three mallocs?! Here is my test program:
Compiled with:
Valgrind reports 4 + loops*3 mallocs (i.e. 4 mallocs for loops = 0 and 34 for loops = 10). Without .noalias(), valgrind reports 4 + loops*4 mallocs, so noalias() does something, but apparently not everything. Under Linux with gcc 4.4.3, valgrind reports only 3 mallocs with noalias(), even for loops > 0, as expected. Switching to a newer gcc is not an easy option for us right now. Any ideas what's going on here?
Thanks,
Markus
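Outside valgrind, a rough way to count heap allocations in a test loop like this is to replace the global `operator new` with a counting version. The sketch below does not use Eigen: `multiply`, `multiply_into` and `count_allocations` are hypothetical helpers on row-major `std::vector<double>` matrices. `multiply` returns a freshly allocated result, roughly what happens when a product is evaluated into a temporary, while `multiply_into` writes straight into a preallocated destination, which is the idea behind `.noalias()`. One caveat: Eigen may obtain its buffers through `malloc` directly rather than `operator new`, in which case this counter would not see them, so valgrind remains the more reliable check for Eigen code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Incremented by every call to the replaced global operator new below.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

using Matrix = std::vector<double>;  // n*n elements, row-major (hypothetical layout)

// Writes a*b into the preallocated c: no heap allocation.
void multiply_into(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Returns a*b in a new matrix: one heap allocation per call.
Matrix multiply(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n);
    multiply_into(a, b, c, n);
    return c;
}

// Number of operator new calls made while running f().
template <typename F>
std::size_t count_allocations(F&& f) {
    const std::size_t before = g_allocations;
    f();
    return g_allocations - before;
}
```

For example, assigning `c = multiply(a, b, n)` ten times in a loop counts ten allocations, whereas ten `multiply_into(a, b, c, n)` calls count none.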
Moderator
Hi,
This is perfectly normal. The matrix product has to allocate three small workspace buffers. On Linux, and for matrices that are not too large, they are allocated on the stack. If you specify the matrix size at compile time, stack allocation is guaranteed. In the future we could add a special product function that lets the user pass a pre-allocated workspace object.
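The stack-versus-heap behavior described above can be illustrated with a small-buffer workspace. This is not Eigen's implementation; `Workspace` and its `kStackLimit` parameter are hypothetical. Requests of up to `kStackLimit` doubles use inline storage, which lives on the stack when the `Workspace` is a local variable, and only larger requests touch the heap.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical scratch buffer: requests of up to kStackLimit doubles use the
// inline array (on the stack when the Workspace is a local variable); only
// larger requests fall back to a single heap allocation.
template <std::size_t kStackLimit>
class Workspace {
public:
    explicit Workspace(std::size_t size) : size_(size) {
        if (size_ > kStackLimit) heap_ = std::make_unique<double[]>(size_);
    }
    Workspace(const Workspace&) = delete;
    Workspace& operator=(const Workspace&) = delete;

    double* data() { return heap_ ? heap_.get() : inline_.data(); }
    std::size_t size() const { return size_; }
    bool on_heap() const { return heap_ != nullptr; }

    // Element access, bounds-checked in debug builds.
    double& operator[](std::size_t i) {
        assert(i < size_);
        return data()[i];
    }

private:
    std::size_t size_;
    std::array<double, kStackLimit> inline_;  // uninitialized scratch space
    std::unique_ptr<double[]> heap_;          // empty unless size_ > kStackLimit
};
```

The trade-off is that the inline array costs `kStackLimit * sizeof(double)` bytes of stack even for small requests, so the limit has to stay modest; knowing the size at compile time removes the heap branch entirely, in the same spirit as fixing the matrix size at compile time.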
Registered Member
Ggael,
Thanks a lot for your quick reply. Indeed, for square matrices larger than 50x50 I also get mallocs on Linux, but only 3 + loops*2 (?). Is there a way to force stack allocation on Macs as well for products of dynamic-size matrices? We would like to use Eigen in a real-time control application and have to avoid mallocs, but our dynamic matrices are smaller than 50x50, so the Linux behavior would be fine for us. BTW, a special product function with a user-provided, pre-allocated workspace would be nice.
Thanks again,
Markus
Registered Member
To answer my own question:
OS X also has alloca() (at least since 10.3), so after changing line 431 of Core/util/Memory.h from:
to:
I get the same behavior on my Mac as on my Linux box. Could that be included in Eigen? Sadly, this only fixes the mallocs in matrix products, not the other temporaries created in methods such as inverse(), llt(), etc. Has anybody looked into writing a private allocator for Eigen? To me that looks easier than creating versions of all methods that take user-supplied temporaries, as a way to avoid memory fragmentation and the performance hit of repeated mallocs/frees.
Thanks,
Markus
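A private allocator of the kind asked about here is usually an arena: reserve one block at start-up, hand out temporaries from it, and release everything at the end of each control cycle. The sketch below uses C++17's `std::pmr::monotonic_buffer_resource`, which postdates this thread and is not an Eigen feature, so it only covers your own temporaries; `CycleArena` and `product` are hypothetical names. Setting the upstream resource to `std::pmr::null_memory_resource()` makes an exhausted arena throw `std::bad_alloc` instead of quietly falling back to the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Hypothetical per-cycle arena: one buffer reserved up front, temporaries
// carved out of it, and no fallback to the heap when it runs out.
class CycleArena {
public:
    explicit CycleArena(std::size_t bytes)
        : storage_(bytes),
          resource_(storage_.data(), storage_.size(),
                    std::pmr::null_memory_resource()) {}

    std::pmr::memory_resource* resource() { return &resource_; }

    // Discards everything handed out since construction or the last reset;
    // the underlying buffer is kept, so this does not touch the heap.
    void reset() { resource_.release(); }

private:
    std::vector<std::byte> storage_;  // allocated once, at start-up
    std::pmr::monotonic_buffer_resource resource_;
};

// Example temporary: an n x n row-major product stored in arena memory.
std::pmr::vector<double> product(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t n, CycleArena& arena) {
    assert(a.size() == n * n && b.size() == n * n);
    std::pmr::vector<double> c(n * n, 0.0, arena.resource());
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}
```

In a control loop, create the arena once, call `reset()` after each cycle once all arena-backed objects are gone, and size the buffer for the worst-case cycle.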
Moderator
For LU or Cholesky solving, you can already preallocate a PartialPivLU or LLT object with the appropriate size and reuse it everywhere (you need one per thread). Of course, this only works if the sizes do not change.
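The same preallocate-once pattern can be sketched without Eigen. `CholeskySolver` below is a hypothetical stand-in for a reusable `LLT` object, not Eigen's API: its constructor allocates the factor and scratch storage for a fixed size n, after which `compute()` and `solve()` (writing into a caller-provided `x`) perform no allocations. As with the Eigen objects, use one instance per thread, since `solve()` writes to internal scratch storage.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reusable Cholesky (L L^T) solver for a fixed size n: all storage
// is allocated in the constructor, so compute() and solve() never allocate.
class CholeskySolver {
public:
    explicit CholeskySolver(std::size_t n) : n_(n), l_(n * n, 0.0), y_(n, 0.0) {}

    // Factors the symmetric positive-definite row-major matrix a (n*n) into
    // L L^T; returns false if a is not positive definite.
    bool compute(const std::vector<double>& a) {
        assert(a.size() == n_ * n_);
        for (std::size_t j = 0; j < n_; ++j) {
            double d = a[j * n_ + j];
            for (std::size_t k = 0; k < j; ++k) d -= l_[j * n_ + k] * l_[j * n_ + k];
            if (d <= 0.0) return false;
            l_[j * n_ + j] = std::sqrt(d);
            for (std::size_t i = j + 1; i < n_; ++i) {
                double s = a[i * n_ + j];
                for (std::size_t k = 0; k < j; ++k) s -= l_[i * n_ + k] * l_[j * n_ + k];
                l_[i * n_ + j] = s / l_[j * n_ + j];
            }
        }
        return true;
    }

    // Solves A x = b into the caller's preallocated x (size n).
    void solve(const std::vector<double>& b, std::vector<double>& x) {
        assert(b.size() == n_ && x.size() == n_);
        // Forward substitution: L y = b.
        for (std::size_t i = 0; i < n_; ++i) {
            double s = b[i];
            for (std::size_t k = 0; k < i; ++k) s -= l_[i * n_ + k] * y_[k];
            y_[i] = s / l_[i * n_ + i];
        }
        // Back substitution: L^T x = y.
        for (std::size_t i = n_; i-- > 0;) {
            double s = y_[i];
            for (std::size_t k = i + 1; k < n_; ++k) s -= l_[k * n_ + i] * x[k];
            x[i] = s / l_[i * n_ + i];
        }
    }

private:
    std::size_t n_;
    std::vector<double> l_;  // lower-triangular factor, row-major
    std::vector<double> y_;  // scratch for the forward substitution
};
```

The solver is built once with the problem size and then refactored with `compute()` each time the matrix changes, which matches the restriction that the sizes must stay fixed.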