Registered Member
I am trying to optimize my program, and I noticed that .block() seems to introduce significant overhead for vector-matrix multiplications.
I isolated it down to a comparison of two cases (m is a matrix, v and result are row vectors; all dimensions are dynamic except the vectors' row count): a plain product result = v * m versus the same product with v replaced by v.block(0, 0, 1, v.cols()).
The second case takes about twice as long as the first, even though block() is supposed to add no overhead. Does using block() change how prefetching is done, for example, assuming the bottleneck in vector-matrix multiplication is memory bandwidth? In my application I am obviously not using block() this way, but I'm seeing a similar slowdown (~2x) there, and I'm hoping the cause is the same. I am using GCC 4.9 on OS X with -O3 -march=native (including AVX) on a Haswell CPU, with the Mercurial head current as of about two weeks ago. Thanks! Matthew
Moderator
That's quite surprising. I guess that what happens is that v.block(0, 0, 1, v.cols()) loses the compile-time information that this is a vector, and therefore operator* falls back to a general matrix-matrix product. To work on subvectors, use v.segment(start, length) or the head()/tail() methods instead.
Registered Member
Thanks! That worked! Now it's just as fast.