Registered Member
|
Hello,
First, thanks to the whole Eigen team for the great work. I've encountered a performance issue with a simple outer product of vectors (M = V1 * V2.transpose();): it is slower than a hand-made loop! We can make it almost twice as fast by explicitly iterating over the elements of V2 (thus performing a vector*scalar operation for each column of M). Is this a vectorization issue? I've searched the forum for this specific issue, without success. Tested with Eigen 3.1, gcc 4.4.5 and gcc 4.7.1, with options:
Results:
| *Dim1* | *Dim2* | *nbIter* | *By hand* | *Eigen* | *Eigen fixed* |
| 1024 | 50 | 3906 | 77 ms | 116 ms | 62 ms |
From this code:
Thanks, Yves |
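For readers following along: the code from the post above is not shown in the thread, so here is only a minimal sketch of the column-by-column formulation it describes (one vector*scalar operation per column of M). It uses plain C++ with a column-major std::vector instead of Eigen types, and the function name outer_product_by_columns is hypothetical, not the poster's OuterProduct_Eigen_Fix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from the thread): fills the column-major
// buffer m, of size v1.size() * v2.size(), with the outer product v1 * v2^T.
// Column j of the result is the vector v1 scaled by the scalar v2[j].
void outer_product_by_columns(const std::vector<double>& v1,
                              const std::vector<double>& v2,
                              std::vector<double>& m)
{
    const std::size_t rows = v1.size();
    const std::size_t cols = v2.size();
    assert(m.size() == rows * cols);

    for (std::size_t j = 0; j < cols; ++j) {
        const double s = v2[j];             // scalar for this column
        double* col = m.data() + j * rows;  // start of column j
        for (std::size_t i = 0; i < rows; ++i)
            col[i] = v1[i] * s;             // contiguous writes, no reads of m
    }
}
```

The inner loop walks contiguous memory and only writes to M, which is the kind of loop a compiler can usually vectorize; whether that explains the timings above is an assumption, not something measured here.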
Moderator
|
Your "OuterProduct_Eigen_Fix" version is how the outer product is implemented in Eigen. There might be a recent performance regression here. I'll check.
|
Moderator
|
Alright, the issue is that the code is optimized to perform rank-1 updates:
A += v1 * v2.transpose();
If you adjust your OuterProduct_Eigen_Fix to do this operation, then you get exactly the same performance. Here the overhead for a simple A = v1 * v2.transpose(); is about 30%, and it is still quite a bit faster than OuterProduct_Ref. |
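To make the distinction between the two operations concrete, here is a small sketch in plain C++ (column-major std::vector, no Eigen). Both function names are hypothetical and this is not Eigen's internal code; it only illustrates that a rank-1 update reads and writes every element of A, while a plain assignment only writes, and that zeroing A followed by an update yields the same values as the assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: rank-1 update A += v1 * v2^T on a column-major buffer.
// Each element of A is read, incremented and written back.
void rank1_update(std::vector<double>& a,
                  const std::vector<double>& v1,
                  const std::vector<double>& v2)
{
    const std::size_t rows = v1.size();
    const std::size_t cols = v2.size();
    assert(a.size() == rows * cols);
    for (std::size_t j = 0; j < cols; ++j)
        for (std::size_t i = 0; i < rows; ++i)
            a[j * rows + i] += v1[i] * v2[j];
}

// Hypothetical helper: plain assignment A = v1 * v2^T.
// A is only written; its previous contents are never read.
void outer_assign(std::vector<double>& a,
                  const std::vector<double>& v1,
                  const std::vector<double>& v2)
{
    const std::size_t rows = v1.size();
    const std::size_t cols = v2.size();
    assert(a.size() == rows * cols);
    for (std::size_t j = 0; j < cols; ++j)
        for (std::size_t i = 0; i < rows; ++i)
            a[j * rows + i] = v1[i] * v2[j];
}
```

Emulating the assignment by zero-filling A and then calling the update gives the same result but touches A twice, so it cannot be expected to be cheaper than the update itself.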
Registered Member
|
Interesting!
It's true that for the update, Eigen and EigenFix give the same performance, which is 1.7x better than the hand-made update. For assignment, I still have to use the "fix" to obtain the best performance, which remains very close to the hand-made version. I've compared 3 operations (assignment, assignment simulated with setZero() followed by +=, and update). BTW, noalias() helps here.
Byhand = : 77 ms
Byhand += : 130 ms
Eigen = (operator =) : 116 ms
Eigen = (setZero then +=) : 115 ms
Eigen += : 77 ms
EigenFix = (operator =) : 62 ms
EigenFix = (setZero then +=) : 115 ms (no difference with plain Eigen expression)
EigenFix += : 77 ms (no difference with plain Eigen expression)
FYI, /proc/cpuinfo gives:
Yves |
Moderator
|
Yes, that's the expected behavior of the current implementation. I've added an entry there:
http://eigen.tuxfamily.org/bz/show_bug.cgi?id=483 so that we don't forget to optimize this case. |