Registered Member
Hi all
I've seen comments in this forum, and benchmarks, showing that Eigen beats hand-written code for matrix multiplication when the matrices are *small*, say between 3x3 and 30x30, possibly non-square. In my experience this holds when comparing against GCC (-O3, with and without -mavx or -msse4.2), but not when comparing against ICC (-O3 -xAVX/-xSSE4.2). "Surprisingly" (OK, I'm not entirely sure it is surprising), Eigen does even worse on a chain of matrix products such as A += B*C + D*E + F*G. I'm using noalias() for the destination matrix, and all the matrices involved are fixed-size. For example, this code:
runs *slower* than this one (here one operand of each product is transposed with respect to the previous case, but that makes no difference):
The best (i.e. fastest) Eigen code was obtained with GCC 4.7 -O3, whereas the best hand-written C code was compiled with ICC 14 -O3. Running each version several times in an outer loop gave:

Runtime Eigen = 1.812738 s
Runtime Hand-written = 1.329352 s

Everywhere I read that Eigen is optimal, especially for small matrices, but here plain ICC compilation pays off. OK, I haven't tried all possible sizes, but before spending hours coding various examples I'd like to ask:
- Am I missing something important?
- Are there benchmarks that compare Eigen and hand-written code exhaustively for small sizes?
- Most importantly, for which range of sizes do you think Eigen >> the best possible hand-written code, while still Eigen >> say MKL?

Thanks for considering this long question
-- Fabio
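For reference, a hand-written baseline for the chain A += B*C + D*E + F*G could look like the sketch below. It is not the code from the original post (that code is not reproduced here); the names `Mat`, `gemm_acc` and `chain_acc`, the row-major `std::array` storage, and the assumption that all three products share the same inner dimension K are illustrative choices. Each product is accumulated directly into A, which is what noalias() allows Eigen to do as well: no temporary is needed because A is assumed not to alias any operand.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-size row-major matrix with R rows and C columns.
template <std::size_t R, std::size_t C>
using Mat = std::array<double, R * C>;

// A += B*C, assuming A does not alias B or C (the noalias() situation).
// B is M x K, C is K x N, A is M x N, all row-major.
// The i-k-j loop order keeps the innermost loop unit-stride in A and C.
template <std::size_t M, std::size_t K, std::size_t N>
void gemm_acc(Mat<M, N>& A, const Mat<M, K>& B, const Mat<K, N>& C) {
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t k = 0; k < K; ++k) {
            const double b = B[i * K + k];
            for (std::size_t j = 0; j < N; ++j)
                A[i * N + j] += b * C[k * N + j];
        }
}

// A += B*C + D*E + F*G as three accumulating products (no temporaries).
// Assumption for brevity: all three products share the inner dimension K.
template <std::size_t M, std::size_t K, std::size_t N>
void chain_acc(Mat<M, N>& A,
               const Mat<M, K>& B, const Mat<K, N>& C,
               const Mat<M, K>& D, const Mat<K, N>& E,
               const Mat<M, K>& F, const Mat<K, N>& G) {
    gemm_acc<M, K, N>(A, B, C);
    gemm_acc<M, K, N>(A, D, E);
    gemm_acc<M, K, N>(A, F, G);
}
```

The sizes are template parameters and must be given explicitly (e.g. `chain_acc<10, 14, 12>(A, B, C, D, E, F, G)`), so the compiler sees constant trip counts and can unroll or vectorize the loops; this is the same information Eigen has for fixed-size matrices.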
Last edited by fabiol on Mon May 19, 2014 9:41 pm, edited 1 time in total.
Moderator
Your comparison is unfair: in one case you state that the buffers are aligned, and in the other you don't. Use Map<Matrix<double, 10, 14, RowMajor>, Aligned> to tell Eigen that your pointers are aligned. The devel branch should also be significantly faster if you enable AVX.
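Passing Aligned to Map is a promise, not a request: the buffer must really start on a suitable boundary, because an aligned load (movapd, or vmovapd with AVX) on a misaligned address faults, whereas the unaligned variants have no such requirement. A minimal sketch of storage that keeps the promise, in plain C++; the struct `Storage10x14` and the helper `is_aligned` are hypothetical names, and the 10x14 size is simply taken from the Map type above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical storage for one 10x14 row-major matrix of doubles.
// alignas(32) puts data[0] on a 32-byte boundary, enough for 256-bit AVX
// loads and, a fortiori, for 16-byte SSE loads. This holds for automatic and
// static objects; for heap objects it relies on C++17 aligned operator new.
struct alignas(32) Storage10x14 {
    double data[10 * 14];
};

// True if p is a multiple of `alignment` (a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Intended use with Eigen, as suggested above (shown as a comment only):
//   Storage10x14 s;
//   Eigen::Map<Eigen::Matrix<double, 10, 14, Eigen::RowMajor>, Eigen::Aligned> m(s.data);
```

Checking `is_aligned(ptr, 32)` before constructing such a Map is a cheap way to catch a broken promise in a debug build.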
Moderator
BTW, you wrote:
from which I understand that Eigen is already faster.
Registered Member
I meant it the other way round; I've edited the original post.

Adding Aligned doesn't make any difference: is that because on Sandy Bridge, *if* I remember correctly, movups costs the same as movaps when the data are actually aligned?

I understood that AVX support was merged into trunk a long time ago and then included in the latest release, but now you say I should switch to the devel branch to use it, so I guess I was simply wrong, right? Also, you say I have to "enable AVX". How?

In any case, I guess I have to add two zero columns to my matrices so that the length of each row (12) is a multiple of the vector length (4).

Thanks for your support
-- Fabio
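On the last two points: enabling AVX is a compiler switch (for example -mavx with GCC or -xAVX with ICC, after which the compiler predefines `__AVX__`), and the padding idea can be sketched in plain C++. The code below rounds each row up to a multiple of 4 doubles (one 256-bit AVX register) and keeps the extra columns at zero, so the innermost loop of a product has a trip count that is a multiple of 4 and a 4-wide vectorized loop leaves no leftover elements. `round_up`, `PaddedMat` and `gemm_acc_padded` are hypothetical names; `std::vector` is used for brevity and does not by itself guarantee 32-byte alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round n up to the next multiple of w (w = 4 doubles per 256-bit register).
constexpr std::size_t round_up(std::size_t n, std::size_t w) {
    return (n + w - 1) / w * w;
}

// Row-major matrix whose rows are padded with zero columns up to `stride`.
struct PaddedMat {
    std::size_t rows, cols, stride;
    std::vector<double> v;  // rows * stride entries, padding initialized to 0.0
    PaddedMat(std::size_t r, std::size_t c, std::size_t w = 4)
        : rows(r), cols(c), stride(round_up(c, w)), v(r * round_up(c, w), 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return v[i * stride + j]; }
    double operator()(std::size_t i, std::size_t j) const { return v[i * stride + j]; }
};

// A += B*C, with the j loop running over the full padded width of A.
// Because C's padding columns are zero, A's padding columns stay zero
// (provided they start at zero), so the logical result is unchanged.
void gemm_acc_padded(PaddedMat& A, const PaddedMat& B, const PaddedMat& C) {
    assert(A.rows == B.rows && B.cols == C.rows && A.stride == C.stride);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t k = 0; k < B.cols; ++k) {
            const double b = B(i, k);
            for (std::size_t j = 0; j < A.stride; ++j)
                A.v[i * A.stride + j] += b * C.v[k * C.stride + j];
        }
}
```

For example, a 10-column row would be padded to `round_up(10, 4) == 12`. With fixed-size matrices the same idea amounts to declaring the padded column count directly; whether it actually helps is worth measuring rather than assuming.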