Registered Member
|
Dear guys.
The problem I am working on has (let's say) 3 arrays of some length (arr1, arr2, arr3) and some scalars (s1, s2, s3), and the objective is to calculate s1*arr1+s2*arr2+s3*arr3. When I do this in plain C without Eigen, I get approximately 60 percent faster code than when I use Eigen with SSE2 (measured via runtime in VTune). What could be wrong? The sample code I use is
Any hint is much appreciated. Daniel
Moderator
|
First, make sure to compile with -O2 -DNDEBUG. Second, if MULTIS is known at compile time and small enough, you should really write:
res = multis[0] * arrays[0] + multis[1] * arrays[1] + multis[2] * arrays[2] + multis[3] * arrays[3];
Otherwise you cannot take advantage of expression templates, and the code performs many more memory loads and stores to the res Array.
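To illustrate the point about loads and stores, here is a minimal sketch in plain C++ (no Eigen), assuming the original sample accumulated into res one term at a time, as the reply suggests. The function names weighted_sum_passes and weighted_sum_fused are hypothetical and only serve the illustration. The first version reads and writes res once per term; the second is roughly what a single fused expression boils down to, one pass with one store per element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: accumulates one term per pass, similar to
// writing "res += s[k] * arr[k]" inside a loop over k.
// res is loaded and stored again in every pass.
std::vector<double> weighted_sum_passes(
    const std::vector<double>& s,
    const std::vector<std::vector<double>>& arrs) {
    assert(!arrs.empty() && s.size() == arrs.size());
    const std::size_t n = arrs[0].size();
    std::vector<double> res(n, 0.0);
    for (std::size_t k = 0; k < arrs.size(); ++k) {
        assert(arrs[k].size() == n);
        for (std::size_t i = 0; i < n; ++i) {
            res[i] += s[k] * arrs[k][i];  // extra load + store of res[i]
        }
    }
    return res;
}

// Hypothetical helper: the whole sum is written as one expression,
// so each res[i] is computed and stored exactly once.
std::vector<double> weighted_sum_fused(
    double s1, const std::vector<double>& a1,
    double s2, const std::vector<double>& a2,
    double s3, const std::vector<double>& a3) {
    assert(a1.size() == a2.size() && a2.size() == a3.size());
    std::vector<double> res(a1.size());
    for (std::size_t i = 0; i < a1.size(); ++i) {
        res[i] = s1 * a1[i] + s2 * a2[i] + s3 * a3[i];  // single store
    }
    return res;
}
```

Both functions compute the same values; the difference is only in how often res travels through memory, which tends to matter once the arrays no longer fit in cache.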