Hi all,
While profiling my code, I found that a significant portion of the time was being spent in the following function (where T = double and both matrices reside in either the L2 or L3 cache):
Neither ICC, GCC nor Clang produces particularly good code for the above function. I would therefore like to vectorize the outer (int k) loop as a means of improving performance. However, I would like to do so in a way that avoids allocating temporaries and still allows the code to work when T is a non-standard type.

Christoph on IRC suggested I investigate the packet interface, although he warned that it may be an implementation detail. Hence, my question is: what options are open to me for improving the performance of the above?

Regards,
Freddie.
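Since the function itself is not reproduced here, the sketch below uses a hypothetical stand-in kernel, `out[k] = sum_i A(i,k) * B(i,k)` over row-major m x n matrices, to illustrate one portable way of vectorizing an outer k loop without relying on Eigen's internal packet API. The names `colwise_dot_scalar` and `colwise_dot_blocked`, the row-major layout and the block width `W` are all assumptions made for illustration. The idea is to process W consecutive k iterations per pass with a small fixed-size accumulator array on the stack (so nothing is heap-allocated) and leave it to the compiler to map the innermost `w` loop onto SIMD lanes when T = double, while the same template still compiles for any T that is default-constructible and supports `T(0)`, `+=` and `*`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in kernel (not the function from the post):
// out[k] = sum_i A(i,k) * B(i,k) for row-major m x n matrices A and B.
// Reference version: scalar outer loop over k, inner loop over i.
template <typename T>
void colwise_dot_scalar(const T* A, const T* B, T* out,
                        std::size_t m, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) {
        T acc = T(0);
        for (std::size_t i = 0; i < m; ++i)
            acc += A[i * n + k] * B[i * n + k];
        out[k] = acc;
    }
}

// Outer-loop blocked version: handles W consecutive k (columns) per pass.
// The W accumulators live in a fixed-size stack array, so no temporaries
// are heap-allocated; T must be default-constructible and support
// T(0), += and *.
template <typename T, std::size_t W = 4>
void colwise_dot_blocked(const T* A, const T* B, T* out,
                         std::size_t m, std::size_t n) {
    static_assert(W > 0, "block width must be positive");
    std::size_t k = 0;
    for (; k + W <= n; k += W) {
        T acc[W];
        for (std::size_t w = 0; w < W; ++w) acc[w] = T(0);
        for (std::size_t i = 0; i < m; ++i) {
            // In row-major storage, columns k..k+W-1 of row i are contiguous,
            // so this w loop is a natural candidate for SIMD lanes.
            const T* a = A + i * n + k;
            const T* b = B + i * n + k;
            for (std::size_t w = 0; w < W; ++w)
                acc[w] += a[w] * b[w];
        }
        for (std::size_t w = 0; w < W; ++w) out[k + w] = acc[w];
    }
    // Scalar remainder for the last n % W columns.
    for (; k < n; ++k) {
        T acc = T(0);
        for (std::size_t i = 0; i < m; ++i)
            acc += A[i * n + k] * B[i * n + k];
        out[k] = acc;
    }
}
```

Because each k still accumulates over i in the same order, the blocked version should match the scalar reference exactly (barring compiler-introduced FMA contraction). Whether ICC, GCC or Clang actually emit packed instructions for the `w` loop is not guaranteed and depends on flags; GCC's `-fopt-info-vec` and Clang's `-Rpass=loop-vectorize` can be used to check.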