Registered Member
|
I'm curious about two things:
1. How does Eigen manage to match the speed of vendor BLAS libraries (MKL, ACML) in SGEMM, and be considerably faster than ATLAS, without even requiring local tuning?
2. What does it do differently from Blitz++ to get reasonable compilation times? |
Moderator
|
For matrix products we have implemented our own finely tuned kernel, inspired by Goto's paper. The block-size parameters are estimated at runtime from the actual sizes of the caches, which is why you don't need local tuning.
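To give a rough idea of what "block sizes estimated from the cache sizes" can look like, here is a minimal sketch. It is not Eigen's actual kernel: the `CacheSizes` defaults, the "use half of each cache" heuristic, the 4x4 register tile (`mr`, `nr`), and the names `estimate_block_sizes` and `blocked_gemm` are all assumptions made for illustration. A real implementation would query the hardware for the cache sizes, pack the blocks into contiguous buffers, and use a vectorized micro-kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cache description. The defaults are placeholders; a real
// program would query the CPU or the OS instead of hard-coding them.
struct CacheSizes {
    std::size_t l1_bytes = 32 * 1024;
    std::size_t l2_bytes = 256 * 1024;
};

struct BlockSizes {
    std::size_t kc;  // depth of a block along the shared dimension
    std::size_t mc;  // number of rows of A processed per block
};

// Assumed heuristic (not Eigen's): an mr x kc sliver of A plus a kc x nr
// sliver of B should fill about half of L1, and an mc x kc block of A
// should fill about half of L2.
BlockSizes estimate_block_sizes(const CacheSizes& c, std::size_t scalar_bytes,
                                std::size_t mr = 4, std::size_t nr = 4) {
    assert(scalar_bytes > 0 && mr > 0 && nr > 0);
    std::size_t kc = std::max<std::size_t>(
        1, (c.l1_bytes / 2) / ((mr + nr) * scalar_bytes));
    std::size_t mc = std::max<std::size_t>(
        mr, (c.l2_bytes / 2) / (kc * scalar_bytes));
    mc -= mc % mr;  // keep mc a multiple of the register tile height
    return {kc, mc};
}

// C (m x n) += A (m x k) * B (k x n), all row-major.
// Only the loop blocking is shown; there is no packing and no SIMD here.
void blocked_gemm(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t m, std::size_t n,
                  std::size_t k, const BlockSizes& bs) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    assert(bs.kc > 0 && bs.mc > 0);
    for (std::size_t p0 = 0; p0 < k; p0 += bs.kc) {        // blocks along k
        const std::size_t p1 = std::min(k, p0 + bs.kc);
        for (std::size_t i0 = 0; i0 < m; i0 += bs.mc) {    // blocks of rows
            const std::size_t i1 = std::min(m, i0 + bs.mc);
            for (std::size_t i = i0; i < i1; ++i) {
                for (std::size_t p = p0; p < p1; ++p) {
                    const float a = A[i * k + p];
                    for (std::size_t j = 0; j < n; ++j) {
                        C[i * n + j] += a * B[p * n + j];
                    }
                }
            }
        }
    }
}
```

You would call `estimate_block_sizes(CacheSizes{}, sizeof(float))` once at startup and pass the result to `blocked_gemm`. The point is only that the blocking parameters come from a formula evaluated on the running machine, rather than from an install-time search as in ATLAS.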
Regarding compilation time, I cannot say much since I don't know how Blitz++ is implemented. However, we do care about compilation time, and we have tried to reduce the number of abstraction layers as much as possible. Also note that Blitz++ is a tensor library, while Eigen focuses on 2D matrices. |