Registered Member
|
Hi all,
I'm trying to optimize a Matrix * Vector multiplication. What I have originally looks something like:
What I'm trying to do is something like:
where float3 is a struct containing 3 floats, with all the relevant operators overloaded. My reasoning is that the compiler should be able to use SSE / AVX instructions when multiplying 3 floats by 1 float. If the existing code already uses SSE instructions, I'd like to see the performance difference with this approach.

I'm currently getting errors about conj_helper not being defined (in gebp_kernel::operator() in Core\products\GeneralBlockPanelKernel.h). I'd really appreciate it if anyone could explain the high-level architecture of Core\arch\SSE\Complex.h and Core\util\BlasUtil.h (and whatever else needs to be modified).

Off topic: I have a small optimization for SparseMatrix::setFromTriplets() if anyone is interested (no custom instructions, just code rearranging). |
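For readers following along, the float3 idea described in the post can be sketched in plain C++ without Eigen. This is only an illustration under stated assumptions, not the poster's code: the `CsrMatrix` layout and the `multiply` function are hypothetical names, and the matrix is assumed to hold plain float coefficients while the vector holds float3 entries. Whether the inner `float * float3` step actually compiles to SSE / AVX instructions depends on the compiler and its flags.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical float3: three floats plus the operators that a
// matrix * vector product needs (scale by a float, accumulate).
struct float3 {
    float x, y, z;
};

inline float3& operator+=(float3& a, const float3& b) {
    a.x += b.x;
    a.y += b.y;
    a.z += b.z;
    return a;
}

inline float3 operator*(float s, const float3& v) {
    return {s * v.x, s * v.y, s * v.z};
}

// Hypothetical compressed-sparse-row matrix with float coefficients.
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_start;  // size rows + 1
    std::vector<std::size_t> col_index;  // column of each stored value
    std::vector<float> value;            // stored coefficients
};

// y = A * x, where every entry of x and y is a float3.
std::vector<float3> multiply(const CsrMatrix& A, const std::vector<float3>& x) {
    std::vector<float3> y(A.rows, float3{0.0f, 0.0f, 0.0f});
    for (std::size_t r = 0; r < A.rows; ++r) {
        for (std::size_t k = A.row_start[r]; k < A.row_start[r + 1]; ++k) {
            // One matrix coefficient scales all three components at once.
            y[r] += A.value[k] * x[A.col_index[k]];
        }
    }
    return y;
}
```

A standalone loop like this can serve as a baseline to time against the library version before touching any library internals.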
Moderator
|
I'm confused because your code shows sparse matrices, while the reported error concerns a dense matrix product...
|