Matrices for an OpenCL-like float3 struct

Thu Oct 24, 2013 6:10 pm

Hi all,

I'm trying to optimize a Matrix * Vector multiplication. What I have originally looks something like:

What I'm trying to do is something like:

where float3 is a struct containing three floats, with all the relevant operators overloaded. My reasoning is that multiplying three floats by one float should map onto SSE / AVX instructions. If the existing code already uses SSE instructions, I'd still like to measure the performance difference with this approach.
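To make the idea concrete, here is a minimal sketch of the kind of struct described above, written in plain C++ without Eigen. The names `float3`, its members `x`, `y`, `z`, and the helper `mat_vec` are assumptions for illustration only; they are not the snippets from the original post and not part of any library. The sketch assumes a matrix with 3 rows stored as a list of `float3` columns, so each step of the product is one "3 floats times 1 float" operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical OpenCL-style float3: three floats with element-wise operators.
struct float3 {
    float x, y, z;
};

// Scale all three components by one scalar (the "3 floats by 1 float" step).
inline float3 operator*(const float3& a, float s) {
    return {a.x * s, a.y * s, a.z * s};
}

// Component-wise addition.
inline float3 operator+(const float3& a, const float3& b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z};
}

// Component-wise accumulation.
inline float3& operator+=(float3& a, const float3& b) {
    a.x += b.x;
    a.y += b.y;
    a.z += b.z;
    return a;
}

// Hypothetical helper: y = A * v for a 3 x n matrix A whose columns are
// stored as float3 values. The result is the sum of columns[j] * v[j].
inline float3 mat_vec(const std::vector<float3>& columns,
                      const std::vector<float>& v) {
    assert(columns.size() == v.size());
    float3 y{0.0f, 0.0f, 0.0f};
    for (std::size_t j = 0; j < v.size(); ++j) {
        y += columns[j] * v[j];
    }
    return y;
}
```

Whether a compiler turns these operators into SSE / AVX instructions depends on the compiler, flags, and alignment, so the only reliable way to compare this against the existing code path is to benchmark both.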

I'm currently getting errors about conj_helper not being defined (in gebp_kernel::operator() in Core\products\GeneralBlockPanelKernel.h). I'd really appreciate it if anyone could explain the high-level architecture in Core\arch\SSE\Complex.h and Core\util\BlasUtil.h (and whatever else needs to be modified).

Off topic: I have a small optimization for SparseMatrix::setFromTriplets() if anyone is interested (no custom instructions, just code rearranging).
