Registered Member
Hi,
I am running a sparse diagonalization routine (ARPACK++) whose only input is a function that computes the matrix times a vector. This is working beautifully. I was recently introduced to Eigen, and since most of the computation time goes into that matrix-vector product, I thought the vectorization done by Eigen could make the calculation much faster. However, after setting this up I see a negligible effect, if any: either (a) my compiler was already vectorizing the code, or (b) Eigen isn't vectorizing it.

I compile with g++ 4.2 on Mac OS X 10.6. I have checked that EIGEN_VECTORIZE is defined, and I am compiling with the -msse2 flag. One concern is that, for reasons I don't want to get into, I have to compile everything for a 32-bit architecture using the -m32 flag. Does that make a difference?

As per the Eigen FAQ, I inserted assembler print statements and, using the -S flag, printed out the assembler code... but I have never looked at assembler code in my life and have no idea how to tell whether it is vectorized. Here is the line that I am trying to vectorize (where w and v are standard arrays of size n and Msparse is an n by n SparseMatrix that has already been initialized):
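Roughly, it is something like the following sketch (just an illustration; the wrapper types and the function name are placeholders, but the idea is to wrap the raw arrays with Eigen::Map and let Eigen do the sparse-times-dense product):

#include <Eigen/Sparse>
#include <Eigen/Dense>

// Sketch only: w and v are plain double arrays of length n, and Msparse is an
// n-by-n Eigen::SparseMatrix<double> that has already been filled.
void multiply(const Eigen::SparseMatrix<double>& Msparse,
              const double* v, double* w, int n)
{
    Eigen::Map<const Eigen::VectorXd> vin(v, n);   // read-only view of the input array
    Eigen::Map<Eigen::VectorXd> wout(w, n);        // writable view of the output array
    wout = Msparse * vin;                          // sparse matrix * dense vector product
}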
And here is the associated assembly code (sorry it is so long):
Can anyone please tell me if this is vectorized? Thank you in advance! -Carl
Moderator
Not all operations are vectorized. Sparse matrix operations typically are not, because vectorizing them is not possible in general.
If your matrix has a special structure that makes it possible to form blocks of at least 2 consecutive doubles without introducing too many explicit zeros, then vectorization could theoretically be enabled, but such cases are pretty rare in practice.
Registered Member
Thank you ggael, that is very helpful. My system actually is one of those cases: its matrix contains either 2 by 2 or 3 by 3 blocks of non-zero elements. The way I have done the multiplication in the past is to save these blocks as small 2x2 arrays and then do a bunch of (2x2)*(2x1) multiplications. It sounds like this could be sped up by using Matrix2d's instead (a rough sketch of what I mean is below). It sounds, though, like this might not work when the blocks are 3 by 3. When you do a (3x3)*(3x1) operation, does Eigen vectorize two of the three double*double operations that go into each row*vector product, leading to a possible 33% speedup? What if I use MatrixXd(3,3) instead of Matrix3d?
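Roughly, the block kernel I have in mind would look something like this (just a sketch; the struct and function names are made up):

#include <cstddef>
#include <vector>
#include <Eigen/Dense>

// Sketch only: one non-zero 2x2 block plus the (row, col) offset of its
// top-left corner in the full matrix.
struct Block2
{
    int row, col;
    Eigen::Matrix2d m;
};

// Accumulate y += block * x_block for every stored block
// (x and y are raw double arrays of length n).
void blockMultiply(const std::vector<Block2>& blocks,
                   const double* x, double* y)
{
    for (std::size_t k = 0; k < blocks.size(); ++k)
    {
        const Block2& b = blocks[k];
        Eigen::Map<Eigen::Vector2d>(y + b.row)
            += b.m * Eigen::Map<const Eigen::Vector2d>(x + b.col);
    }
}

If the Matrix2d products are vectorized, this would hopefully pick up the SSE speedup without changing my storage much.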
Thanks again for your help. -Carl
Moderator
Yes, using a SparseMatrix of Matrix2d and a vector of Vector2d for the rhs could work (I have never tried it) and be fully vectorized. On the other hand, 3x3 matrices are not vectorized at all because of the unaligned-load overhead it would imply. A MatrixXd(3,3) would be vectorized, but it would also be much slower.
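To make the fixed-size versus dynamic-size trade-off concrete, here is a small illustrative sketch (not benchmarked; the dynamic version pays for heap allocation and runtime size bookkeeping, which dominate at 3x3):

#include <Eigen/Dense>

// Fixed-size 3x3: lives on the stack and is fully unrolled, but the product
// uses scalar (unvectorized) code.
Eigen::Vector3d fixedProduct(const Eigen::Matrix3d& A, const Eigen::Vector3d& x)
{
    return A * x;
}

// Dynamic-size 3x3: the product kernel can use SSE, but the heap allocation
// and runtime size checks dominate at such a small size.
Eigen::VectorXd dynamicProduct(const Eigen::MatrixXd& A, const Eigen::VectorXd& x)
{
    return A * x;
}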
Registered Member
Ok, that makes sense.
Thanks for the help.