Registered Member
|
Hi,
I didn't want to take this thread over, so I started a new one. I'm having a piece of code like this:
And OpenMP is set as CXXFLAG. My operation should not differ from a matrix-matrix multiplication (from eigen's internal point of view). Am I right? But nevertheless only one cpu is used to perform this operation. Do you have an idea why? Cheers, Manuel PS: Currently, I have no access to the source code/Makefile. But I can post it at Monday. EDIT: Matrix indices |
Moderator
|
The internal implementation of matrix-matrix and matrix-vector products are very very different, and so far only the former is multithreaded. Before looking for a multithreaded mattrix-vector product, you should rather see if your algorithm cannot be implemented in term of matrix-matrix products via a blocking strategy. This is more work, but the performance gain will be much higher.
Now regarding your use case, two temporaries are created. One for (Matrix1.col(i) - Vector) and one to hold the result of the product. The later can be avoided as follow: mat1.Col(i+1) += mat1.Col(i); mat1.Col(i+1).noalias() -= mat2*(Matrix1.col(i) - Vector); |
Registered Member
|
Thanks for your hint to split the operation into two parts!
I tried to find out what you meant by replacing my matrix-vector multiplication with a matrix-matrix multiplication using a blocking strategy but I could not find any good information about it. Do you mean I should slice the vector into columns of a matrix and after the matrix-matrix multiplication I should sum up the rows of the result? EDIT: And if I should do that, what's the best method to "reinterpret" the size of the vector. Something like
? |
Moderator
|
no no, actually I've no idea about your algorithm, perhaps what I'm saying is not possible in your case. But I see you have a loop on the column of a matrix, so perhaps it is possible to treat N columns at once + some extra work just like it is done for all block decompositions: Cholesky, LU, QR, tridiagonalisation, etc. Just an idea....
|
Registered Member
|
Ah, ok. No, I cannot do this because the values of columns of Matrix1 depend recursively on each other.
Actually, I have this equation:
I just use a matrix instead of vec_old, vec_new respectively, to keep a history of the values of this vector. But mat is very sparse, the vectors are dense. So do you think I would be a good idea to do what I said: Slice the vec_old into a matrix (just a mapping, not really copying) and then do a matrix-matrix multiplication. Finally sum up the rows of the result. (If I swap the matrices of the matrix-matrix multiplication: sum up colwise). Something like
Eventually this could have two advantages: 1) the matrix-matrix product could be calculated using OpenMP. 2) if I choose the correct BLOCKSIZE accordingly to my cache size, the dense rows of VectorAsMatrix would be kept in the cache, and only sparse columns of mat must be requested from the memory. Would this be a good idea or is that overingeneering? Cheers and thanks for your great hints! Manuel |
Registered users: Bing [Bot], Evergrowing, Google [Bot], rockscient