Registered Member
|
Hello!
I have in my code few extremely large matrices of dimension 2*120*100=24 000 (complex numbers) And I need the inverse matrices of them. LargeMatrix.inverse() takes tooo long. Is it possible to parallelize this inversion with openMP? thanks |
Moderator
|
First, are you sure you need the inverse ? if you do computation like "A^-1 * B" then rather do X = A.lu().solve(B) and you need A^-1 multiple time create a PartialPivLU<> object.
Second, a 24k^2 matrix is indeed pretty big for dense storage. Are you sure your matrix is not sparse, i.e., that it contains much more 0 entries than non zeros, like 3 to 20 non zeros per row or column ? If that's the case you should consider using a SparseMatrix<> with associated sparse solvers. Nevertheless, for dense objects you can enable parallelization by enabling OpenMP ( -fopenmp with GCC). This should speedup the LU factorization step. |
Registered Member
|
Hi ggael, thank you for your reply.
Unfortenately I need the whole inverse matrix. But one part of the code was optimized with your suggestion (X = A.lu().solve(B)).
min 50% of the matrix are non zeros
My for loops are parallelized with openmp
and it works quite well. But when Itry
then it takes longer to calculate the inverse than without openmp and I get wrong result. Is it the wrong way? Perhaps is it possible to split the matrix inversion in blocks and than to calculate them with a parallel loop? |
Moderator
|
simply compile with OpenMP enabled, do not add the "#pragma omp parallel " directive. Basically, the first step of the matrix inversion is to compute a LU factorization. For large matrices this is done per block and the bottleneck appears to be a matrix-matrix product. That is this later operation which is parallelized.
|
Registered Member
|
I did it as you suggested, but I see only one cpu working when it comes to inversion of the matrix. Is there some implementation like:
where the result would be a special block of the inverse of the matrix MatrixA? Because than it would be possible to make a parallelized loop and to calculate all blocks of the inverse. But I am not sure whether it is the correct way, when I look at this: http://en.wikipedia.org/wiki/Invertible_matrix#Blockwise_inversion |
Moderator
|
Parallelizing a matrix inversion is not trivial at all. As I told you the parallelization only happens during the LU decomposition step and not during the computation of U^-1 * L^-1 that probably explain why you don't notice it.
|
Registered users: Bing [Bot], Google [Bot], Yahoo [Bot]