
Parallel inversion of large matrices

MarkoFranko
Registered Member
Sun Sep 18, 2011 12:01 pm
Hello!

I have a few extremely large matrices in my code, of dimension 2*120*100 = 24 000 (complex numbers), and I need their inverses.

LargeMatrix.inverse() takes too long.

Is it possible to parallelize this inversion with OpenMP?

thanks
ggael
Moderator
First, are you sure you need the inverse? If you do a computation like "A^-1 * B", then rather do X = A.lu().solve(B), and if you need A^-1 multiple times, create a PartialPivLU<> object.
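A minimal sketch of this pattern, assuming Eigen 3; the matrices A and B are placeholders, not from the original post:

Code: Select all
#include <Eigen/Dense>

int main()
{
    typedef Eigen::MatrixXcd Mat;   // dense matrix of complex<double>

    Mat A = Mat::Random(500, 500);
    Mat B = Mat::Random(500, 500);

    // One-shot: factorize A and solve A * X = B, instead of A.inverse() * B.
    Mat X = A.lu().solve(B);

    // If A^-1 is applied to several right-hand sides, factorize once
    // and reuse the PartialPivLU object.
    Eigen::PartialPivLU<Mat> lu(A);
    Mat X1 = lu.solve(B);
    Mat X2 = lu.solve(X1);

    return 0;
}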

Second, a 24k^2 matrix is indeed pretty big for dense storage. Are you sure your matrix is not sparse, i.e., that it contains many more zero entries than non-zeros, like 3 to 20 non-zeros per row or column? If that's the case, you should consider using a SparseMatrix<> with the associated sparse solvers. Nevertheless, for dense objects you can enable parallelization by enabling OpenMP (-fopenmp with GCC). This should speed up the LU factorization step.
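For illustration of the sparse route, here is a hedged sketch; it assumes a recent Eigen where the built-in SparseLU solver is available, and the tridiagonal pattern is just a toy example with a few non-zeros per row:

Code: Select all
#include <Eigen/Sparse>
#include <complex>
#include <vector>

int main()
{
    typedef std::complex<double> cd;
    typedef Eigen::SparseMatrix<cd> SpMat;

    const int n = 1000;

    // Toy tridiagonal pattern: a few non-zeros per row.
    std::vector<Eigen::Triplet<cd> > triplets;
    for (int i = 0; i < n; ++i)
    {
        triplets.push_back(Eigen::Triplet<cd>(i, i, cd(4.0, 0.0)));
        if (i + 1 < n)
        {
            triplets.push_back(Eigen::Triplet<cd>(i, i + 1, cd(-1.0, 0.0)));
            triplets.push_back(Eigen::Triplet<cd>(i + 1, i, cd(-1.0, 0.0)));
        }
    }

    SpMat A(n, n);
    A.setFromTriplets(triplets.begin(), triplets.end());

    Eigen::SparseLU<SpMat> solver;
    solver.compute(A);

    Eigen::VectorXcd b = Eigen::VectorXcd::Ones(n);
    Eigen::VectorXcd x = solver.solve(b);   // A * x = b without forming A^-1
    return 0;
}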
MarkoFranko
Registered Member
Hi ggael, thank you for your reply.

ggael wrote:First, are you sure you need the inverse? If you do a computation like "A^-1 * B", then rather do X = A.lu().solve(B), and if you need A^-1 multiple times, create a PartialPivLU<> object.


Unfortunately, I need the whole inverse matrix. But one part of the code was optimized with your suggestion (X = A.lu().solve(B)).

ggael wrote:Second, a 24k^2 matrix is indeed pretty big for dense storage. Are you sure your matrix is not sparse, i.e., that it contains many more zero entries than non-zeros, like 3 to 20 non-zeros per row or column? If that's the case, you should consider using a SparseMatrix<> with the associated sparse solvers.


At least 50% of the matrix entries are non-zero.

ggael wrote:Nevertheless, for dense objects you can enable parallelization by enabling OpenMP (-fopenmp with GCC). This should speed up the LU factorization step.


My for loops are parallelized with OpenMP:

Code: Select all
#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < n; ++i)
        process(i);   // placeholder for the original loop body
}


and it works quite well.

But when I try

Code: Select all
#pragma omp parallel
{
    InverseOfLargeMatrix = LargeMatrix.inverse();
}


then it takes longer to calculate the inverse than without OpenMP, and I get a wrong result.
Is this the wrong way?

Perhaps it is possible to split the matrix inversion into blocks and then calculate them with a parallel loop?
ggael
Moderator
Simply compile with OpenMP enabled; do not add the "#pragma omp parallel" directive. (Wrapping the call in a parallel region makes every thread execute the whole inversion and write to the same result, hence the slowdown and the wrong values.) Basically, the first step of the matrix inversion is to compute an LU factorization. For large matrices this is done per block, and the bottleneck appears to be a matrix-matrix product. It is this latter operation which is parallelized.
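A hedged sketch of the intended usage; the build line assumes GCC, and Eigen::nbThreads() is only there to check that OpenMP was picked up:

Code: Select all
// Build with OpenMP enabled, e.g.:  g++ -O2 -fopenmp main.cpp -o main
#include <Eigen/Dense>
#include <iostream>

int main()
{
    // No OpenMP pragmas here: when compiled with -fopenmp, Eigen itself
    // parallelizes the matrix-matrix products inside the blocked LU.
    std::cout << "Eigen threads: " << Eigen::nbThreads() << std::endl;

    Eigen::MatrixXcd A = Eigen::MatrixXcd::Random(2000, 2000);
    Eigen::MatrixXcd Ainv = A.inverse();

    std::cout << Ainv(0, 0) << std::endl;
    return 0;
}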
MarkoFranko
Registered Member
ggael wrote:Simply compile with OpenMP enabled; do not add the "#pragma omp parallel" directive. Basically, the first step of the matrix inversion is to compute an LU factorization. For large matrices this is done per block, and the bottleneck appears to be a matrix-matrix product. It is this latter operation which is parallelized.



I did as you suggested, but I see only one CPU working when it comes to the inversion of the matrix.

Is there some implementation like:

Code: Select all
MatrixA.inverse().block(a,b)

where the result would be a specific block of the inverse of MatrixA?

Because then it would be possible to make a parallelized loop and calculate all blocks of the inverse.

But I am not sure whether this is the correct way, when I look at this: http://en.wikipedia.org/wiki/Invertible_matrix#Blockwise_inversion
ggael
Moderator
Parallelizing a matrix inversion is not trivial at all. As I told you, the parallelization only happens during the LU decomposition step, and not during the computation of U^-1 * L^-1; that probably explains why you don't notice it.
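Not part of ggael's answer, but for illustration of the block idea asked about above: a sketch that factorizes once and computes column blocks of the inverse in a user-level OpenMP loop, under the assumption that concurrent solve() calls on a const PartialPivLU are safe (each thread writes a disjoint column range of the result):

Code: Select all
#include <Eigen/Dense>
#include <algorithm>

int main()
{
    typedef Eigen::MatrixXcd Mat;
    const int n = 1024;
    const int blockCols = 128;

    Mat A = Mat::Random(n, n);

    // Factorize once; this is the step Eigen itself can parallelize.
    Eigen::PartialPivLU<Mat> lu(A);

    Mat Ainv(n, n);

    // Column block j of A^-1 solves A * X = I(:, j .. j+cols-1).
    #pragma omp parallel for
    for (int j = 0; j < n; j += blockCols)
    {
        const int cols = std::min(blockCols, n - j);
        Ainv.middleCols(j, cols) =
            lu.solve(Mat::Identity(n, n).middleCols(j, cols));
    }
    return 0;
}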

