Sparse Matrix Inversion at least 10 Times Slower than Matlab

maverick197 (Registered Member):
Hi,

I'm trying to invert a fairly large (but sparse) matrix, and I want code that is as portable as possible and doesn't rely on my environment having certain packages installed, so Eigen seemed like a good option. Before someone makes the obligatory comment: YES, I absolutely must compute an inverse. It is not the kind of problem where I can just solve a system of linear equations; I need the whole inverse.

I tried inverting a simple 5000x5000 matrix using (dense) PartialPivLU and SparseLU (via both the compute() method and the analyzePattern()/factorize() pair). I then did a simple inv(matrix) command in Matlab and compared the two. Matlab is roughly 10x faster. Matlab uses the 8 cores of my computer for the inversion, and I suppose none of the Eigen methods do that. I'm not sure what I can do about that without assuming my computing environment has something like SuperLU installed. I would be willing to use a dense solver if it were parallel, but Eigen doesn't seem to support that either (as I said, I tried PartialPivLU).

Is there anything to be done about this? Or am I barking up the wrong tree with Eigen? I've included the bare-bones code:

Code: Select all
#include <iostream>
#include <cstdlib>
#include <Eigen/Dense>
#include <Eigen/Sparse>
#include <cmath>
#include <complex>
#include <fstream>
#include <sstream>
#include <ctime>
#include <omp.h> // needed for omp_get_wtime()
//#include <Eigen/SuperLUSupport>



typedef Eigen::MatrixXcd matrix;
typedef Eigen::VectorXcd vector;
typedef Eigen::VectorXd vector_real;
typedef Eigen::MatrixXd matrix_real;
typedef Eigen::SparseMatrix<std::complex<double> > sparse_matrix;
typedef Eigen::SparseVector<std::complex<double> > sparse_vector;
typedef Eigen::SparseVector<double> sparse_vector_real;
typedef Eigen::SparseMatrix<double> sparse_matrix_real;
typedef Eigen::Triplet< std::complex<double> > triplet;
typedef std::vector< triplet > tripletList;


int main(int argc, char *argv[])
{

    std::cout.precision(15);
    Eigen::initParallel();
    double start_time;

    start_time=omp_get_wtime();


    int num_iterations=1;
    int total_size=5000;

    sparse_matrix big_matrix(total_size,total_size);

    tripletList bm_trip_list;
    bm_trip_list.reserve(4*total_size);
    for(int i=0;i<total_size;i++){
        bm_trip_list.push_back(triplet(i,i,std::complex<double>(-1.0,0.0)));
        if(i!=(total_size-1)){
            bm_trip_list.push_back(triplet(i+1,i,std::complex<double>(-3.0,0.0)));
            bm_trip_list.push_back(triplet(i,i+1,std::complex<double>(-3.0,0.0)));
        }
    }
    big_matrix.setFromTriplets(bm_trip_list.begin(),bm_trip_list.end());

    sparse_matrix pre_final_matrix(total_size,total_size);
    pre_final_matrix=big_matrix;
    pre_final_matrix.makeCompressed();
    matrix dense_pre_final=pre_final_matrix;

    std::cout << pre_final_matrix.nonZeros() << std::endl;



    double temp_time;
    matrix answer;
    double time_dense_method=omp_get_wtime();

    for(int n=0;n<num_iterations;n++){
        Eigen::PartialPivLU< matrix > D_solver(dense_pre_final);
        answer=D_solver.inverse();
    }
    temp_time=omp_get_wtime();
    time_dense_method=temp_time-time_dense_method;
    std::cout << "Dense Matrix Calculation Complete." << std::endl << std::endl;

    sparse_matrix iden(total_size,total_size);
    iden.setIdentity();
    sparse_matrix sparse_answer(total_size,total_size);

    double time_compute_method=omp_get_wtime();
    for(int n=0;n<num_iterations;n++){
        sparse_answer.setZero();
        Eigen::SparseLU< sparse_matrix > G_solver;
        G_solver.compute(pre_final_matrix);
        sparse_answer=G_solver.solve(iden);
    }

    temp_time=omp_get_wtime();
    time_compute_method=temp_time-time_compute_method;

    matrix temp_dense_m1;
    temp_dense_m1=sparse_answer;

    std::cout << "Sparse (Compute Method) Matrix Calculation Complete." << std::endl << std::endl;

    double time_factorize_method=omp_get_wtime();

    for(int n=0;n<num_iterations;n++){
        sparse_answer.setZero();
        Eigen::SparseLU< sparse_matrix > G_solver;
        G_solver.analyzePattern(pre_final_matrix);
        G_solver.factorize(pre_final_matrix);
        sparse_answer=G_solver.solve(iden);

    }
    temp_time=omp_get_wtime();
    time_factorize_method=temp_time-time_factorize_method;

    matrix temp_dense_m2;
    temp_dense_m2=sparse_answer;

    std::cout << "Sparse (Factorize Method) Matrix Calculation Complete." << std::endl << std::endl;

    matrix final_mat;
    final_mat=(temp_dense_m2-temp_dense_m1)+(temp_dense_m2-answer)+(temp_dense_m1-answer);



    std::cout << "Dense: " << time_dense_method << std::endl << "Sparse Compute: " << time_compute_method << std::endl << "Sparse Factorize: " << time_factorize_method << std::endl << std::endl << "Final Norm = " << final_mat.norm() << std::endl;

}


ggael (Moderator):
PartialPivLU should be able to exploit your multiple cores. Make sure you compiled with -fopenmp, and that you run your program with OMP_NUM_THREADS=8 if 8 is the number of physical cores (not the number of hyper-threads).
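As a minimal sketch (assuming the program is compiled with -fopenmp), you can check how many threads Eigen's parallel kernels will actually use, and override the count if needed, with Eigen::nbThreads() and Eigen::setNbThreads():

Code: Select all
#include <iostream>
#include <Eigen/Core>

int main()
{
    Eigen::initParallel();
    // With OpenMP enabled, this reports the thread count Eigen will use
    // (it follows OMP_NUM_THREADS unless overridden below).
    std::cout << "Eigen threads: " << Eigen::nbThreads() << std::endl;

    // Optionally force a specific count, e.g. the number of physical cores.
    Eigen::setNbThreads(8);
    std::cout << "Eigen threads now: " << Eigen::nbThreads() << std::endl;
    return 0;
}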

Could you also paste the numbers you got, so we can see whether they are reasonable?
ggael (Moderator):
btw, you are using an uninitialized matrix as big_matrix is not used at all. Moreover, since the inverse is most likely to be dense, you can use a dense solving step:

Code: Select all
matrix answer2, I;
I.setIdentity(total_size,total_size);
Eigen::SparseLU< sparse_matrix > G_solver(big_matrix);
answer2=G_solver.solve(I);


takes 2.2s here without multithreading.
maverick197 (Registered Member):
ggael wrote: btw, you are using an uninitialized matrix as big_matrix is not used at all. Moreover, since the inverse is most likely to be dense, you can use a dense solving step:

Code: Select all
matrix answer2, I;
I.setIdentity(total_size,total_size);
Eigen::SparseLU< sparse_matrix > G_solver(big_matrix);
answer2=G_solver.solve(I);


takes 2.2s here without multithreading.


Sorry, what I posted is a trimmed-down version of a more complicated code in which big_matrix is used and pre_final_matrix is actually the sum of a few other matrices, including big_matrix. My bad.

Wow, the method you suggested works WAY, WAY better.

Using the first method (the dense one with PartialPivLU) I got times of approximately 87.4s.
Using the second method (SparseLU with compute()) I got times of approximately 78.2s.
Using the third method (SparseLU with analyzePattern()/factorize()) I got times of approximately 77.3s.
And somehow your SparseLU-and-solve method takes approximately 1.3s.

How is there such an enormous difference between these methods?

Thanks for the help!
ggael (Moderator):
PartialPivLU and SparseLU are completely different algorithms and not comparable at all. compute() is just a shortcut for analyzePattern()/factorize(), so the same speed is to be expected. SparseLU::solve(sparse), i.e., with a sparse argument, assumes that the result will be sparse and therefore processes the argument in chunks of 4 columns instead of solving for all columns at once, as SparseLU::solve(dense) does. This hardcoded value of 4 could be adjusted to be larger for small problems like yours.
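To make the difference concrete, here is a minimal sketch (reusing the typedefs, pre_final_matrix, and total_size from the original post) of the two call paths, one with a sparse right-hand side and one with a dense one:

Code: Select all
// Sparse right-hand side: the result is assumed to be sparse and is
// filled a few columns at a time, which is slow when the inverse is dense.
sparse_matrix I_sparse(total_size,total_size);
I_sparse.setIdentity();
Eigen::SparseLU< sparse_matrix > solver(pre_final_matrix);
sparse_matrix inv_sparse = solver.solve(I_sparse);

// Dense right-hand side: all columns are solved at once into a dense
// matrix, which matches a dense inverse much better.
matrix I_dense = matrix::Identity(total_size,total_size);
matrix inv_dense = solver.solve(I_dense);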
ggael (Moderator):
Hm, no. The reason is simply a problem of memory reallocation when filling the result matrix. The following works very well too:
Code: Select all
sparse_matrix answer, I(total_size,total_size);
I.setIdentity();
Eigen::SparseLU< sparse_matrix > G_solver(big_matrix);
answer.resize(total_size,total_size);
answer.reserve(total_size*total_size); // we know that the result will be dense
answer=G_solver.solve(I);

