
Sparse matrix-vector product is slow

Tags: sparse, matrix-vector, benchmark, eigen, gmm
munyanko
Registered Member
Sparse matrix-vector product is slow

Thu Aug 28, 2014 12:14 pm
I've been using Eigen to compute products of many sparse matrices and was very pleased with the performance. Since half of the matrices in the product are essentially exponentials of another matrix, it is not efficient to exponentiate directly (sizes grow up to 2^10-2^14); instead I use Krylov-like methods to compute exp(t*H)*v, so that the product slims down to many sparse matrix * sparse vector products over selected columns. However, the performance of the sparse matrix * sparse vector product is at least 10 times slower than that of the direct matrix-matrix product. One would expect similar performance (plus some overhead) from both algorithms, as is the case with gmm.
Code:
mm gmm took 410
mm eigen took 499
mv gmm took 293
mv eigen took 3866


I wrote a small test case to benchmark both algorithms with both libraries (its output is above). I do use NDEBUG and native optimisations, but the results are disastrous :-\
Code:
#include <iostream>
#include <chrono>
#include <gmm/gmm.h>
#include <Eigen/Sparse>

// Helper: copy an Eigen sparse matrix into a gmm col_matrix (write-optimised wsvector storage).
void eigen_to_gmm(const Eigen::SparseMatrix<double> &e, gmm::col_matrix<gmm::wsvector<double>> &g)
{
    gmm::clear(g);
    gmm::resize(g, e.rows(), e.cols());
    for(int k = 0; k < e.outerSize(); ++k)
        for(Eigen::SparseMatrix<double>::InnerIterator it(e,k); it; ++it)
        {
            g(it.row(), it.col()) = it.value();
        }
}

// Same helper for the read-optimised rsvector storage.
void eigen_to_gmm(const Eigen::SparseMatrix<double> &e, gmm::col_matrix<gmm::rsvector<double>> &g)
{
    gmm::clear(g);
    gmm::resize(g, e.rows(), e.cols());
    for(int k = 0; k < e.outerSize(); ++k)
        for(Eigen::SparseMatrix<double>::InnerIterator it(e,k); it; ++it)
        {
            g(it.row(), it.col()) = it.value();
        }
}

int main()
{
    Eigen::SparseMatrix<double> exph(16,16);
    Eigen::SparseMatrix<double> opc(16,16);
    Eigen::SparseMatrix<double> opa(16,16);
    exph.insert(0,0) = 1;
    exph.insert(1,1) = 1.00014;
    exph.insert(2,2) = 1.00014;
    exph.insert(3,3) = 1.00014;
    exph.insert(4,4) = 1.00014;
    exph.insert(5,5) = 1.00007;
    exph.insert(6,6) = 0.999984;
    exph.insert(9,6) = -2.99995e-05;
    exph.insert(7,7) = 1.00004;
    exph.insert(8,7) = -3.00013e-05;
    exph.insert(7,8) = -3.00013e-05;
    exph.insert(8,8) = 1.00004;
    exph.insert(6,9) = -2.99995e-05;
    exph.insert(9,9) = 0.999984;
    exph.insert(10,10) = 1.00007;
    exph.insert(11,11) = 0.999676;
    exph.insert(12,12) = 0.999676;
    exph.insert(13,13) = 0.999676;
    exph.insert(14,14) = 0.999676;
    exph.insert(15,15) = 0.999068;

    opa.insert(0,1) = 1;
    opa.insert(2,5) = 1;
    opa.insert(3,6) = 1;
    opa.insert(4,8) = 1;
    opa.insert(7,11) = 1;
    opa.insert(9,12) = 1;
    opa.insert(10,13) = 1;
    opa.insert(14,15) = 1;

    opc.insert(1,0) = 1;
    opc.insert(5,2) = 1;
    opc.insert(6,3) = 1;
    opc.insert(8,4) = 1;
    opc.insert(11,7) = 1;
    opc.insert(12,9) = 1;
    opc.insert(13,10) = 1;
    opc.insert(15,14) = 1;

    gmm::col_matrix<gmm::rsvector<double>> exph_g;
    eigen_to_gmm(exph, exph_g);
    gmm::col_matrix<gmm::rsvector<double>> opa_g;
    gmm::col_matrix<gmm::rsvector<double>> opc_g;
    eigen_to_gmm(opa, opa_g);
    eigen_to_gmm(opc, opc_g);

    Eigen::SparseMatrix<double> iden(exph.rows(), exph.cols());
    for(int i = 0; i < iden.rows(); i++)
        iden.insert(i,i) = 1;
    gmm::col_matrix<gmm::rsvector<double>> iden_g;
    eigen_to_gmm(iden, iden_g);
    gmm::col_matrix<gmm::wsvector<double>> prod_g(iden.cols(), iden.rows());
    gmm::copy(iden_g, prod_g);
    gmm::col_matrix<gmm::wsvector<double>> tmp_g(exph.rows(), exph.cols());
    // Benchmark 1: repeated sparse matrix * sparse matrix products with gmm.
    auto begin = std::chrono::high_resolution_clock::now();
    for(int i = 0; i < 100000; i++) {
        gmm::mult(prod_g, exph_g, tmp_g);
        gmm::mult(tmp_g, opa_g, prod_g);
        gmm::mult(prod_g, exph_g, tmp_g);
        gmm::mult(tmp_g, opc_g, prod_g);
    }
    gmm::mult(prod_g, exph_g, tmp_g);
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "mm gmm took " << std::chrono::duration_cast<std::chrono::milliseconds>(end-begin).count() << std::endl;


    // Benchmark 2: the same matrix * matrix chain with Eigen.
    Eigen::SparseMatrix<double> prod(iden);
    begin = std::chrono::high_resolution_clock::now();
    for(int i = 0; i < 100000; i++) {
        prod = prod * exph;
        prod = prod * opa;
        prod = prod * exph;
        prod = prod * opc;
    }
    prod = prod * exph;
    end = std::chrono::high_resolution_clock::now();
    std::cout << "mm eigen took " << std::chrono::duration_cast<std::chrono::milliseconds>(end-begin).count() << std::endl;

    // Benchmark 3: the same computation column by column (sparse matrix * sparse vector) with gmm.
    gmm::wsvector<double> v(exph.rows());
    gmm::wsvector<double> y(exph.rows());
    gmm::col_matrix<gmm::wsvector<double>> prod_g2(iden.cols(), iden.rows());
    begin = std::chrono::high_resolution_clock::now();
    for(int state = 0; state < exph.cols(); state++) {
        gmm::copy(iden_g.col(state), v);
        for(int i = 0; i < 100000; i++) {
            gmm::mult(exph_g, v, y);
            gmm::mult(opa_g, y, v);
            gmm::mult(exph_g, v, y);
            gmm::mult(opc_g, y, v);
        }
        gmm::mult(exph_g, v, y);
        prod_g2.col(state) = y;
    }
    end = std::chrono::high_resolution_clock::now();
    std::cout << "mv gmm took " << std::chrono::duration_cast<std::chrono::milliseconds>(end-begin).count() << std::endl;

    // Benchmark 4: the column-by-column computation with Eigen sparse vectors.
    Eigen::SparseVector<double> vec1(exph.rows());
    Eigen::SparseVector<double> vec2(exph.rows());
    Eigen::SparseMatrix<double> prod2(iden);
    begin = std::chrono::high_resolution_clock::now();
    for(int state = 0; state < exph.cols(); state++) {
        vec1 = iden.col(state);
        for(int i = 0; i < 100000; i++) {
            vec2 = exph * vec1;
            vec1 = opa * vec2;
            vec2 = exph * vec1;
            vec1 = opc * vec2;
        }
        prod2.col(state) = exph * vec1;
    }
    end = std::chrono::high_resolution_clock::now();
    std::cout << "mv eigen took " << std::chrono::duration_cast<std::chrono::milliseconds>(end-begin).count() << std::endl;

    return 0;
}

ggael
Moderator
I guess that your example is not representative, as using a sparse representation for such small matrices is not recommended. What are the typical sizes of your real-world use case? Are you sure that the result remains sparse?
munyanko
Registered Member
ggael wrote:I guess that your example is not representative, as using a sparse representation for such small matrices is not recommended. What are the typical sizes of your real-world use case? Are you sure that the result remains sparse?


The matrices are generally very sparse due to their nature: H is an interaction Hamiltonian, which is mostly diagonal in the occupation basis, and the op matrices are fermionic operators.
And the matrix product is always sparse by the physical nature of the problem, but here I ran the test case with 1024x1024 matrices (limited to 1000 products):
Code:
mm gmm took 9994
mm eigen took 3939
sparsity: product 99.6267% exph 99.5895% op 99.9512%
mv gmm took 9088
mv eigen took 150801
average vector product sparsity 99.7234

In the direct matrix-matrix product Eigen, to my pleasure, overtook gmm, but the same product computed column by column yields an abysmal time for Eigen.
To make matters worse, in the real case I have to compute the following product millions of times per iteration, where n can go up to 1000:
Code:
exp(t_{2n}*H) * Op(t_{2n}) * exp(t_{2n-1}*H) * Op(t_{2n-1}) * ... * exp(t_1*H)

Since the exponential argument is variable, I have to compute exp every time, and Taylor or Padé methods are not optimal for such large matrices.
Thus I compute exp(t*H)*v instead via the Newton-Leja algorithm, but it has dozens of matrix-vector products inside, and this is where I first noticed the major slowdown.
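
To illustrate the structure, here is a rough sketch of the per-column evaluation (a sketch only: expmv stands in for a Newton-Leja/Krylov approximation of exp(t*H)*v, and op[k], t[k], v0 and n are placeholders, not names from the benchmark code above):
Code:
// Apply the chain right to left so that only sparse matrix * sparse vector
// products remain; expmv(t, H, v) is a hypothetical stand-in for exp(t*H)*v.
Eigen::SparseVector<double> v = v0;   // starting column
v = expmv(t[1], H, v);                // exp(t_1*H) * v
for (int k = 2; k <= 2*n; ++k) {
    v = op[k] * v;                    // Op(t_k) * v
    v = expmv(t[k], H, v);            // exp(t_k*H) * v
}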

I first tried the matrix-vector product in 3.2.1 or 3.2.2, so my first thought was that this is a regression in Eigen.

Now I'm faced with a choice of either introducing yet another matrix library into the project (gmm), optimizing my matrix-vector product code if I did something wrong here, or simply waiting for a fix if this is a regression, so I brought the matter here.
ggael
Moderator
Can you try with the devel branch? I've just committed a fix.
ggael
Moderator
Moreover, it might also help to call prod2.reserve(prod2.cols() * approximate_nnz_per_column) before starting the vector-by-vector computation, to prevent multiple memory reallocations and copies when inserting the column vectors.
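
As a minimal sketch of that suggestion, reusing the names from the test case above (nnz_per_col is only an assumed estimate of the fill per column, not a value from this thread):
Code:
const int nnz_per_col = 8;                   // assumed average non-zeros per column
Eigen::SparseMatrix<double> prod2(exph.rows(), exph.cols());
prod2.reserve(prod2.cols() * nnz_per_col);   // preallocate once instead of growing while inserting columns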
munyanko
Registered Member
Thank you for the fix!
Code:
mm gmm took 10786
mm eigen took 4509
mv gmm took 9299
mv eigen took 35470

Not 15 times slower as before, but only 4 times slower now for the matrix-vector product! Moreover, reserving space for vec1, vec2 and prod2 shaved off another 3 seconds.
However, it is still very suspicious: when I make H fully diagonal, so that all vectors and columns in the intermediate products have only one element, things are still off:
Code:
mm gmm took 1591
mm eigen took 1820
mv gmm took 1388
mv eigen took 26533


On a side note: the matrix-matrix product is 15% slower in the dev version :(
ggael
Moderator
Yes, the slowdown in this extreme case is due to the fact that by default we assume aliasing, so each product creates a temporary. In general this is not an issue, because the cost of this dynamically allocated temporary is small compared to the cost of the product, but in your extreme case it is killing the performance. This will be fixed soon through the use of noalias (as with dense products). In the meantime, you can update your copy of the devel branch and directly call some internal functions to bypass this temporary:
Code:
template<typename Lhs, typename Rhs, typename Res>
void mul_colmajor_in_place(const Lhs &lhs, const Rhs &rhs, Res &res)
{
  res.resize(lhs.rows(), rhs.cols());
  res.setZero();
  Eigen::internal::conservative_sparse_sparse_product_impl<Lhs,Rhs,Res>(lhs, rhs, res, true);
}
// ...
mul_colmajor_in_place(A, vec1, vec2);
mul_colmajor_in_place(A, vec2, vec1);


and for larger matrices you may also need to increase the stack allocation limit by compiling with, e.g.:

-DEIGEN_STACK_ALLOCATION_LIMIT=1000000
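
For reference, a full compile line with the options mentioned in this thread might look like this (the include paths and file names are placeholders):
Code:
g++ -O3 -DNDEBUG -march=native -DEIGEN_STACK_ALLOCATION_LIMIT=1000000 \
    -I /path/to/eigen -I /path/to/gmm bench.cpp -o bench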

