
Eigen really slow !?...

fhoussen (Registered Member)
Eigen really slow !?...

Sun Oct 01, 2017 2:48 pm
I tried to run a very simple sparse matrix-vector product (V0) on my laptop, but it is very, very slow... A naive implementation (V1) is much faster:
>> ./matVecProd.sh
g++ -I /home/fghoussen/Documents/INRIA/eigen-eigen-5a0156e40feb/local/include/eigen3 -mavx -march=native -O3 -o matVecProdV0.exe matVecProdV0.cpp
g++ -march=native -O3 -ftree-vectorize -fopt-info-optall -funroll-loops -ffast-math -fstrict-aliasing -o matVecProdV1.exe matVecProdV1.cpp
matVecProdV1.cpp:43:28: note: loop unrolled 7 times
matVecProdV0 : movpd 0, addpd 2, mulpd 0, time 136911 ms, KO
matVecProdV1 : movpd 0, addpd 0, mulpd 0, time 495 ms, KO

This happens whether I compile with -mavx or -msse2.
Can somebody help me understand why?

My laptop has 4 logical processors (2 cores + hyperthreading).
>> cat /proc/cpuinfo
model name : Intel(R) Core(TM) i7-3687U CPU @ 2.10GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts

Franck

Files are available here: https://filesender.renater.fr/?s=downlo ... dca165dd22
ggael (Moderator)
Re: Eigen really slow !?...

Mon Oct 02, 2017 8:15 am
Your code is plain wrong and does not compute anything! The problem is that pMatIr[i]==p for all i>0, thus pMatIr[i+1]-pMatIr[i]==0 and the line pRes[i] += ...; is never executed.

Then, this += line is also wrong; it should be:

pRes[i] += pMatVal[pMatJc[startJc+j]]*pVec[pMatJc[startJc+j]];

and not:

pRes[i] += pMatVal[pMatJc[startJc+j]]*pVec[i];


Finally, the checks are wrong too; you should replace n with p:

int rc = 0;
for (size_t i = 0; i < n; i++)
if (abs(pRes[i] - AVE*p) > 1e-12) {rc = 1; break;} // Prevent compiler optimizations.
cout << ((rc == 0) ? ", OK" : ", KO") << endl;

And if you want more performance for sparse matrix-vector products, use a row-major matrix:

SparseMatrix<double,RowMajor>

for which you can even enable multi-threading if compiling with -fopenmp.

And generally, it is not a good idea to benchmark code written inside the main function; it is better to wrap it in a non-inline function, to be closer to real-world usage.
fhoussen (Registered Member)

Re: Eigen really slow !?...

Mon Oct 02, 2017 11:07 am
Just saw your reply here... Thanks, you are right.
Even with the fix, I still see the same kind of problem:
>>matVecProdV0 : movpd 0, addpd 31, mulpd 0, time 136682 ms, OK
>>matVecProdV1 : movpd 0, addpd 0, mulpd 0, time 64635 ms, OK

Franck
ggael (Moderator)

Re: Eigen really slow !?...

Mon Oct 02, 2017 4:04 pm
Here I get the same speed after just fixing the code; Eigen is then about 1.8x faster when using SparseMatrix<double,RowMajor>, and 5x faster with RowMajor + "-fopenmp".
ggael (Moderator)

Re: Eigen really slow !?...

Mon Oct 02, 2017 4:06 pm
V0:
Code:
#include <iostream>
#include <string>
#include <sstream>
#include <chrono>
#include <cmath>
#include <cstdlib> // rand.
#include <Eigen/Sparse>
#include <vector>

using namespace std;
using namespace Eigen;

#define AVE 7500

EIGEN_DONT_INLINE
int main(int argc, char ** argv) {
  if (argc != 3 || !argv) return 1;

  size_t n = 0; stringstream sn(argv[1]); sn >> n; if (n <= 0)          return 1;
  size_t p = 0; stringstream sp(argv[2]); sp >> p; if (p <= 0 || p > n) return 1;

  vector<Eigen::Triplet<double>> ijAij;
  ijAij.reserve(n*p); // We have p values per row.
  for (size_t i = 0; i < n; i++) {
    for (size_t j = 0; j < p; j++) { // We have p values per row.
      ijAij.push_back(Eigen::Triplet<double> (i, rand()%n, 1.)); // Get column in the range 0 to n-1.
    }
  }
  Eigen::SparseMatrix<double,RowMajor> mat(n, n);
  mat.reserve(n*p); // We have p values per row.
  mat.setFromTriplets(ijAij.begin(), ijAij.end());

  Eigen::VectorXd vec(n);
  for (size_t i = 0; i < n; i++) vec(i) = 1.;

  Eigen::VectorXd res(n);
  for (size_t i = 0; i < n; i++) res(i) = 0.;

  auto start = chrono::high_resolution_clock::now();
  for (size_t a = 0; a < AVE; a++) { // Average.
    res += mat*vec;
  }
  auto end = chrono::high_resolution_clock::now();
  cout << double(chrono::duration_cast<chrono::milliseconds>(end-start).count())/double(AVE) << " ms" << flush;

  int rc = 0;
  for (size_t i = 0; i < n; i++)
    if (abs(res(i) - AVE*p) > 1e-12) {rc = 1; break;} // Prevent compiler optimizations.
  cout << ((rc == 0) ? ", OK" : ", KO") << "\n";

  return rc;
}

V1:
Code:
#include <iostream>
#include <string>
#include <sstream>
#include <chrono>
#include <cmath>
#include <cstdlib> // rand.

using namespace std;

#define AVE 7500

int main(int argc, char ** argv) {
  if (argc != 3 || !argv) return 1;

  size_t n = 0; stringstream sn(argv[1]); sn >> n; if (n <= 0)          return 1;
  size_t p = 0; stringstream sp(argv[2]); sp >> p; if (p <= 0 || p > n) return 1;

  int * pMatIr = new int[n+1]; pMatIr[0] = 0;
  int nnz = n*p; // Number of non null values: p values per row * n rows.
  int * pMatJc = new int[nnz];
  double * pMatVal = new double[nnz];
  size_t s = 0; // Scan.
  for (size_t i = 0; i < n; i++) {
    pMatIr[i+1] = s+p; // We have p values per row.
    for (size_t j = 0; j < p; j++) {
      pMatJc[s] = rand()%n; // Get column in the range 0 to n-1.
      pMatVal[s] = 1.;
      s++;
    }
  }

  double * pVec = new double[n];
  for (size_t i = 0; i < n; i++) pVec[i] = 1.;

  double * pRes = new double[n];
  for (size_t i = 0; i < n; i++) pRes[i] = 0.;

  auto start = chrono::high_resolution_clock::now();
  for (size_t a = 0; a < AVE; a++) { // Average.
    for (size_t i = 0; i < n; i++) {
      int startJc = pMatIr[i];
      size_t nbJc = pMatIr[i+1] - startJc;
      for (size_t j = 0; j < nbJc; j++) {
        pRes[i] += pMatVal[pMatJc[startJc+j]]*pVec[pMatJc[startJc+j]];
      }
    }
  }
  auto end = chrono::high_resolution_clock::now();
  cout << double(chrono::duration_cast<chrono::milliseconds>(end-start).count())/double(AVE) << " ms" << flush;

  int rc = 0;
  for (size_t i = 0; i < n; i++)
    if (abs(pRes[i] - AVE*p) > 1e-12) {rc = 1; break;} // Prevent compiler optimizations.
  cout << ((rc == 0) ? ", OK" : ", KO") << endl;

  if (pMatIr)  {delete [] pMatIr;  pMatIr  = NULL;}
  if (pMatJc)  {delete [] pMatJc;  pMatJc  = NULL;}
  if (pMatVal) {delete [] pMatVal; pMatVal = NULL;}
  if (pVec)    {delete [] pVec;    pVec    = NULL;}
  if (pRes)    {delete [] pRes;    pRes    = NULL;}

  return rc;
}


I'm using the 3.3 branch + clang-4.0.
netw0rkf10w (Registered Member)

Re: Eigen really slow !?...

Mon Nov 27, 2017 10:13 pm
ggael wrote:And if you want more performance for sparse matrix-vector products, use a row-major matrix:

SparseMatrix<double,RowMajor>

for which you can even enable multi-threading if compiling with -fopenmp.

This is great, Gael. Where can we learn tricks and tips like this, please? I tried searching on Google but couldn't find a page that mentions your trick :o
ggael (Moderator)

Re: Eigen really slow !?...

Tue Nov 28, 2017 8:15 am
Regarding multithreading, there is a dedicated page: http://eigen.tuxfamily.org/dox/TopicMultiThreading.html
netw0rkf10w (Registered Member)

Re: Eigen really slow !?...

Thu Dec 07, 2017 3:25 pm
ggael wrote:Regarding multithreading, there is a dedicated page: http://eigen.tuxfamily.org/dox/TopicMultiThreading.html

Thanks a lot, Gael! ;D

 