Registered Member
|
I tried to run a very simple (sparse) matrix-vector product (V0) on my laptop, but it is very slow... A naive implementation (V1) is much faster:
>> ./matVecProd.sh
g++ -I /home/fghoussen/Documents/INRIA/eigen-eigen-5a0156e40feb/local/include/eigen3 -mavx -march=native -O3 -o matVecProdV0.exe matVecProdV0.cpp
g++ -march=native -O3 -ftree-vectorize -fopt-info-optall -funroll-loops -ffast-math -fstrict-aliasing -o matVecProdV1.exe matVecProdV1.cpp
matVecProdV1.cpp:43:28: note: loop unrolled 7 times
matVecProdV0 : movpd 0, addpd 2, mulpd 0, time 136911 ms, KO
matVecProdV1 : movpd 0, addpd 0, mulpd 0, time 495 ms, KO

This happens both when I compile with -mavx and when I compile with -msse2. Can somebody help me to understand why?

My laptop has 4 logical processors (2 cores + hyperthreading).

>> cat /proc/cpuinfo
model name : Intel(R) Core(TM) i7-3687U CPU @ 2.10GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts

Franck

Files are available here: https://filesender.renater.fr/?s=downlo ... dca165dd22 |
Moderator
|
but your code is plain wrong and does not compute anything! The problem is that pMatIr[i]==p for all i>0, and thus pMatIr[i+1]-pMatIr[i]==0, so the line "pRes[i] += ...;" is never executed.
Then this += line is also wrong. It should be:

    pRes[i] += pMatVal[pMatJc[startJc+j]]*pVec[pMatJc[startJc+j]];

and not:

    pRes[i] += pMatVal[pMatJc[startJc+j]]*pVec[i];

Finally, the checks are wrong too; you should replace n by p:

    int rc = 0;
    for (size_t i = 0; i < n; i++)
        if (abs(pRes[i] - AVE*p) > 1e-12) {rc = 1; break;} // Prevent compiler optimizations.
    cout << ((rc == 0) ? ", OK" : ", KO") << endl;

And finally, if you want more performance for sparse matrix-vector products, use a row-major matrix, SparseMatrix<double,RowMajor>, for which you can even enable multi-threading by compiling with -fopenmp. Generally, it is not a good idea to bench code written within the main function; better to wrap it in a non-inline function, to be closer to real-world usage.
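For later readers, here is a minimal sketch of the row-major approach suggested above; the sizes, the fill pattern and the helper name multiply are illustrative assumptions rather than the code from Franck's attachments:

    #include <Eigen/Dense>
    #include <Eigen/Sparse>
    #include <vector>

    using SpMat = Eigen::SparseMatrix<double, Eigen::RowMajor>;

    // Non-inline helper so the product is measured closer to real-world usage
    // instead of being folded into main().
    __attribute__((noinline))
    Eigen::VectorXd multiply(const SpMat& A, const Eigen::VectorXd& x) {
      return A * x;  // row-major sparse * dense vector; multi-threaded with -fopenmp
    }

    int main() {
      const int n = 100000, p = 20;  // illustrative sizes
      std::vector<Eigen::Triplet<double>> coefs;
      coefs.reserve(static_cast<size_t>(n) * p);
      for (int i = 0; i < n; ++i)
        for (int j = 0; j < p; ++j)
          coefs.emplace_back(i, (i + j) % n, 1.0);  // p nonzeros per row
      SpMat A(n, n);
      A.setFromTriplets(coefs.begin(), coefs.end());
      Eigen::VectorXd x = Eigen::VectorXd::Constant(n, 1.0);
      Eigen::VectorXd y = multiply(A, x);
      // Each row sums p ones, so every entry of y should equal p.
      return (y.array() - static_cast<double>(p)).abs().maxCoeff() < 1e-12 ? 0 : 1;
    }

Compile, for example, with g++ -O3 -march=native -fopenmp plus the -I path to the Eigen headers. |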
Registered Member
|
Just saw your reply here... Thanks, you are right.
Even with the fix, I still see the same kind of problem:

>> matVecProdV0 : movpd 0, addpd 31, mulpd 0, time 136682 ms, OK
>> matVecProdV1 : movpd 0, addpd 0, mulpd 0, time 64635 ms, OK

Franck |
Moderator
|
here I get the same speed after just fixing the code, then Eigen is about x1.8 faster when using SparseMatrix<double,RowMajor>, and finally about x5 faster when using RowMajor + "-fopenmp".
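For what it's worth, switching an existing default (column-major) sparse matrix to the row-major layout is a one-line copy; a small sketch, with made-up names:

    #include <Eigen/Dense>
    #include <Eigen/Sparse>

    // Copying into a RowMajor matrix changes only the storage order; with
    // -fopenmp the row-major sparse * dense-vector product runs multi-threaded.
    Eigen::VectorXd rowMajorProduct(const Eigen::SparseMatrix<double>& Acol,
                                    const Eigen::VectorXd& x) {
      Eigen::SparseMatrix<double, Eigen::RowMajor> Arow = Acol;
      return Arow * x;
    }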
|
Moderator
|
V0:
V1:
I'm using the 3.3 branch + clang-4.0.
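As a rough, hypothetical sketch of the two variants labelled V0 and V1 above (V0 through Eigen, V1 as a hand-written CSR loop), with array names borrowed from the thread but sizes and fill pattern invented here:

    #include <Eigen/Dense>
    #include <Eigen/Sparse>
    #include <chrono>
    #include <iostream>
    #include <vector>

    // V1-style kernel: hand-written CSR product. pMatIr holds row pointers,
    // pMatJc column indices, pMatVal the values (layout assumed, not verified
    // against the attachments).
    static void csrMatVec(int n, const std::vector<int>& pMatIr,
                          const std::vector<int>& pMatJc,
                          const std::vector<double>& pMatVal,
                          const std::vector<double>& pVec,
                          std::vector<double>& pRes) {
      for (int i = 0; i < n; ++i) {
        double acc = 0.0;
        for (int k = pMatIr[i]; k < pMatIr[i + 1]; ++k)
          acc += pMatVal[k] * pVec[pMatJc[k]];
        pRes[i] = acc;
      }
    }

    int main() {
      const int n = 100000, p = 20;  // illustrative sizes
      std::vector<int> ir(n + 1), jc;
      std::vector<double> val;
      std::vector<Eigen::Triplet<double>> trip;
      for (int i = 0; i < n; ++i) {
        ir[i] = static_cast<int>(jc.size());
        for (int j = 0; j < p; ++j) {  // p nonzeros per row
          int col = (i + j) % n;
          jc.push_back(col);
          val.push_back(1.0);
          trip.emplace_back(i, col, 1.0);
        }
      }
      ir[n] = static_cast<int>(jc.size());

      Eigen::SparseMatrix<double, Eigen::RowMajor> A(n, n);  // V0-style matrix
      A.setFromTriplets(trip.begin(), trip.end());
      Eigen::VectorXd x = Eigen::VectorXd::Constant(n, 1.0);
      std::vector<double> xv(n, 1.0), y1(n, 0.0);

      // A real benchmark would repeat the products and wrap them in non-inline
      // functions, as advised earlier in the thread.
      auto t0 = std::chrono::steady_clock::now();
      Eigen::VectorXd y0 = A * x;         // V0: Eigen product
      auto t1 = std::chrono::steady_clock::now();
      csrMatVec(n, ir, jc, val, xv, y1);  // V1: naive CSR loop
      auto t2 = std::chrono::steady_clock::now();

      std::cout << "V0: " << std::chrono::duration<double>(t1 - t0).count() << " s, "
                << "V1: " << std::chrono::duration<double>(t2 - t1).count() << " s\n";
      return 0;
    }
|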
Registered Member
|
This is great, Gael. Where can we learn tips and tricks like this, please? I tried searching on Google but couldn't find a page that mentions your trick. |
Moderator
|
regarding multithreading, there is a dedicated page: http://eigen.tuxfamily.org/dox/TopicMultiThreading.html
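In short, once the program is built with -fopenmp, the thread count can be set either through the OMP_NUM_THREADS environment variable or from code; a minimal sketch (the value 4 is just an example):

    #include <Eigen/Core>
    #include <iostream>

    int main() {
      Eigen::setNbThreads(4);  // alternatively: export OMP_NUM_THREADS=4
      std::cout << "Eigen will use " << Eigen::nbThreads() << " thread(s)\n";
      return 0;
    }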
|
Registered Member
|
Thanks a lot, Gael! |