Eigen with Accelerate

senseiwa
Fri Jun 03, 2016 1:27 pm
Dear all,

I am unable to use OpenMP on Mac OS X (for valid reasons), so I am trying to find out whether I can use the multithreaded routines in the Accelerate framework (BLAS/LAPACK) with Eigen classes.

Thanks for any suggestions!
ggael
Fri Jun 03, 2016 3:09 pm
With the devel branch you can use any BLAS: -DEIGEN_USE_BLAS -framework Accelerate

For LAPACK, the interface is currently partly limited to MKL, but generalizing it to any LAPACK should be straightforward. Basically, it consists of reproducing the changes that were required for BLAS (https://bitbucket.org/eigen/eigen/commits/daf3e6a2b870) in the LAPACK interface files:

Eigen/src/Cholesky/LLT_MKL.h
Eigen/src/Eigenvalues/RealSchur_MKL.h
Eigen/src/Eigenvalues/SelfAdjointEigenSolver_MKL.h
Eigen/src/Eigenvalues/ComplexSchur_MKL.h
Eigen/src/QR/ColPivHouseholderQR_MKL.h
Eigen/src/QR/HouseholderQR_MKL.h
Eigen/src/LU/PartialPivLU_MKL.h
Eigen/src/SVD/JacobiSVD_MKL.h

If you're willing to give some help, I guess I can try to update Eigen/src/Cholesky/LLT_MKL.h, and then let you reproduce the changes...
senseiwa
Mon Jun 06, 2016 8:07 am
That would be awesome! If you let me know, I will make that work if I can.
zhanxw
Wed Jun 29, 2016 10:30 pm
I am also interested in this.
ggael
Fri Jul 01, 2016 7:02 am
ggael
Mon Jul 25, 2016 4:39 pm
Fully DONE in 3.3.
m7thon
Wed Jul 27, 2016 12:22 pm
ggael wrote: Fully DONE in 3.3.

... but it doesn't work. It seems Eigen now requires the LAPACKE C interface, but the Accelerate framework only provides the CLAPACK C interface.

So Eigen does *not* (and is not planned to?) work with Apple's Accelerate framework. Or do I misunderstand something?
ggael
Wed Jul 27, 2016 12:56 pm
$ sudo port install lapack

and then link to /opt/local/lib/lapack/liblapacke.dylib

This is just a thin layer, and Accelerate will be used under the hood, see:

Code:
$ otool -L test/lu_1
test/lu_1:
   /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
   /opt/local/lib/lapack/liblapacke.dylib (compatibility version 0.0.0, current version 0.0.0)
   /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 120.1.0)
   /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1226.10.1)
m7thon
Wed Jul 27, 2016 1:54 pm
Thanks for the hint. However, on my system the liblapacke.dylib from the port does not seem to be a thin wrapper around the Accelerate framework at all. I also compiled `liblapacke.a` directly from the netlib sources (into ~/local/lib/lapack/liblapacke.a), and that build does seem to work as desired when linked with the Accelerate framework.

Here is what I get for the Cholesky benchmark in the various versions:

1. without external BLAS/LAPACK:
Code:
$ g++-mp-6 -DNDEBUG -O3 -march=native -Wa,-q -I.. benchCholesky.cpp -o benchCholesky && ./benchCholesky
[ some warnings ]
size            no sqrt                           standard
dyn   4    0.00495987s (64.5179 MFLOPS)   0.00201893s (158.5 MFLOPS)
dyn   6    0.00838331s (119.285 MFLOPS)   0.00295007s (338.975 MFLOPS)
dyn   8    0.00928132s (241.345 MFLOPS)   0.00567397s (394.785 MFLOPS)
dyn   16    0.0193549s (826.665 MFLOPS)   0.012304s (1300.39 MFLOPS)
dyn   24    0.0362s (1423.2 MFLOPS)   0.0236766s (2175.99 MFLOPS)
dyn   32    0.061351s (1940.31 MFLOPS)   0.048599s (2449.43 MFLOPS)
dyn   49    0.159833s (2599.71 MFLOPS)   0.126659s (3280.61 MFLOPS)
dyn   64    0.24552s (3722.38 MFLOPS)   0.169819s (5381.73 MFLOPS)
dyn   128    1.34955s (5300.01 MFLOPS)   0.627375s (11400.9 MFLOPS)
dyn   256    7.50869s (7534.74 MFLOPS)   2.82067s (20057.7 MFLOPS)
dyn   512    59.7866s (7526.88 MFLOPS)   14.9595s (30081.7 MFLOPS)
dyn   900    245.678s (9923.92 MFLOPS)   72.6149s (33575.6 MFLOPS)
fixed 2    0.000131581s (303.996 MFLOPS)   0.000175881s (227.426 MFLOPS)
fixed 3    0.000610919s (229.163 MFLOPS)   0.00046569s (300.629 MFLOPS)
fixed 4    0.000987738s (323.973 MFLOPS)   0.000733155s (436.47 MFLOPS)
fixed 5    0.00171065s (350.744 MFLOPS)   0.000983146s (610.286 MFLOPS)
fixed 6    0.00218897s (456.836 MFLOPS)   0.00133507s (749.023 MFLOPS)
fixed 7    0.00277117s (555.722 MFLOPS)   0.00174151s (884.288 MFLOPS)
fixed 8    0.00437262s (512.279 MFLOPS)   0.00233162s (960.707 MFLOPS)
fixed 12    0.00913738s (770.462 MFLOPS)   0.00466138s (1510.28 MFLOPS)
fixed 16    0.0138131s (1158.32 MFLOPS)   0.00761411s (2101.36 MFLOPS)


2. with liblapacke.a compiled from netlib sources and -framework Accelerate:
Code:
$ g++-mp-6 -DNDEBUG -DEIGEN_USE_BLAS -DEIGEN_USE_LAPACKE -O3 -march=native -Wa,-q -I.. benchCholesky.cpp -L /Users/mthon/local/lib/lapack -llapacke -framework Accelerate -o benchCholesky && ./benchCholesky
[ some warnings ]
size            no sqrt                           standard
dyn   4    0.00503952s (63.4981 MFLOPS)   0.0037625s (85.0498 MFLOPS)
dyn   6    0.0082195s (121.662 MFLOPS)   0.00527121s (189.71 MFLOPS)
dyn   8    0.0103794s (215.812 MFLOPS)   0.00779373s (287.41 MFLOPS)
dyn   16    0.0223499s (715.888 MFLOPS)   0.0153277s (1043.86 MFLOPS)
dyn   24    0.0404056s (1275.07 MFLOPS)   0.0285939s (1801.78 MFLOPS)
dyn   32    0.0691321s (1721.92 MFLOPS)   0.0481106s (2474.3 MFLOPS)
dyn   49    0.167135s (2486.14 MFLOPS)   0.118004s (3521.25 MFLOPS)
dyn   64    0.272052s (3359.35 MFLOPS)   0.241463s (3784.93 MFLOPS)
dyn   128    1.59725s (4478.09 MFLOPS)   1.5585s (4589.44 MFLOPS)
dyn   256    7.75185s (7298.39 MFLOPS)   3.5215s (16065.9 MFLOPS)
dyn   512    81.7705s (5503.29 MFLOPS)   14.0428s (32045.4 MFLOPS)
dyn   900    311.771s (7820.11 MFLOPS)   56.0695s (43483.3 MFLOPS)
fixed 2    0.000136169s (293.754 MFLOPS)   0.00141312s (28.3062 MFLOPS)
fixed 3    0.000618463s (226.368 MFLOPS)   0.0018582s (75.3417 MFLOPS)
fixed 4    0.00117728s (271.812 MFLOPS)   0.00242466s (131.977 MFLOPS)
fixed 5    0.00178205s (336.69 MFLOPS)   0.00295339s (203.156 MFLOPS)
fixed 6    0.00228675s (437.301 MFLOPS)   0.00424085s (235.802 MFLOPS)
fixed 7    0.0027845s (553.062 MFLOPS)   0.00489408s (314.666 MFLOPS)
fixed 8    0.00432398s (518.042 MFLOPS)   0.00577575s (387.828 MFLOPS)
fixed 12    0.0092797s (758.645 MFLOPS)   0.00925737s (760.475 MFLOPS)
fixed 16    0.0141062s (1134.25 MFLOPS)   0.0144665s (1106 MFLOPS)


3. with liblapacke.dylib from the lapack port and -framework Accelerate:
Code:
$ g++-mp-6 -DNDEBUG -DEIGEN_USE_BLAS -DEIGEN_USE_LAPACKE -O3 -march=native -Wa,-q -I.. benchCholesky.cpp -L /opt/local/lib/lapack -llapacke -framework Accelerate -o benchCholesky && ./benchCholesky
[ some warnings ]
size            no sqrt                           standard
dyn   4    0.00454109s (70.4676 MFLOPS)   0.00484882s (65.9954 MFLOPS)
dyn   6    0.00823956s (121.366 MFLOPS)   0.00723859s (138.148 MFLOPS)
dyn   8    0.00999928s (224.016 MFLOPS)   0.0125611s (178.329 MFLOPS)
dyn   16    0.0222493s (719.124 MFLOPS)   0.0297482s (537.847 MFLOPS)
dyn   24    0.0397032s (1297.63 MFLOPS)   0.0624922s (824.423 MFLOPS)
dyn   32    0.068942s (1726.67 MFLOPS)   0.113872s (1045.39 MFLOPS)
dyn   49    0.167828s (2475.86 MFLOPS)   0.310364s (1338.82 MFLOPS)
dyn   64    0.272806s (3350.07 MFLOPS)   0.614699s (1486.78 MFLOPS)
dyn   128    1.46134s (4894.58 MFLOPS)   4.02734s (1776.02 MFLOPS)
dyn   256    8.2123s (6889.18 MFLOPS)   24.6018s (2299.67 MFLOPS)
dyn   512    82.7469s (5438.36 MFLOPS)   212.843s (2114.27 MFLOPS)
dyn   900    337.102s (7232.49 MFLOPS)   918.733s (2653.75 MFLOPS)
fixed 2    0.000171282s (233.533 MFLOPS)   0.00168254s (23.7736 MFLOPS)
fixed 3    0.000614276s (227.911 MFLOPS)   0.00293917s (47.6325 MFLOPS)
fixed 4    0.00128422s (249.179 MFLOPS)   0.00402186s (79.5652 MFLOPS)
fixed 5    0.00172469s (347.888 MFLOPS)   0.0048868s (122.78 MFLOPS)
fixed 6    0.00255105s (391.995 MFLOPS)   0.00632579s (158.083 MFLOPS)
fixed 7    0.00277838s (554.279 MFLOPS)   0.00742864s (207.306 MFLOPS)
fixed 8    0.00428862s (522.313 MFLOPS)   0.00999043s (224.215 MFLOPS)
fixed 12    0.00905995s (777.047 MFLOPS)   0.0176852s (398.073 MFLOPS)
fixed 16    0.0142315s (1124.27 MFLOPS)   0.0289513s (552.653 MFLOPS)

