
Eigen backend MKL & MAGMA design question

bravegag
Registered Member
Hello,

I have been working on the MAGMA/CUBLAS backend https://github.com/bravegag/eigen-magma, and some factorizations and products really pay off, e.g. DGEMM, DGEQP3, DPOTRF, etc., whereas other operations don't perform well, e.g. DGEMV, DTRSM. At the moment my project uses the MAGMA backend and benefits a lot from the well-performing operations, but as soon as I need, e.g., EigenSolver (DGEES), which is not yet available in MAGMA and performs a lot better in MKL than in the native Eigen implementation, I can't use MKL because that would hardwire the MAGMA backend to MKL.

However, in our specific project we have MKL available, and it makes sense to combine the best of both backends, MAGMA and MKL, but I don't know how to do this without losing generality. For example, I could replace all the incomplete or poorly-performing MAGMA/CUBLAS operations with those of MKL, but that would not really be a good design choice for people who don't want MKL and choose a different MAGMA flavor, e.g. MAGMA-ATLAS with the GNU compilers. Another possibility is to maintain a separate branch where I choose the fastest between MAGMA-CUDA-MKL and plain MKL, but that would be a maintenance toll, hmm.

Can anyone share any thoughts on this?

TIA,
Best regards,
Giovanni
ggael
Moderator
You could simply offer an EIGEN_USE_MAGMA_AND_MKL token that, in addition to using MAGMA, would fall back to MKL. EIGEN_USE_MAGMA_AND_MKL would imply EIGEN_USE_MAGMA and include the few *_MKL.h files that do not have a MAGMA equivalent.
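
Roughly something like this (just a sketch; the included header name below is only for illustration, not the actual file layout):

Code:
// Hypothetical sketch of the combined token, e.g. in the backend configuration
// header. The included header name is illustrative only.
#ifdef EIGEN_USE_MAGMA_AND_MKL
  // The combined token implies the plain MAGMA backend...
  #ifndef EIGEN_USE_MAGMA
    #define EIGEN_USE_MAGMA
  #endif
  // ...and additionally pulls in the MKL specializations for the few
  // operations that have no MAGMA equivalent yet, e.g. the real Schur
  // decomposition behind EigenSolver (DGEES).
  #include "src/Eigenvalues/RealSchur_MKL.h"  // illustrative header name
#endif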
bravegag
Registered Member
Hi ggael,

I have implemented what you suggested, introducing the new token EIGEN_USE_MAGMA_AND_MKL. Both projects are updated now.
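
The token is used like the other EIGEN_USE_* tokens, defined before including Eigen (illustrative example, assuming the backend is installed):

Code:
// Illustrative usage of the new token, analogous to EIGEN_USE_MKL_ALL:
// the MAGMA backend is selected, and MKL is used where MAGMA has no
// implementation yet (e.g. DGEES for EigenSolver).
#define EIGEN_USE_MAGMA_AND_MKL
#include <Eigen/Dense>

int main() {
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(512, 512);
  Eigen::MatrixXd B = Eigen::MatrixXd::Random(512, 512);
  Eigen::MatrixXd C = A * B;                  // DGEMM dispatched to MAGMA/CUBLAS
  Eigen::EigenSolver<Eigen::MatrixXd> es(A);  // DGEES falls back to MKL
  return 0;
}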

Suggestions and bugfixes are always welcome :)

Cheers,
Giovanni
ggael
Moderator
Nice work.

At some point we should try to find a way to merge it upstream. My initial plan for GPU support was to introduce a new matrix type, e.g., DeviceMatrix/DeviceVector, which would store its data in device memory. Custom kernels would be automatically generated for coefficient-wise operations, while products and solvers would be performed using cublas/magma (this is where your work will come in handy). This approach should be more efficient (fewer copies) and more flexible, as it allows mixing CPU and GPU computations.
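
Just to illustrate the idea (nothing below exists in Eigen; all names are hypothetical):

Code:
// Self-contained sketch of the DeviceMatrix idea; the point is that operands
// stay in device memory across operations and only an explicit conversion
// copies data back to the host.
#include <cstddef>
#include <vector>

template <typename Scalar>
class DeviceMatrix {
public:
  DeviceMatrix(std::size_t rows, std::size_t cols)
      : rows_(rows), cols_(cols), data_(rows * cols) {
    // A real implementation would allocate device memory (cudaMalloc) here.
  }

  // Coefficient-wise operations would be fused into generated CUDA kernels.
  DeviceMatrix& operator+=(const DeviceMatrix& other) {
    for (std::size_t i = 0; i < data_.size(); ++i) data_[i] += other.data_[i];
    return *this;
  }

  // Products and solvers would be dispatched to cublas/magma (e.g. cublasDgemm)
  // with all operands already resident on the device.
  friend DeviceMatrix operator*(const DeviceMatrix& a, const DeviceMatrix& b) {
    DeviceMatrix c(a.rows_, b.cols_);
    // ... GEMM performed on the device ...
    return c;
  }

  // The only host<->device copy happens when the user explicitly asks for it.
  std::vector<Scalar> toHost() const { return data_; }

private:
  std::size_t rows_, cols_;
  std::vector<Scalar> data_;  // stand-in for a device pointer
};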

However, this requires much more work and might take time to appear, while your current approach would already be extremely useful, so why not have both? I'm a bit worried about offering official support for it, though. The MKL layer is already causing us many troubles that we are not able to reproduce and fix.

To this end it would be nice to have all the code live in an unsupported/Eigen/MagmaSupport module. As far as I know, EIGEN_USE_MAGMA mostly adds template specializations. Therefore #including <unsupported/Eigen/MagmaSupport> should be enough to activate the use of MAGMA. Tell me if I'm wrong, but the only problem would be enabling pinned memory allocation, as it requires modifications in Eigen/Core. We could enforce that Eigen/MagmaSupport is included before any other Eigen header with a #error directive in Eigen/MagmaSupport. Perhaps an even better approach would be to make pinned memory allocation optional. Indeed, in a single project one might want to enable MAGMA for a given .cpp file only and keep using standard Eigen elsewhere. In this scenario it is likely that matrices are exchanged between MAGMA-enabled and non-MAGMA code, and if some use pinned allocation while others don't, I guess this might produce problems (at least if a matrix is allocated in one translation unit but deleted in another). So making pinned allocation optional would fix this issue too.
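
Something along these lines for the optional pinned allocation (cudaMallocHost/cudaFreeHost are the CUDA runtime calls; the wrapper names are made up):

Code:
// Sketch of an optional pinned allocator: only matrices that explicitly request
// page-locked memory get it, so they can still be exchanged with code compiled
// without MAGMA support. Wrapper names are hypothetical.
#include <cstdlib>
#include <cuda_runtime.h>

inline void* magma_aligned_malloc(std::size_t bytes, bool pinned) {
  if (pinned) {
    void* ptr = nullptr;
    // Page-locked memory enables fast, asynchronous host<->device transfers.
    if (cudaMallocHost(&ptr, bytes) != cudaSuccess) return nullptr;
    return ptr;
  }
  return std::malloc(bytes);  // plain pageable allocation, as in standard Eigen
}

inline void magma_aligned_free(void* ptr, bool pinned) {
  // The deallocator must know how the pointer was obtained, which is exactly
  // why mixing pinned and non-pinned matrices across translation units is fragile.
  if (pinned) cudaFreeHost(ptr);
  else std::free(ptr);
}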

What do you think?
bravegag
Registered Member
Hi ggael!

ggael wrote:Nice work.

Thank you 8-)

ggael wrote:At some point we should try to find a way to merge it upstream. My initial plan for GPU support was to introduce a new matrix type, e.g., DeviceMatrix/DeviceVector, which would store its data in device memory. Custom kernels would be automatically generated for coefficient-wise operations, while products and solvers would be performed using cublas/magma (this is where your work will come in handy). This approach should be more efficient (fewer copies) and more flexible, as it allows mixing CPU and GPU computations.

However, this requires much more work and might take time to appear, while your current approach would already be extremely useful, so why not have both? I'm a bit worried about offering official support for it, though. The MKL layer is already causing us many troubles that we are not able to reproduce and fix.

I agree with merging upstream; just tell me what I need to do. I'm less familiar with the approach you propose (DeviceMatrix/DeviceVector) but I will contribute as much as possible. Indeed, the current approach is somewhat of a losing strategy because of the copying, whereas some expressions could definitely take advantage of matrices/vectors that have been previously copied to the device.

ggael wrote:To this end it would be nice to have all the code live in an unsupported/Eigen/MagmaSupport module. As far as I know, EIGEN_USE_MAGMA mostly adds template specializations. Therefore #including <unsupported/Eigen/MagmaSupport> should be enough to activate the use of MAGMA. Tell me if I'm wrong, but the only problem would be enabling pinned memory allocation, as it requires modifications in Eigen/Core. We could enforce that Eigen/MagmaSupport is included before any other Eigen header with a #error directive in Eigen/MagmaSupport. Perhaps an even better approach would be to make pinned memory allocation optional. Indeed, in a single project one might want to enable MAGMA for a given .cpp file only and keep using standard Eigen elsewhere. In this scenario it is likely that matrices are exchanged between MAGMA-enabled and non-MAGMA code, and if some use pinned allocation while others don't, I guess this might produce problems (at least if a matrix is allocated in one translation unit but deleted in another). So making pinned allocation optional would fix this issue too.
What do you think?

I think it would be straightforward to do what you propose, but AFAIK there is nothing fancy about pinned memory other than the allocation being non-pageable, so it can be used for CPU-only computation without problems.

Best regards,
Giovanni

