
extending Eigen with abstractions on top of cuBLAS?

bravegag
Registered Member
Hello,

For a project I'm working on, I'm using Eigen and optionally linking against MKL. However, I have some dense linear algebra hotspots that it would be fantastic to run on the GPU with cuBLAS. So the question: is the idea of extending Eigen's abstractions to plug in cuBLAS reasonable? I'm mostly using MatrixXd and VectorXd. If it is reasonable, can you please point me to examples or illustrate how to do it for one case, e.g. cublasSgemm? Otherwise I would simply reinvent the wheel and write my own CuMatrixXd that hides the Host <-> Device memory copying and the invocation of the different cuBLAS routines. However, that is not ideal, because it would prevent writing generic functions that invoke MatrixXd::operator*() without having to know whether the computation runs on the CPU or the GPU.
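For concreteness, this is roughly what I mean by the "reinvent the wheel" fallback: a minimal, untested sketch of a helper that hides the Host <-> Device copies for a single product (the name gpu_multiply is mine; MatrixXd is double precision and column-major by default, so it maps to cublasDgemm directly):

```cpp
#include <Eigen/Dense>
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Hypothetical helper: C = A * B evaluated on the GPU via cuBLAS.
// MatrixXd is column-major, which is exactly the layout cuBLAS expects.
Eigen::MatrixXd gpu_multiply(const Eigen::MatrixXd& A, const Eigen::MatrixXd& B)
{
    const int m = A.rows(), k = A.cols(), n = B.cols();
    Eigen::MatrixXd C(m, n);

    double *dA, *dB, *dC;
    cudaMalloc((void**)&dA, sizeof(double) * m * k);
    cudaMalloc((void**)&dB, sizeof(double) * k * n);
    cudaMalloc((void**)&dC, sizeof(double) * m * n);

    // Host -> Device copies: this is the per-call overhead that makes a
    // plain wrapper only worthwhile for large enough matrices.
    cudaMemcpy(dA, A.data(), sizeof(double) * m * k, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), sizeof(double) * k * n, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const double alpha = 1.0, beta = 0.0;
    // C = alpha * A * B + beta * C, all matrices column-major.
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                &alpha, dA, m, dB, k, &beta, dC, m);
    cublasDestroy(handle);

    // Device -> Host copy of the result, then release the device buffers.
    cudaMemcpy(C.data(), dC, sizeof(double) * m * n, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return C;
}
```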

Many thanks in advance,
Best regards,
Giovanni
bravegag
Registered Member
This is somewhat related to what I'm looking for: http://code.google.com/p/gpumatrix/ — it conforms to Eigen's base abstractions. I would like to have a fork of MatrixXd with cuBLAS backing most of the available operators and factorizations. Any hints/pointers/documentation on how to do that?
ggael
Moderator
There are two strategies. The first one is the simplest and would be to add a backend like the MKL one. The disadvantage is a host <-> device transfer for every operation. The second one would be to add a kind of DeviceMatrix class, as you said, but that is much trickier to get right, especially if you want to support all expressions and thus generate kernels on demand for coefficient-wise operations. Anyway, to this end I would start by adding a new StorageKind called DenseDevice, make DeviceMatrix use it, and in the product and factorization backends fall back to cublas/magma according to the StorageKind. To generate kernels on demand, there is a start here: https://bitbucket.org/ggael/eigen-nvcc which already allows calling Eigen from CUDA. The next step would be to specialize the assignment logic for DeviceMatrix to call a CUDA kernel templated on the expression types.
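To make the StorageKind idea more concrete, here is a small standalone sketch of the dispatch mechanism (illustrative only, it does not use the real Eigen internals; the names MyMatrix, Dense, DenseDevice and gemm_backend are just placeholders):

```cpp
#include <cstdio>

// Standalone sketch of the StorageKind idea: the matrix type carries a
// storage-kind tag, and the product implementation is selected by
// specializing on that tag.
struct Dense       {};  // host-side storage
struct DenseDevice {};  // hypothetical device-side storage

template <typename Scalar, typename StorageKind>
struct MyMatrix {
    typedef StorageKind Kind;
    // ... sizes, host data or a device pointer depending on the Kind, etc.
};

template <typename StorageKind> struct gemm_backend;

template <> struct gemm_backend<Dense> {
    template <typename Mat>
    static void run(const Mat&, const Mat&, Mat&) {
        std::puts("CPU gemm (Eigen kernels or a host BLAS)");
    }
};

template <> struct gemm_backend<DenseDevice> {
    template <typename Mat>
    static void run(const Mat&, const Mat&, Mat&) {
        std::puts("GPU gemm (cublasDgemm on device pointers, no host transfer)");
    }
};

template <typename Mat>
void matmul(const Mat& a, const Mat& b, Mat& c) {
    gemm_backend<typename Mat::Kind>::run(a, b, c);  // dispatch on the StorageKind
}

int main() {
    MyMatrix<double, Dense>       a, b, c;
    MyMatrix<double, DenseDevice> da, db, dc;
    matmul(a, b, c);     // takes the CPU path
    matmul(da, db, dc);  // takes the GPU path
}
```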
bravegag
Registered Member
Hi ggael!

This was really useful information, again :) Do you know whether there is some existing work I could reuse (or even help with) in this area? Given my limited knowledge of the Eigen framework as a developer, I would tend to take the first path, but if we could build some synergy around the second, no-compromise path, I would actually prefer that.

Thanks in advance,
Best regards,
Giovanni
bravegag
Registered Member
I was just reviewing the Eigen sources and how they use MKL, so I can "extrapolate" it to cuBLAS, e.g. the ColPivHouseholderQR.h implementation and the corresponding ColPivHouseholderQR_MKL.h version. As far as I understand, the latter is used when EIGEN_USE_BLAS is defined. However, I do not understand the mechanics behind replacing the original ColPivHouseholderQR<MatrixType>::compute(...) implementation with the MKL one defined in ColPivHouseholderQR_MKL.h. Can anyone please clarify?

TIA,
Best regards,
Giovanni
ggael
Moderator
Regarding ColPivHouseholderQR_MKL.h, the idea is to perform a full template specialization of ColPivHouseholderQR::compute for the types that are supported by Lapack. Since all specializations are basically the same, the approach we took consists in putting one generic specialization into a macro; a few macro calls then instantiate all the specializations. The specialization itself is essentially a call to the respective Lapack routine, plus the necessary allocation of the internal structures and an update of the permutation to match the 0-based indexing of C++ versus the 1-based indexing of Lapack. All the wrappers to BLAS/Lapack work the same way.
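Schematically, the pattern looks like this (a simplified standalone sketch, not the actual code from ColPivHouseholderQR_MKL.h; the names Decomposition and DECOMP_LAPACK are made up for the illustration):

```cpp
#include <Eigen/Dense>

// Simplified standalone sketch of the macro-based specialization pattern.
template <typename MatrixType>
struct Decomposition {
    Decomposition& compute(const MatrixType& m);  // generic Eigen-style implementation
};

// The body of the specialization goes into one macro; only the scalar type and
// the Lapack routine to call change from one instantiation to the next.
#define DECOMP_LAPACK(EIGTYPE, LAPACK_FUNC)                                      \
  template <>                                                                    \
  inline Decomposition<Eigen::Matrix<EIGTYPE, Eigen::Dynamic, Eigen::Dynamic> >& \
  Decomposition<Eigen::Matrix<EIGTYPE, Eigen::Dynamic, Eigen::Dynamic> >         \
  ::compute(const Eigen::Matrix<EIGTYPE, Eigen::Dynamic, Eigen::Dynamic>& m)     \
  {                                                                              \
    /* allocate workspace, call LAPACK_FUNC, shift 1-based pivots to 0-based */  \
    (void)m;                                                                     \
    return *this;                                                                \
  }

// A few macro calls then stamp out all the full specializations:
DECOMP_LAPACK(float,  sgeqp3_)
DECOMP_LAPACK(double, dgeqp3_)
```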
bravegag
Registered Member
Thank you! Now I see it: you specialize the template parameter Matrix with e.g. dynamic dimensions, Lapack-compatible storage order, etc. I was confused before because I didn't see the same number of template parameters in Col* and Col*_MKL; the multiple parameters correspond to the macro and not to the compute method.

So if I wanted to extend all these factorizations to the GPU, I would need to mimic what was done for MKL, i.e. add the corresponding *_GPU.h variants. However, it looks like you don't delegate to MKL the basic matrix computations covered by expression templates, like MMM or MVM, correct? Here I would be a bit more lost on how to delegate to the GPU, and this is also where it gets interesting: MMM is where the GPU would give MKL a serious beating on a card like the GTX 680, which has well over 3.0 Tflop/s of (single precision) peak performance.

Actually, one naive way to do this would be to extend MatrixXd with an implementation, say GpuMatrixXd, and overload the corresponding operators *, +, -, etc. to run on the GPU. But that solution would require temporary instances and expensive copying, assuming a non-C++11 implementation, which in any case doesn't mix well with nvcc, which only supports g++ versions below 4.7.
ggael
Moderator
What do you mean by MMM and MVM? Regarding coefficient-wise operations, MKL cannot beat an expression-template approach, and that's even more true for GPUs. But as I said, that part is much trickier to implement, and honestly I don't know how to explain how to start. Actually, we first need to finish some refactoring of our expression-template mechanism to make its generalization to GPUs much easier.
bravegag
Registered Member
MMM: matrix-matrix multiplication
MVM: matrix-vector multiplication

Is there a way I can help? Btw, the MAGMA project http://icl.cs.utk.edu/magma/ looks very interesting as well, as an alternative to wrapping cuBLAS.
ggael
Moderator
Ah, for products we also have MKL backends; see /src/Core/products/*. Adapting them to cuBLAS should be easy. Btw, cuBLAS only covers matrix products (and similar operations). For dense decompositions, MAGMA is the right choice; adding wrappers to it would already help a lot and cover many use cases.
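For instance, assuming MAGMA's CPU interface mirrors Lapack's dgeqrf (a magma_dgeqrf taking the same argument list, including the usual lwork = -1 workspace query), a decomposition wrapper boils down to something like this untested sketch; check the headers of the MAGMA version you use for the exact routine name, header name, and signature, and call magma_init() once at program start:

```cpp
#include <Eigen/Dense>
#include <algorithm>
#include <vector>
#include <magma.h>  // header name may differ depending on the MAGMA version

// Hedged sketch: Householder QR of A computed by MAGMA, with A and tau
// overwritten in place, just like Lapack's dgeqrf would do.
void magma_qr_inplace(Eigen::MatrixXd& A, Eigen::VectorXd& tau)
{
    magma_int_t m = A.rows(), n = A.cols(), lda = m, info = 0;
    tau.resize(std::min(A.rows(), A.cols()));

    // Lapack-style workspace query: lwork = -1 returns the optimal size.
    double wkopt = 0;
    magma_dgeqrf(m, n, A.data(), lda, tau.data(), &wkopt, -1, &info);

    std::vector<double> work((size_t)wkopt);
    magma_dgeqrf(m, n, A.data(), lda, tau.data(), work.data(),
                 (magma_int_t)work.size(), &info);
    // info != 0 would signal an illegal argument, as in Lapack.
}
```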
bravegag
Registered Member
Hi ggael,

Thank you for the replies; indeed it looks interesting and I will try to do the MAGMA integration. Do you have a repo I could work on? Btw, I think the same template specialization strategy used for MKL would work for MAGMA too, right? I.e. specializing on "Matrix<double, Dynamic, Dynamic, ColMajor, Dynamic, Dynamic>".

Best regards,
Giovanni
ggael
Moderator
Actually, if MAGMA offers a Lapack-compatible API, then it should be enough to generalize the current MKL support so that it avoids MKL-specific types and the like. No need to copy/paste everything!!

For cuBLAS this is a bit different because the API is not compatible. Two options: 1) write a true BLAS interface on top of cuBLAS and change nothing in Eigen (maybe this already exists); 2) copy-paste and adjust the BLAS backends inside Eigen.
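As a rough idea of option 1, such a shim could look like the following sketch: a Fortran-convention dgemm_ that forwards to cublasDgemm, so that a BLAS-backed build can link against it unchanged. Handle reuse, error checking, and the hidden character-length arguments of some Fortran ABIs are deliberately left out, and the symbol name and integer width must match the BLAS convention the caller expects:

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Sketch only: a Fortran-style dgemm_ entry point backed by cuBLAS.
extern "C" void dgemm_(const char* transa, const char* transb,
                       const int* m, const int* n, const int* k,
                       const double* alpha, const double* A, const int* lda,
                       const double* B, const int* ldb,
                       const double* beta, double* C, const int* ldc)
{
    const cublasOperation_t opA = (*transa == 'N' || *transa == 'n') ? CUBLAS_OP_N : CUBLAS_OP_T;
    const cublasOperation_t opB = (*transb == 'N' || *transb == 'n') ? CUBLAS_OP_N : CUBLAS_OP_T;

    // Dimensions of the stored A and B panels depend on the transpose flags.
    const int rowsA = (opA == CUBLAS_OP_N) ? *m : *k;
    const int colsA = (opA == CUBLAS_OP_N) ? *k : *m;
    const int rowsB = (opB == CUBLAS_OP_N) ? *k : *n;
    const int colsB = (opB == CUBLAS_OP_N) ? *n : *k;

    double *dA, *dB, *dC;
    cudaMalloc((void**)&dA, sizeof(double) * rowsA * colsA);
    cudaMalloc((void**)&dB, sizeof(double) * rowsB * colsB);
    cudaMalloc((void**)&dC, sizeof(double) * (*m) * (*n));

    cublasHandle_t handle;
    cublasCreate(&handle);

    // cublasSetMatrix/cublasGetMatrix take care of the leading-dimension strides.
    cublasSetMatrix(rowsA, colsA, sizeof(double), A, *lda, dA, rowsA);
    cublasSetMatrix(rowsB, colsB, sizeof(double), B, *ldb, dB, rowsB);
    cublasSetMatrix(*m, *n, sizeof(double), C, *ldc, dC, *m);  // needed when beta != 0

    cublasDgemm(handle, opA, opB, *m, *n, *k,
                alpha, dA, rowsA, dB, rowsB, beta, dC, *m);

    cublasGetMatrix(*m, *n, sizeof(double), dC, *m, C, *ldc);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
}
```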
jdh8
Registered Member
Somebody is adapting Eigen's MKL bindings to MAGMA and cuBLAS. :)
https://github.com/bravegag/eigen-magma

