Registered Member
|
Hi,
I'm working on a large project that uses Eigen and has many CPU algorithms where MatrixXd and VectorXd are ubiquitous in the public interfaces. Now I'd like to add complete GPU variants for all the algorithms (not just invoking the MAGMA backend with a costly strategy of copying memory between host and device on every call) without having to change my algorithm interfaces everywhere, i.e. I'd like to have a DeviceMatrixXd and DeviceVectorXd that can be used interchangeably wherever MatrixXd and VectorXd can. To this end I'm tempted to define my own, but after looking into https://bitbucket.org/ggael/eigen-nvcc I was wondering whether a DeviceMatrix and DeviceVector already exist anywhere? TIA, Best regards, Giovanni |
Moderator
|
No DeviceMatrix yet. This eigen-nvcc branch has now been merged into the main Eigen devel branch. Currently it only allows calling Eigen from CUDA. However, this is a necessary step toward the evaluation of Eigen's expressions on CUDA. This second step will become straightforward once the refactoring of our expression template engine is finished (I made good progress recently).
|
Registered Member
|
Hi ggael,
Thank you for your prompt answer and always excellent support. If I read your answer correctly, what you did there could help me: to be able to use Eigen, namely MatrixXd, from a CUDA kernel, the MatrixXd memory has to be device memory, right? I actually looked into this eigen-nvcc branch but could not find any CUDA memory allocation within Memory.h, which would be the place to look for any form of allocation, correct? I am under a lot of pressure and have a strong interest in this branch, so please let me know whether I can help in any way. Best regards, Giovanni |
Moderator
|
The current port is mostly there to allow declaring small fixed-size matrices and vectors (or similar Map objects) within a CUDA kernel, so memory management is not our business yet. This is related to your goal because the evaluation of, for instance, A = 2 * B + C boils down to something like:
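As a rough illustration of that idea (not Eigen's actual evaluator code), the assignment can be pictured as one loop that asks an evaluator object for each coefficient of the right-hand side. The names `ScaledSumEvaluator` and `assign` below are hypothetical, and plain `std::vector<double>` stands in for Eigen's storage:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an expression evaluator: it computes one
// coefficient of 2 * B + C on demand, without building a temporary.
struct ScaledSumEvaluator {
    const std::vector<double>& b;
    const std::vector<double>& c;

    double coeff(std::size_t i) const { return 2.0 * b[i] + c[i]; }
};

// Hypothetical assignment routine: a single loop over all coefficients,
// writing each evaluated value into the destination.
void assign(std::vector<double>& a, const ScaledSumEvaluator& expr) {
    assert(expr.b.size() == expr.c.size());
    a.resize(expr.b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = expr.coeff(i);
    }
}
```

With `B = {1, 2, 3}` and `C = {10, 20, 30}`, calling `assign(A, ScaledSumEvaluator{B, C})` fills `A` with `{12, 24, 36}`.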
This is very simplified, but that's the idea. A real-world version of this for loop is there: https://bitbucket.org/ggael/eigen-evalu ... ult#cl-264 and the creation of the evaluator: https://bitbucket.org/ggael/eigen-evalu ... ult#cl-616 Now if A were a DeviceMatrix, we would simply have to do something like (again, this is over-simplified, just to get the idea):
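To give a feel for the device-side variant, here is a host-only sketch in plain C++ that mimics the shape of a CUDA launch. It is an assumption about how such a dispatch could look, not Eigen code: `DeviceScaledSumEvaluator`, `assign_kernel_body` and `launch_assign` are hypothetical names, the raw pointers stand in for device buffers, and the nested loops stand in for the grid of blocks and threads that a real `__global__` kernel would get from `blockIdx`, `blockDim` and `threadIdx`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical evaluator holding raw pointers, as it would have to when
// it is passed by value to a kernel (pointers would be device pointers).
struct DeviceScaledSumEvaluator {
    const double* b;
    const double* c;

    double coeff(std::size_t i) const { return 2.0 * b[i] + c[i]; }
};

// Body of the would-be kernel: each "thread" evaluates one coefficient.
// The bounds check guards the threads of the last, partially used block.
void assign_kernel_body(double* a, DeviceScaledSumEvaluator expr,
                        std::size_t n, std::size_t i) {
    if (i < n) {
        a[i] = expr.coeff(i);
    }
}

// Host stand-in for a kernel launch: iterates over every (block, thread)
// pair sequentially instead of running them in parallel on a GPU.
void launch_assign(double* a, DeviceScaledSumEvaluator expr,
                   std::size_t n, std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    const std::size_t blocks = (n + threads_per_block - 1) / threads_per_block;
    for (std::size_t block = 0; block < blocks; ++block) {
        for (std::size_t thread = 0; thread < threads_per_block; ++thread) {
            assign_kernel_body(a, expr, n, block * threads_per_block + thread);
        }
    }
}
```

The point of the sketch is that the per-coefficient code is the same as in the CPU loop; only the way the index is produced changes, which is why the compiler for the device has to accept the expression code.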
This means nvcc must be able to properly compile our expression code, and that is what the current port is doing. If you want to help, a first step would be to write a DeviceMatrix class with device storage and adapt your MAGMA support to this class... BTW, have you seen that in the upcoming CUDA 6 it will be possible to allocate "managed" buffers that are automatically synchronised between GPU and host memory? I don't know the memory overhead yet, but if it's not too high, that might simplify our job a lot! |
Registered Member
|
I am a bit confused by this discussion and not sure whether it answers my question, so I will rephrase.
I would like to be able to use functions like colwise() and rowwise(), but with matrices on the GPU. Will Eigen have support for this soon? Thanks in advance. |