This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Mac CUDA build failure question

Tags: None
(comma "," separated)
ericklein
Registered Member
Posts
1
Karma
0

Mac CUDA build failure question

Fri Jun 14, 2019 5:54 pm
I'm trying to build Eigen on Mac for CUDA (using the nvcc compiler), and getting build errors. I understand the errors, and I have a change that lets me dodge the build failures, but I suspect it's not the right change for checkin, and so I'm looking for feedback. Please let me know if I should use the mailing list for this purpose instead.

So the issue I have is in Half.h. I wind up getting errors about a bunch of operators being already defined. The core issue is that on Mac, nvcc (the CUDA compliler) is using gcc as the host compiler, but gcc on Mac is built on top of clang. Eigen seems to be implicitly assuming that the presence of clang implies that absence of CUDA (or something along those lines).

In my build I'm hitting this block:

Code: Select all
#if (defined(EIGEN_HAS_CUDA_FP16) && defined(EIGEN_CUDA_ARCH) && \
     EIGEN_CUDA_ARCH >= 530) ||                                  \
    (defined(EIGEN_HAS_HIP_FP16) && defined(HIP_DEVICE_COMPILE))
#define EIGEN_HAS_NATIVE_FP16
#endif


which results in EIGEN_HAS_NATIVE_FP16 being set, and so we wind up compiling in all the operators from Half.h:253-313. That's fine so far.

What happens next is we hit this line:

Code: Select all
#if !defined(EIGEN_HAS_NATIVE_FP16) || EIGEN_COMP_CLANG // Emulate support for half floats


which is followed shortly after by (roughly) the same operator functions, and I get errors because those operator functions were defined above.

So. My hack to work around this is to ensure that EIGEN_COMP_CLANG gets set to 0 in Macros.h if __NVCC__ is defined. That works fine for me locally, and gets Eigen building fine (and thus unblocks me on getting TensorFlow building for Mac, or at least unblocks this issue).

I'm willing to bet however that this is the wrong thing to do in general. I don't understand enough of what this second code block is doing to really understand why clang is being treated differently than nvcc here. I believe there is a version of clang that supports CUDA (at least on some platforms?). Presumably this is for that, but I don't know enough about how that differs from nvcc to fully grok this.

Can anyone help enlighten me about the best way to fix this?

Thanks!


Bookmarks



Who is online

Registered users: Bing [Bot], Evergrowing, Google [Bot], rblackwell