Registered Member
|
Hi
I'm trying to use the new MKL support with Eigen (EIGEN_USE_MKL_ALL), but when I do so I found several error when compiling - std::complex is used without #include <complex> - eigen\src/Core/Assign_MKL.h(210): error C3861: 'vmsXXX_': identifier not found. I commented out these lines and it compiled ok. - When I use the HouseholderQR to solve linear least square problems, the compiler complained about ambiguous call to overloaded function Eigen::internal::householder_qr_inplace_blocked. I fixed this by adding #ifndef EIGEN_USE_MKL_VML before the defintion of the function void householder_qr_inplace_blocked(...) in the file HouseholderQR.h and #endif right after the definition. Now everything works fine. - Result discrepancy with Eigen 3.0.5: I want to solve the least square equation A*Y=B. Now the data of the matrices are in the links below. The file Y.csv contains the solution matrix Y that has been solved using a Qr Householder with Column pivoting algorithm by a Java library called ejml. The root mean square error (rmse) of the solution is 0.00027125. The execution time in Java is ~400ms. http://www.mediafire.com/?lbp1nk9qja9nqzz http://www.mediafire.com/?zz4b5jqvc18i7ar http://www.mediafire.com/?1b4gw78qwmj2xu7 Now I have been trying various C++ library to solve that same problem. Here are some of the results - Gotoblas2 dgelsd: rmse = 0.00422206 in 185 ms - Gotoblas dgels,dgelsy : failed - Eigen 3.1 householderQR with MKL: rmse = 0.00027125 in 163 ms - Eigen 3.1 ColPivHouseholderQR with MKL: rmse = 0.00422206 in 364 ms - Eigen 3.1 FullPivHouseholderQR with MKL:failed in 409 ms - Eigen 3.1 JacobiSVD with MKL : 1.40649e+13 (wtf) in 302 ms - Eigen 3.0.5 FullPivHouseholderQR(no MKL): failed in 431 ms - Eigen 3.0.5 ColPivHouseholderQR (no MKL): 0.00436808 in 331 ms - Eigen 3.0.5 householderQR (no MKL): failed in 4158 ms - Eigen 3.0.5 JacobiSVD (no MKL): 1.34629e+24 (wtf) in 1968 ms As you can see, the results between two versions of Eigen are so different. The best result is achieved by householderQR with MKL which is the same as the one implemented by pure Java.The good things is all the 3.1 results are better and taken much less time. Can s1 explain about the discrepancy of the result. I am using msvc 2010, Intel MKL 10.3. |
Moderator
|
The problem is due to overflow. Actually your "reference" solution contains some infinite values which have been clamped to the highest representable double 1.79769e+308 which seems to do the trick in your particular exemple. With appropriate scaling:
A *= 1e154; I manage to get a rather low error with Eigen's HouseholderQR: (A*X-B)^2 = 3.51518e-05 within 9 ms. (same error than with the Y solution you provided). The other solver are still failing because of overflow. I guess your example is a good one to experiment proper handling strategies of overflows. cheers, Gael. |
Registered Member
|
Thank you very much Gael. I have tried your trick in my machine and confirm that Eigen 3.0.5 now have the same result as 3.1 in just 50ms. Using the same trick with MKL found the solution in just 21ms. So it does indeed improve both the accuracy as well as efficiency in this case. The thing is how can FullPivHouseholderQR and ColPivHouseholderQR perform much worse (even with the trick) than HouseholderQR . Are they supposed to be more reliable in all cases?. The result of ColPivHouseholderQR with MKL is the same as dgelsd of LAPACK and both are worse than the result of the LinearSolverQrHouseCol (a Java version ColPivHouseholderQR) written in Java. In the Java version, the Moore-Penrose pseudo-inverse using SVD solution give rmse = 0.004222056 in 1218 ms. Using the trick in Java improve the run of LinearSolverQrHouseCol to 99ms and SolvePseudoInverse to 700ms.
Are you using msvc or gcc when you get 9ms? |
Registered Member
|
Hi Gael
Applying your trick I am able to solve the problem using dgels and dgelsy functions of LAPACK. Below are the results of my run using LAPACK+GotoBlas2 - Dgels : rmse = 0.00027175 in ~11ms - Eigen householder: rmse = 0.00027175 in ~10-12ms - Dgelsy: rmse = 0.00027175 in 11-13ms All of the results are better than dgelsd which was claimed to be the most reliable least square solver for rank-deficient matrix. So a simple trick to pre-process of the inputs could help improve the accuracy and boost the speed up to 20 times. Pretty impressive. |
Moderator
|
Overflows leads to inf and then nan which are very slow for the CPU, thus the slowdowns. I think that the reason the Java version is working is because inf seems to be automatically replaced by the highest double number. The fact that this (expensive) trick is working here is just pure hazard IMO. Pivoting QR solver are more accurate because they are rank revealing, however they don't provide any additional benefits wrt overflow, and you particular case they are even worse.
I'm using gcc. |
Moderator
|
another reason might be the use of the FPU in Java instead of the SSE instruction set with the optimized fortran/C/C++ librairies. The FPU provide 80bits of precision for the intermediate computation, instead of 64bits with SSE.
|
Registered Member
|
I totally agree with you Gael. However, the fact that the same implementation of householderQR in Eigen 3.1 can solve the problem with no problem while failing in 3.0.5 requires some checking to ensure consistent between different version. On another matter, it seems that matrix multiplication with MKL performs worse than without. I have not done a comprehensive tests. However, I have measured Eigen gemm with GotoBlas2 and MKL gemm and found that with MKL support Eigen performs similar to that of Gotoblas2 and MKL. With no MKL support, Eigen performs better. The matrix size ranging from 30*30 to 5000*5000. What is the algorithmm of matrix multiplication in Eigen??
|
Moderator
|
These algorithms in 3.0.5 and 3.1 are exactly the same, of course, unless you call an external library like MKL.
Matrix products in Eigen are highly optimized and are similar to the one of GotoBLAS except that we have a single C++ template implementation of the kernel while GotoBLAS algorithms are written in assembly with one version for each pair of scalar type / micro-architecture. MKL should be significantly faster than pure Eigen only on CPU supporting AVX. See also this (old) benchmark: http://eigen.tuxfamily.org/index.php?title=Benchmark |
Registered Member
|
In EJML householder operations are normalized by the max absolute value in that column. This reduces numerical overflow/underflow and is a common technique. The added stability comes at the cost of some speed since it requires extra division operations. So if you know your matrix is well scaled (or to make benchmarks look better) it can be omitted.
Pseudo inverses uses SVD, which uses iteration to find singular values. The presence of NaN or Inf can cause it to get stuck in a loop until it reaches the maximum number of iterations. At which point most implementations will give up. Having said that, I know nothing of Eigen's internals so I can't make any educated guesses as to what's going on. |
Registered users: Bing [Bot], claydoh, Google [Bot], rblackwell, Yahoo [Bot]