Registered Member
|
I have to solve about two-hundred million linear least squares problems over and over again, so I need to improve performance as far as possible.
I'm using LDLT solver (simply because I did not see a significant performance difference between LDLT and LLT, and LDLT seems to be more robust). My application is already multithreaded, so I'm compiling eigen with the following defines:
my (pseudo-) code looks something like that:
profiling my code, I see that something like 98% of time is spent on
So my question is, is there some faster way of doing this? something that bugs me is that although I'm using as many threads as I have cores, and even though profiling my code shows that all time is spent on solving the least squares problems, cpu usage reported in task manager is only about 30% (that is 30% on each core) on my i7 (4 cores, 8 hyperthreads) and only about 5% on my Dual Xeon Gold (24 cores, 48 hyperthreads). So my questions are: - is there a faster way to solve a series of similar problems (they all solve for 32 coefficients, but the number of data-points varies for each problem) - most of the problems have very similar solutions. Is it possible to get a faster solution by initializing the solver somehow using the solution of the last problem? - any idea what could be the cause for the code being in the solver for the most time, but cpu usage being only a fraction of available cpu usage? |
Registered Member
|
If you are theoretically getting the best performance possible out of your CPU and it isn't reaching high% usage, it may not be a limiting factor to time cost.
Intel suggests that applications with high thread counts may be limited by memory bandwidth and access. It recommends ways to profile and identify and identify memory saturation: https://software.intel.com/content/www/us/en/develop/articles/detecting-memory-bandwidth-saturation-in-threaded-applications.html On a separate note, consider solving the normal equation, ATAx = ATb, iteratively if you don't need exact solutions (you can specify it to have a very low error, e.g. 10^-15). You can initialize an iterative method with a good approximation of the solution. Depending on how good it is, you may need only a few iterations. It usually takes a performance test to find the best solver, according to PETSc (some other linear algebra library). |
Registered users: bartoloni, Bing [Bot], Google [Bot], Yahoo [Bot]