This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Fastest way to compute a series of linear least squares

Tags: None
(comma "," separated)
mrmabulous
Registered Member
Posts
1
Karma
0
I have to solve about two-hundred million linear least squares problems over and over again, so I need to improve performance as far as possible.

I'm using LDLT solver (simply because I did not see a significant performance difference between LDLT and LLT, and LDLT seems to be more robust).
My application is already multithreaded, so I'm compiling eigen with the following defines:
Code: Select all
"EIGEN_MPL2_ONLY",
"EIGEN_NO_DEBUG",
"EIGEN_DONT_PARALLELIZE",
"EIGEN_UNALIGNED_VECTORIZE=0",
"EIGEN_MAX_ALIGN_BYTES=32",
"EIGEN_MAX_STATIC_ALIGN_BYTES=32",
"EIGEN_NO_AUTOMATIC_RESIZING"


my (pseudo-) code looks something like that:

Code: Select all
template <typename T>
using algn_vector = std::vector<T, Eigen::aligned_allocator<T>>;

void ProjectSparseSamples(
    const algn_vector<Vector3f>& dirs,
    const algn_vector<float>& values,
    algn_vector<float>* coeffs_out) {

  coeffs_out->resize(32);

  MatrixXf basis_values(dirs.size(), coeffs_out->size());
  Eigen::Map<VectorXf> func_values(const_cast<float*>(values.data()), dirs.size());

  for (unsigned int i = 0; i < dirs.size(); i++) {
     for(int idx = 0; idx < 32; idx++) {
        basis_values(i, idx) = SomeFunc(dirs[i], idx);
    }
  }

  MatrixXf t = basis_values.transpose();
  // profiler shows that almost all time is spent on the following line
  VectorXf soln = (t * basis_values).ldlt().solve(t * func_values);

  // Copy everything over to our coeffs array
  for (unsigned int i = 0; i < coeffs_out->size(); i++) {
    (*coeffs_out)[i] = soln(i);
  }
}

...
int main() {
  algn_vector<float> solution;
  algn_vector<Eigen::Vector3f> dirs;
  algn_vector<float> values;

  for(int i=0; i<num_problems; i++) {
     // fill dirs and values vectors with something between 50 and 500 values
    ProjectSparseSamples(dirs, values, &solution);
    // do something with the 32 solution values.
  }
}


profiling my code, I see that something like 98% of time is spent on
Code: Select all
VectorXf soln = (t * basis_values).ldlt().solve(t * func_values);


So my question is, is there some faster way of doing this?

something that bugs me is that although I'm using as many threads as I have cores, and even though profiling my code shows that all time is spent on solving the least squares problems, cpu usage reported in task manager is only about 30% (that is 30% on each core) on my i7 (4 cores, 8 hyperthreads) and only about 5% on my Dual Xeon Gold (24 cores, 48 hyperthreads).

So my questions are:

- is there a faster way to solve a series of similar problems (they all solve for 32 coefficients, but the number of data-points varies for each problem)
- most of the problems have very similar solutions. Is it possible to get a faster solution by initializing the solver somehow using the solution of the last problem?
- any idea what could be the cause for the code being in the solver for the most time, but cpu usage being only a fraction of available cpu usage?
andrew-dy
Registered Member
Posts
15
Karma
0
If you are theoretically getting the best performance possible out of your CPU and it isn't reaching high% usage, it may not be a limiting factor to time cost.

Intel suggests that applications with high thread counts may be limited by memory bandwidth and access.
It recommends ways to profile and identify and identify memory saturation:
https://software.intel.com/content/www/us/en/develop/articles/detecting-memory-bandwidth-saturation-in-threaded-applications.html

On a separate note, consider solving the normal equation, ATAx = ATb, iteratively if you don't need exact solutions (you can specify it to have a very low error, e.g. 10^-15).
You can initialize an iterative method with a good approximation of the solution. Depending on how good it is, you may need only a few iterations.
It usually takes a performance test to find the best solver, according to PETSc (some other linear algebra library).


Bookmarks



Who is online

Registered users: bartoloni, Bing [Bot], Google [Bot], Yahoo [Bot]