
How to reduce memory usage of the following code?

jhuling (Registered Member)
The general setup is that I have a row-major matrix X (the number of rows can be in the millions and the number of columns on the order of hundreds), and I want to compute X^T X = X_1^T X_1 + ... + X_nfolds^T X_nfolds, in addition to computing and storing each of X_1^T X_1, ..., X_nfolds^T X_nfolds, where each X_k is a submatrix of X that is not contiguous in memory (i.e. it is made up of rows of X which are definitely not next to each other).

I have code which does accomplish this. However, it uses a huge amount of memory, much more than the memory footprint of X itself. The code that does this (note that this is inside a class definition, so X is a row-major MatrixXd class member) is the following:

Code:
    void XtX(std::vector<MatrixXd> &xtx_list_,
             std::vector<VectorXd> &xty_list_,
             std::vector<int> &nobs_list_) const
    {
        for (int k = 1; k < nfolds + 1; ++k)
        {
            // indicator for which rows of X belong to fold k
            VectorXi idxbool = (foldid.array() == k).cast<int>();
            int nrow_cur = idxbool.size();
            int numelem = idxbool.sum();

            // collect the row indices of fold k
            VectorXi idx(numelem);
            int c = 0;
            for (int i = 0; i < nrow_cur; ++i)
            {
                if (idxbool(i) == 1)
                {
                    idx(c) = i;
                    ++c;
                }
            }

            // copy the subset of X and the response Y for this fold
            MatrixXd sub(numelem, nvars);
            VectorXd sub_y(numelem);
            for (int r = 0; r < numelem; ++r)
            {
                sub.row(r) = X.row(idx(r));
                sub_y(r) = Y(idx(r));
            }

            // X_k^T X_k via a symmetric rank update on the lower triangle,
            // expanded into a full dense matrix
            MatrixXd AtAtmp(MatrixXd(nvars, nvars).setZero().
                            selfadjointView<Lower>().rankUpdate(sub.adjoint()));

            // X_k^T Y_k
            VectorXd AtBtmp = sub.adjoint() * sub_y;

            // store the X'X and X'Y of the subset of data for fold k
            xtx_list_[k-1] = AtAtmp;
            xty_list_[k-1] = AtBtmp;
            nobs_list_[k-1] = numelem;
        }
    }


Does anyone have any recommendations for reducing or eliminating the extra memory usage? Thanks!
ggael (Moderator)
Except for computing the A_i = X_i^T X_i on demand, there is not much you can do. Compressed triangular storage could also halve the memory usage, but that's not supported yet. You could emulate it by storing A_i and A_{i+1} into a (nvars+1) x nvars matrix T_{i/2} such that:

A_i     <=> T_{i/2}.topRows(nvars).selfadjointView<Upper>()
A_{i+1} <=> T_{i/2}.bottomRows(nvars).selfadjointView<Lower>()

Again, this will "only" halve the memory usage.
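
For illustration, here is a minimal self-contained sketch of that packing (the names T, A0 and A1 are placeholders, not taken from your code). The upper triangle of the top nvars rows and the lower triangle of the bottom nvars rows of T never overlap, so two symmetric nvars x nvars matrices fit exactly into the nvars*(nvars+1) coefficients of T:

Code:
    #include <Eigen/Dense>
    #include <iostream>
    using namespace Eigen;

    int main()
    {
        const int nvars = 4;

        // two symmetric matrices to pack (demo data)
        MatrixXd M0 = MatrixXd::Random(nvars, nvars);
        MatrixXd M1 = MatrixXd::Random(nvars, nvars);
        MatrixXd A0 = M0.transpose() * M0;
        MatrixXd A1 = M1.transpose() * M1;

        // packed storage: (nvars+1) x nvars holds exactly two triangles
        MatrixXd T(nvars + 1, nvars);
        T.topRows(nvars).triangularView<Upper>()    = A0;  // A0 -> upper triangle of top block
        T.bottomRows(nvars).triangularView<Lower>() = A1;  // A1 -> lower triangle of bottom block

        // the symmetric views can be used directly in products,
        // or expanded back into dense matrices when needed
        MatrixXd A0_back = T.topRows(nvars).selfadjointView<Upper>();
        MatrixXd A1_back = T.bottomRows(nvars).selfadjointView<Lower>();

        std::cout << (A0 - A0_back).norm() << " " << (A1 - A1_back).norm() << std::endl;
    }

So a pair of the A_i costs nvars*(nvars+1) doubles instead of 2*nvars*nvars, and the selfadjointView can be used directly in matrix products without expanding it back to a dense matrix.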
jhuling (Registered Member)
Thanks, ggael. In my usage of the code, X^T X will generally be no larger than 500x500, so memory isn't much of an issue on that front. What I'm *really* worried about is unwanted copies of X, which can be many gigabytes.
ggael (Moderator)
Where do you see copies of X? I don't see any in that code, except for the numelem x nvars chunks, but I would expect those to be quite small?

Looking at your code, you are storing nfolds matrices of size nvars x nvars, so if nvars==500 this represents about nfolds*2MB, which can be large if nfolds is large...

If nfolds is small but numelem is too large, then you can still compute the A_i for which numelem is too large in smaller chunks.
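
To make that concrete, here is a rough sketch of the chunked accumulation (the function name, the idx vector and chunk_size below are made up for the example, not part of your class). Only chunk_size rows of X are copied at any time, and rankUpdate keeps accumulating into the same matrix:

Code:
    #include <Eigen/Dense>
    #include <algorithm>
    #include <vector>
    using namespace Eigen;

    typedef Matrix<double, Dynamic, Dynamic, RowMajor> RowMajorMatrixXd;

    // accumulate AtA = X_k^T X_k and AtB = X_k^T y_k for the rows listed in idx,
    // processing at most chunk_size rows of X at a time
    void fold_xtx_chunked(const RowMajorMatrixXd &X, const VectorXd &Y,
                          const std::vector<int> &idx,
                          MatrixXd &AtA, VectorXd &AtB,
                          int chunk_size = 1024)
    {
        const int nvars   = static_cast<int>(X.cols());
        const int numelem = static_cast<int>(idx.size());

        AtA = MatrixXd::Zero(nvars, nvars);
        AtB = VectorXd::Zero(nvars);

        MatrixXd buf(chunk_size, nvars);   // reused scratch buffer
        VectorXd buf_y(chunk_size);

        for (int start = 0; start < numelem; start += chunk_size)
        {
            const int len = std::min(chunk_size, numelem - start);
            for (int r = 0; r < len; ++r)
            {
                buf.row(r) = X.row(idx[start + r]);
                buf_y(r)   = Y(idx[start + r]);
            }
            // AtA += buf^T * buf (lower triangle only), AtB += buf^T * buf_y
            AtA.selfadjointView<Lower>().rankUpdate(buf.topRows(len).adjoint());
            AtB.noalias() += buf.topRows(len).adjoint() * buf_y.head(len);
        }
        // mirror the lower triangle if a full dense matrix is needed
        AtA.triangularView<StrictlyUpper>() = AtA.transpose();
    }

This keeps the extra memory per fold at roughly chunk_size*nvars doubles instead of numelem*nvars.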

