This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Best memory strategy: large product of fixed size matrices

Tags: None
(comma "," separated)
jbromley
Registered Member
Posts
4
Karma
0
Hi,

I have a large number of small fixed size matrices (e.g. 4x4 or 16x16) that I want to multiply consecutively and only consecutively. Specifically I have N small matrices:

W_i where i = 1 ... N

and I wish to calculate every possible left and right partial product, i.e

L(j) = product(i=1 to j) W_i
and
R(j) = product(i=j to N) W_i
for all j.

Then I want to modify the W_i in place and repeat, again and again and again. So I don't expect the initial overhead of memory allocation to be my bottleneck but I do want the matrix multiplications to be as efficient as possible. Obviously this means ensuring alignment for vectorization. But am I also correct in assuming that since matrices are only ever multiplied consecutively it would be most efficient to have the matrix data stored consecutively in memory?

If this is the case, what is the best way to achieve this? Is it possible to just create a C-style array of Eigen Matrix objects:

Matrix4d W[N];

while still preserving alignment for vectorization? And if so would this guarantee the actual matrix data being consecutive in memory?

Or would it be better to do:

double * Wmemory = aligned_allocator<double>().allocate(N);

followed by replacing every reference to W[i] with a reference to Map<Matrix4d>(Wmemory+i*16*sizeof(double))? Although I'd prefer not to have to write something so cumbersome(!).

I'm also prepared to sacrifice portability and assume that the architecture is intel 64-bit (and so malloc should already return aligned pointers) if this would grant me a simpler syntax in my code. (e.g. in this case could I just create a valarray<Matrix4d>(N) and preserve alignment? And would it have the matrix data consecutive or would it be interspersed with internal variables from the Matrix4d class?).

Any ideas greatly appreciated...
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS
Code: Select all
Matrix4d W[N];


is the simplest and best way to achieve what you want (alignment + sequential storage).
Hauke
Registered Member
Posts
109
Karma
3
OS
Just out of curiosity. You said that you have a large number of small matrices. How big is N? I am just asking since
Code: Select all
Matrix4d W[N];
will be located on the stack.

Regards,
Hauke
jbromley
Registered Member
Posts
4
Karma
0
The exact value of N will be allowed to vary (optimization problem). However it is anticipated that it will be in the region of 40. I will definitely also want to be doing the problem with 16x16 matrices though, not only 4x4 ones. And I don't yet know whether I'll need double precision. The L's and R's need storing too, but are in fact post and pre multiplied by a row/column vector so they are actually lists of vectors. However I will need two variants of each, and I'll also maybe need to make an entire copy of the data for 'trial functions' when choosing how to update the W's. So we're looking at around maybe

40x16x16xsizeof(double) bytes of storage for the W's (80KB)
40x16xsizeof(double) for each of the L's and R's (gives 5KB * 4)
+ a full copy of the data => 200KB total

Modern linuxes have a default stack of 8MB, I think(?). So hopefully should be okay. But if it turns out that I do want to explore much larger N, what would be the equivalent way to achieve alignment and sequential storage on the heap?
jbromley
Registered Member
Posts
4
Karma
0
Oh, and on the topic of stacks.

If I wrap my data inside a class/struct, e.g.

Code: Select all
class Sequence {
  public:
    ...
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW
  private:
    Matrix4d W[40];
    ...
};


will W still be allocated on the stack if I instatiate an object of type Sequence on the stack? And what if I instantiate my object on the heap, presumably then the Eigen data will also be on the heap?

Lastly, should I be worried: is there a performance penalty (other than at allocation time) for performing matrix calculations on heap data vs stack data?
Hauke
Registered Member
Posts
109
Karma
3
OS
Regarding performance, you need to profile. There is only so much we can guess.

I would start with a
Code: Select all
Matrix4d W[40];
or
Code: Select all
std::array<Matrix4d,40> W;
. If that's enough space for your elements good, if not, you may need to switch to
Code: Select all
std::vector<Matrix4d,aligned_allocator<Matrix4d>> W;
.

Your wrapper won't help, the type still resides on the stack if you use it as is, unless you allocate on the heap and then you have the same result as with a regular std::vector.

- Hauke


Bookmarks



Who is online

Registered users: Bing [Bot], Google [Bot], Yahoo [Bot]