Registered Member
|
First, Eigen is great. Good work!
I have an application that I want to optimize with SSE and I think Eigen can help me. The docs are very sparse on how one would add a function along the lines of Matrix::sum() and Matrix::squaredNorm(). I see scalar_sum_op in Functors.h and its corresponding functor_traits class. How do I go about writing my own version of these? I'm not following the template metaprogramming. For starters, I'd like to write one that computes both the sum and the squared norm in a single pass through the Eigen::Matrix. Could someone give a quick overview of what's going on here? I'd be happy to help document this on the wiki once I understand what's going on.
Also, is there a way to apply this SSE-optimized functor architecture to buffers of standard memory? For large buffers, it seems easy and inexpensive to handle the non-aligned ends of the buffer, too. (I wrote some templated SSE-intrinsic-based function-object code to try this out and it seemed to work, although not using the concepts that Eigen uses.)
For context, I'm doing normalized cross correlation on images, which means I have a loop that gets run a lot that looks like this:
There are lots of ways to do this. One is to copy the image block out into a temporary Eigen::Matrix<float,d*d,1>, which is nice because I can then use X = window.sum() and XX = window.squaredNorm(), and I see that these produce SSE assembly code. What I'd like is to assemble a few functors together and apply them to an image region, like an SSE-optimized version of boost::accumulators. Thanks. |
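For reference, the two quantities asked for here (the sum X and the squared norm XX of a d x d window) can be gathered in one pass with a plain scalar loop. The following is only a baseline sketch in standard C++, not Eigen code and not explicitly vectorized; the names WindowStats and window_stats, and the row-major layout with a stride parameter, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Running sum and sum of squares, gathered in a single pass.
struct WindowStats {
    float sum = 0.0f;           // X  = sum of the pixel values
    float squared_norm = 0.0f;  // XX = sum of the squared pixel values
};

// Hypothetical helper: accumulate both statistics over a d x d window whose
// top-left corner is (row, col) in a row-major image with `stride` floats per row.
inline WindowStats window_stats(const std::vector<float>& image, std::size_t stride,
                                std::size_t row, std::size_t col, std::size_t d) {
    assert(col + d <= stride);                   // window fits inside a row
    assert((row + d) * stride <= image.size());  // window fits inside the buffer
    WindowStats s;
    for (std::size_t r = 0; r < d; ++r) {
        const float* p = image.data() + (row + r) * stride + col;
        for (std::size_t c = 0; c < d; ++c) {
            s.sum += p[c];
            s.squared_norm += p[c] * p[c];
        }
    }
    return s;
}
```

A loop like this avoids copying the block into a temporary, and it gives a reference result to compare any vectorized version against.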
Moderator
|
I'm afraid there is no easy solution to your first question. The reason is that you would have to use mat.redux(functor()) and the redux logic is not generic enough to support what you want to achieve. To this end we would have to extend the functor mechanism such that the functor could provide a custom packet type for the intermediate computation that, in your case, would be something like a std::pair<Packet,Packet>.
Assuming we do that, you can follow the scalar_sum_op functor example and, in particular, specialize the returned scalar type to be a std::pair<Scalar,Scalar> and implement the three methods:

    typedef std::pair<Scalar,Scalar> result_type;
    typedef std::pair<Packet,Packet> IntermediatePacketType; // not supported yet
    result_type operator()(result_type, Scalar) { return ...; }
    IntermediatePacketType packetOp(const IntermediatePacketType&, Packet&) { return ...; }
    result_type predux(const IntermediatePacketType&) { return ...; }

The functor_traits class is used to tell the approximate cost of a functor and whether it can be vectorized or not.
While writing this I realize that the redux logic currently does not honor the result_type provided by the functor, so this is something that has to be fixed too. I'm not sure about the IntermediatePacketType I proposed here, but that's just to give you an idea of what's needed. Such an addition would also be useful for computing the min and max coeffs at the same time, so there seem to be use cases for it.
You can also achieve this with a "visitor", see the visitor() function. We use visitors to compute the min (or max) and the respective index at once (e.g., x = mat.minCoeff(&i,&j) ). However, visitors are not vectorized.
For the second question, you can use Map. See the documentation. Unaligned boundaries are taken into account by Eigen for you. |
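To make the three-method outline above concrete, here is a minimal standalone sketch in standard C++. It is not Eigen code and does not use Eigen's redux: the Packet type is emulated with a std::array of four floats, and the names sum_and_squared_norm_op and redux_pair_sketch are hypothetical. It only shows how a scalar step, a packet step, and a final horizontal reduction could fit together if the functor mechanism were extended as proposed.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

using Scalar = float;
using Packet = std::array<Scalar, 4>;  // assumption: stand-in for a 4-float SIMD register

// Hypothetical functor following the three-method outline.
struct sum_and_squared_norm_op {
    using result_type = std::pair<Scalar, Scalar>;             // (sum, squared norm)
    using IntermediatePacketType = std::pair<Packet, Packet>;  // per-lane partial results

    // Scalar step: fold one coefficient into the running result.
    result_type operator()(const result_type& acc, Scalar x) const {
        return {acc.first + x, acc.second + x * x};
    }

    // Packet step: fold four coefficients at once, lane by lane.
    IntermediatePacketType packetOp(const IntermediatePacketType& acc, const Packet& p) const {
        IntermediatePacketType out = acc;
        for (std::size_t i = 0; i < p.size(); ++i) {
            out.first[i] += p[i];
            out.second[i] += p[i] * p[i];
        }
        return out;
    }

    // Horizontal reduction: collapse the lanes into one scalar pair.
    result_type predux(const IntermediatePacketType& acc) const {
        result_type r{0.0f, 0.0f};
        for (std::size_t i = 0; i < acc.first.size(); ++i) {
            r.first += acc.first[i];
            r.second += acc.second[i];
        }
        return r;
    }
};

// Hypothetical driver: packets for the bulk, scalar steps for the leftover tail.
template <typename Func>
typename Func::result_type redux_pair_sketch(const std::vector<Scalar>& v, const Func& f) {
    typename Func::IntermediatePacketType pacc{};  // value-initialized: all lanes are zero
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        Packet p = {v[i], v[i + 1], v[i + 2], v[i + 3]};
        pacc = f.packetOp(pacc, p);
    }
    typename Func::result_type acc = f.predux(pacc);
    for (; i < v.size(); ++i) acc = f(acc, v[i]);
    return acc;
}
```

Calling redux_pair_sketch(values, sum_and_squared_norm_op()) returns both statistics as a pair. In a real SIMD version the lane loops would become packet intrinsics; the control flow would stay the same.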
Registered Member
|
OK... so one way to do it would be a hack in which the function object has a hidden state. The functor is passed by reference (const reference, which is what makes this hackish), so in principle I could have a function object with a hidden state and do
I'll try to pick apart the metaprogramming inside of redux, particularly what this is actually doing:
|
Moderator
|
Yes, that could work too: the functor could still perform the sum as usual, and behind the scenes it would accumulate other information.
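A minimal sketch of that hidden-state idea in standard C++, outside of Eigen. The functor name, the `mutable` member, and the redux_by_const_ref stand-in are all assumptions for illustration; the stand-in simply calls the functor once per element.

```cpp
#include <vector>

// Hypothetical functor: behaves like a plain sum, but also records the sum of
// squares in a `mutable` member so it can be updated through a const reference.
struct sum_with_hidden_squared_norm {
    mutable float squared_norm = 0.0f;  // the hidden state (the hackish part)

    float operator()(float acc, float x) const {
        squared_norm += x * x;  // side effect behind the scenes
        return acc + x;         // the visible result is still the sum
    }
};

// Stand-in for a reduction that takes its functor by const reference.
// It starts from zero and applies the functor to every element.
template <typename Func>
float redux_by_const_ref(const std::vector<float>& v, const Func& f) {
    float acc = 0.0f;
    for (float x : v) acc = f(acc, x);
    return acc;
}
```

One caveat with this approach: the hidden state only sees values that actually pass through operator(). If a reduction seeds its accumulator with the first coefficient, or processes data through a separate packet method, those values would bypass the hidden accumulator unless that path is handled as well.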
Regarding internal::redux_impl<Func, ThisNested>::run(derived(), func); : we call an extra function so that the right algorithm is picked depending on the input matrix and on what the functor supports (unrolling, vectorization, etc.). |
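The general pattern described here, an implementation struct whose specialization is selected at compile time from a traits class, can be sketched in standard C++ as follows. This is not Eigen's actual redux_impl: every name ending in _sketch, the two functors, and the single Vectorizable flag are hypothetical, and the "vectorized" path only mimics SIMD lanes with four independent partial results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical functors.
struct sum_op { float operator()(float a, float b) const { return a + b; } };
struct max_op { float operator()(float a, float b) const { return a < b ? b : a; } };

// Hypothetical traits class: functors are not vectorizable unless they opt in.
template <typename Func> struct functor_traits_sketch { static constexpr bool Vectorizable = false; };
template <> struct functor_traits_sketch<sum_op> { static constexpr bool Vectorizable = true; };

// Primary template: plain scalar loop, seeded with the first element.
template <typename Func, bool Vectorize = functor_traits_sketch<Func>::Vectorizable>
struct redux_impl_sketch {
    static float run(const std::vector<float>& v, const Func& f) {
        float acc = v[0];
        for (std::size_t i = 1; i < v.size(); ++i) acc = f(acc, v[i]);
        return acc;
    }
};

// Specialization picked when the traits mark the functor as vectorizable:
// four independent partial results, combined at the end, plus a scalar tail.
template <typename Func>
struct redux_impl_sketch<Func, true> {
    static float run(const std::vector<float>& v, const Func& f) {
        if (v.size() < 8) return redux_impl_sketch<Func, false>::run(v, f);
        float lane[4] = {v[0], v[1], v[2], v[3]};
        std::size_t i = 4;
        for (; i + 4 <= v.size(); i += 4)
            for (std::size_t k = 0; k < 4; ++k) lane[k] = f(lane[k], v[i + k]);
        float acc = f(f(lane[0], lane[1]), f(lane[2], lane[3]));
        for (; i < v.size(); ++i) acc = f(acc, v[i]);
        return acc;
    }
};

// Entry point: the extra call that selects the algorithm at compile time.
template <typename Func>
float redux_dispatch_sketch(const std::vector<float>& v, const Func& f) {
    assert(!v.empty());
    return redux_impl_sketch<Func>::run(v, f);
}
```

The caller only ever writes redux_dispatch_sketch(v, func); which loop runs is decided by the traits of the functor type, with no runtime branch on the functor itself.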
Registered Member
|
I suppose another approach would be to mirror the behavior of std::for_each, which takes a functor by value and returns it.
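As a small illustration of that std::for_each behavior in standard C++ (the accumulator type and the accumulate_stats helper are hypothetical names, and this is not something Eigen's redux does):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stateful functor: no hidden or mutable state is needed,
// because the caller gets the functor back by value.
struct sum_and_squared_norm_accumulator {
    float sum = 0.0f;
    float squared_norm = 0.0f;

    void operator()(float x) {
        sum += x;
        squared_norm += x * x;
    }
};

inline sum_and_squared_norm_accumulator accumulate_stats(const std::vector<float>& v) {
    // std::for_each takes the functor by value, applies it to each element,
    // and returns the functor, so the accumulated state comes back to the caller.
    return std::for_each(v.begin(), v.end(), sum_and_squared_norm_accumulator{});
}
```

The state lives in the returned copy, so the functor passed in is never modified and no const-correctness tricks are required.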
I'll look into the implementation details some more and maybe put something up on the wiki documenting how it works now. |