This forum has been archived. All content is frozen. Please use KDE Discuss instead.

SSE approximate inverse square root?

Tags: None
(comma "," separated)
mikaellund
Registered Member
Posts
7
Karma
0
OS
Hi,
I'm doing particle simulations and my current bottleneck are 1/sqrt(x) and 1/x operations. Is there a way to make Eigen use the SSE intrinsic estimates "rsqrtps" and "rcpps" for the former two operations?
Thanks,
Mikael
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS
Be aware that these two instructions are very imprecise, and they require one or two Newton iterations for accuracy. Nevertheless it is very easy to add support for such instruction in Eigen. The first thing to do is to implement the appropriate functor. Here is an example:

Code: Select all
template<typename Scalar> struct fastinvsqrt_func {
  EIGEN_EMPTY_STRUCT_CTOR(fastinvsqrt_func)
  inline const Scalar operator() (const Scalar& a) const { return Scalar(1)/sqrt(a); }
  typedef typename packet_traits<Scalar>::type Packet;
  inline Packet packetOp(const Packet& a) const { return rsqrtps(a); }
};
template<typename Scalar>
struct functor_traits<scalar_sqrt_op<Scalar> >
{ enum {
    Cost = 2 * NumTraits<Scalar>::MulCost,
    PacketAccess =
    #ifdef EIGEN_VECTORIZE_SSE
    true
    #else
    false
    #endif
  };
};


and then you can do:

(a+b).unaryExpr(fastinvsqrt_func())

and/or add a global shortcut such that you can do:

fastinvsqrt(a+b);

and/or add a fastinvsqrt member function to ArrayBase (or even DenseBase if you wish) using our plugin mechanism, see:

http://eigen.tuxfamily.org/dox-devel/To ... MatrixBase
mikaellund
Registered Member
Posts
7
Karma
0
OS
Thanks a lot - I'll try it out!
mikaellund
Registered Member
Posts
7
Karma
0
OS
Hi again,
I tried the above suggestion, but - see code snippet below - I obviously didn't quite get it. The program compiles but prints 1,2,..10. Any help?

Code: Select all
#include <iostream>
#include <Eigen/Dense>
namespace Eigen {
  template<typename Scalar>
  struct fastinvsqrt_func {
    EIGEN_EMPTY_STRUCT_CTOR(fastinvsqrt_func)
    inline const Scalar operator() (const Scalar& a) const { return Scalar(1)/sqrt(a); }
    typedef typename ei_packet_traits<Scalar>::type Packet;
    inline Packet packetOp(const Packet& a) const { return rsqrtps(a); }
  };
  template<typename Scalar>
  struct ei_functor_traits<fastinvsqrt_func<Scalar> > {
    enum {
      Cost = 2 * NumTraits<Scalar>::MulCost,
      PacketAccess =
#ifdef EIGEN_VECTORIZE_SSE
      true
#else
      false
#endif
    };
  };
}

int main() {
  int n=10;
  Eigen::ArrayXf v(n);
  for (int i=0; i<n; i++) v[i]=i+1;
  v.unaryExpr( Eigen::fastinvsqrt_func<float>() );
  for (int i=0; i<n; i++) std::cout << v[i] << std::endl;
}
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS
unaryExpr returns an expression, so you should assign it:

v = v.unaryExpr( Eigen::fastinvsqrt_func<float>() );

;)
mikaellund
Registered Member
Posts
7
Karma
0
OS
ahh, silly me. It's working now - thanks a lot. I attach the final snippet, should anyone be interested.

Code: Select all
#include <iostream>
#include <Eigen/Dense>
namespace Eigen {
  template<typename Scalar>
  struct invsqrt_op {
    EIGEN_EMPTY_STRUCT_CTOR( invsqrt_op )
    inline const Scalar operator() (const Scalar& a) const { return Scalar(1)/ei_sqrt(a); }
    typedef typename ei_packet_traits<Scalar>::type Packet;
    inline Packet packetOp(const Packet& a) const { return _mm_rsqrt_ps(a); }
  };
  template<typename Scalar>
  struct ei_functor_traits<invsqrt_op<Scalar> > {
    enum {
      Cost = 2 * NumTraits<Scalar>::MulCost,
      PacketAccess =
#ifdef EIGEN_VECTORIZE_SSE
      true
#else
      false
#endif
    };
  };
}

int main() {
  int n=10;
  Eigen::ArrayXf v(n);
  for (int i=0; i<n; i++) v[i]=i+1;
  v = v.unaryExpr( Eigen::invsqrt_op<float>() );
  std::cout << v;
}


Bookmarks



Who is online

Registered users: bartoloni, Bing [Bot], Evergrowing, Google [Bot], ourcraft