Registered Member
|
I'm working on a project where I have a vector of sorted doubles, and I need to take the exp() of each. Since the data was already in an Eigen Vector, I thought I'd just use the provided methods. Unfortunately, the double versions aren't vectorized using SSE (or not in the 3.1.0 I'm using), so I thought I'd write my own using the Cephes library and the float version as a guide. It worked, kind of. The difference between this and the exp() from math.h is small most of the time, but for some values it's an absolute difference of 262144, which is enough to result in NANs farther in the code. The largest errors are from the largest input values. For example, the error of 262144 is from an input value of 57.7482317672092...
Anyway, I was hoping that someone with more experience that I could point out my mistake. This is my first attempt at doing anything with SSE intrinsics, so it's likely I missed something I should have changed between float and double. Here's the code, from Eigen/src/Core/arch/SSE/MathFunctions.h
Then, in PacketMath.h in the same folder, I added the line HasExp = 1, to the packet_traits<double> struct. For reference, the Cephes library is http://www.netlib.org/cephes/ Edit: I should add that this is being run on a Xeon 5472 on Win XP x86 and is compiled with VS2010 on Ox and with SSE2 enabled. I'm pretty sure this is a logic error, but if not, here you go. If you need any other information, just ask. |
Moderator
|
Hi,
one problem are the hi and lo bounds that should be adjusted for double. |
Registered Member
|
Well, I changed those to the +-708 limits from the cephes file (is that what you meant?), but of course, that doesn't help the problem. Still getting the same difference.
|
Moderator
|
Indeed, the other issue is when computing 2^fx, you really must do it for double, not float. The following works here but we of course need a faster solution:
double dat[2]; pstore(dat, fx); dat[0] = std::pow(2,dat[0]); dat[1] = std::pow(2,dat[1]); return pmul(x,pload<Packet2d>(dat)); |
Registered Member
|
Fast is good. I'll take your word that it's faster, at least until I get the accuracy right and can properly test it with my data.
Any idea what's causing the errors? The only thing I can think of is not having precise enough constants. I copied them straight from the Cephes files, but maybe there's a conversion somewhere in the code. |
Moderator
|
hm, to be clear the piece of code I pasted is a replacement for the end of the pexp routine:
// build 2^n emm0 = _mm_cvttpd_epi32(fx); emm0 = _mm_add_epi32(emm0, p4i_0x7f); emm0 = _mm_slli_epi32(emm0, 23); return pmul(x, _mm_cvtps_pd(_mm_castsi128_ps(emm0))); which overflow for inputs greater than ~88. I have no clue whether this version is still faster than a sequential code, it was just to show the issue. |
Moderator
|
For the record here is a working version up to inputs between -88, 88:
|
Registered Member
|
Actually, that's not quite working. At least, not as you described, and not on my computer. It has a minimum input value of approximately -87.68 Anything lower than that causes it to return a zero, which is bad. I don't know how precise you can get, but -87.68 returns about 8.33e-39, and -87.69 returns 0. An upper value of 88 is close enough for my purposes, or at least it's not causing a divide by zero error.
Thank you for your help. I have it close enough, so I can use this. One last question. If you multiply a double array by a double, that operation is vectorized, right? |
Moderator
|
yes, that's what I said, it works for a limited input range. To fix this, we need to correct the last part of the routine which is the multiplication by 2^fx. For instance, you can replace this last part by:
// build 2^n double dat[2]; pstore(dat, fx); dat[0] = std::pow(2,dat[0]); dat[1] = std::pow(2,dat[1]); return pmul(x,pload<Packet2d>(dat)); to get correct results but at the cost of a big performance lost.... |
Registered Member
|
Ok, I've got 2^fx working.
This works for my cases, which don't come very close to the edge cases. You might want to check the bounds and make sure it doesn't do anything silly like return zero before you include this. |
Moderator
|
excellent:
https://bitbucket.org/eigen/eigen/chang ... 9d69a6827/ changeset: 3619d69a6827 user: ggael date: 2012-07-27 23:40:04 summary: add SSE pexp function for double, make use of _mm_floor_p* for pexp with SSE4.1 |
Registered Member
|
I am running into troubles with the new code. I've got input values such as x = -737.77991177461399
The old version resulted in: std::exp(x) = 3.858652694020e-321#DEN The new one results in: pexp(x) = -1.#INF000000000000 Do you see a way to fix this issue? Regards, Hauke |
Registered Member
|
Well, that's outside the bounds, so it should round. Unfortunately, the bounds that come from the Cephes methods don't seem to be exactly right. Try changing the minimum rounding bound (that's currently -7.08396418532264106224e2) upwards until it no longer fails. This is what I was worried about, unfortunately, I don't know enough about how the math works inside the computer to figure out the new bounds, or why it's not meeting the old. You'll probably just have to do it manually. Set up a small test program and fill a vector with -750 and use .exp() and keep changing the bounds until it works.
And I took my own advice and set up that small test program. I'm getting proper rounding behavior. for x = -737.77991177461399; Vec.setConstant(x); Vec.exp() = 2.22507e-308, which is the same as exp(-708....) the proper rounding. and exp(x) = 3.85865e-321 So, I have to say those dreaded words, "It works on my machine". Questions to ask. Are you using doubles? Did you modify the PacketMath.h in the same folder as the MathFunctions.h? You need to add the line HasExp = 1, or something like that (use the float version above it as an example), or it won't use the new function. |
Registered Member
|
The issue is fixed with the following commit:
https://bitbucket.org/eigen/eigen/chang ... 815a797c62 Now I am getting nice and clean zeros instead of INFs. Perfect. Thanks Gael! - Hauke |
Registered users: Baidu [Spider], Bing [Bot], Google [Bot]