Strange performance behaviour

wingsit
Registered Member

Strange performance behaviour  Topic is solved

Wed Jun 29, 2011 6:54 pm
Here is an observation I made while implementing a BFGS nonlinear optimiser. It is just prototype code and uses VectorXd and MatrixXd as intermediate types.

The prototype basically does the following:

1) compute the numerical hessian
2) get the inverse by ldlt.solve(Identity(n,n))
3) enter the BFGS loop, using the inverse BFGS update and matrix multiplication to get the next location
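
For readers unfamiliar with step 3, here is a minimal sketch of the standard inverse BFGS update. It is not the prototype's actual code: it uses plain std::vector instead of VectorXd/MatrixXd so that it compiles without Eigen, and the names bfgs_inverse_update, Vec and Mat are made up for illustration. It assumes H is symmetric, s is the step taken and y is the change in gradient.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense, row-major, assumed symmetric

// Inverse BFGS update (hypothetical helper, not the original poster's code):
//   H_new = (I - rho*s*y^T) * H * (I - rho*y*s^T) + rho*s*s^T,  rho = 1 / (y^T s)
// For symmetric H this expands to
//   H - rho*(s*(Hy)^T + (Hy)*s^T) + (rho^2 * y^T H y + rho) * s*s^T
// Returns false and leaves H unchanged when the curvature y^T s is not positive.
bool bfgs_inverse_update(Mat& H, const Vec& s, const Vec& y) {
    const std::size_t n = s.size();
    assert(y.size() == n && H.size() == n);

    double ys = 0.0;
    for (std::size_t i = 0; i < n; ++i) ys += y[i] * s[i];
    if (ys <= 1e-12) return false;  // skip the update, H stays positive definite
    const double rho = 1.0 / ys;

    // Hy = H * y
    Vec Hy(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) Hy[i] += H[i][j] * y[j];

    double yHy = 0.0;
    for (std::size_t i = 0; i < n; ++i) yHy += y[i] * Hy[i];
    const double c = rho * rho * yHy + rho;

    // Apply the rank-two correction in place.
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            H[i][j] += c * s[i] * s[j] - rho * (s[i] * Hy[j] + Hy[i] * s[j]);
    return true;
}
```

After a successful update the new H satisfies the secant condition H*y = s, which is a handy sanity check when comparing builds made with different compiler flags.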

I am just doing some preliminary tests on Ubuntu with GCC 4.4.

When I compile with gcc -O3, solving the 100-variable extended Rosenbrock problem takes about 1.5 seconds, but with gcc -O3 -msse it takes about 35 seconds.

When I have time I will try to reduce it to a test case. In the meantime, does anyone know what the problem might be?
ggael
Moderator

Re: Strange performance behaviour

Thu Jun 30, 2011 7:16 pm
With -msse only, Eigen does nothing different. We only start to vectorize with -msse2. Also, are you sure you're not already running a 64-bit system? (uname -a will tell you.)
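
A quick way to see which case a given build falls into is to look at the macros GCC predefines: -msse defines __SSE__ and -msse2 additionally defines __SSE2__ (on x86-64 both are on by default, hence the 64-bit question). The sketch below is only an illustration; simd_level and expect_vectorization are made-up names, and it deliberately does not include Eigen.

```cpp
#include <cassert>
#include <string>

// Instruction-set level implied by the compiler's predefined macros
// (GCC/Clang convention; hypothetical helper for illustration).
std::string simd_level() {
#if defined(__SSE2__)
    return "sse2";
#elif defined(__SSE__)
    return "sse";
#else
    return "none";
#endif
}

// Per the reply above, vectorization only starts at SSE2, so a build
// that reports "sse" or "none" should behave like the scalar build.
bool expect_vectorization() {
    const std::string level = simd_level();
    assert(level == "sse2" || level == "sse" || level == "none");
    return level == "sse2";
}
```

Calling simd_level() from a small main() and compiling it once with -O3 -msse and once with -O3 -msse2 on a 32-bit system should report "sse" and "sse2" respectively.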
wingsit
Registered Member

Re: Strange performance behaviour

Thu Jun 30, 2011 7:23 pm
32-bit. With -msse2 the time dropped by almost 50% (as expected)! Any idea why -msse would actually slow things down? (Actually, this question is unimportant to me now.)
caddie
Registered Member
Posts
1
Karma
0

Re: Strange performance behaviour

Sun Jul 17, 2011 5:41 am
The real question is, why would you select Katmai as the target architecture?

To be honest, it is hard to answer your question without proper profiling and further knowledge of the hardware, optimization flags, etc., but I guess:
- Eigen (as has been mentioned) does not vectorize when the target instruction set is SSE(1)
- I presume that -msse instructs gcc to select PIII (Katmai?) as the target arch, and it is known that tuning for PIII has an adverse effect on the instruction flow / execution unit utilization on newer processors (which is probably even more adverse than tuning for Willamette and running on Penryn or older arch).

If you are really interested, do event-based profiling on both binaries and check the ASM; that should point you in the right direction.

