|   Registered Member   
 | 
							I am looking to move to Eigen. I am currently using own code based on LAPACK and am using CILK++ to parallelize chunks of code. As I need to move to a template approach, I do not want to reinvent the wheel and Eigen looks like it leads the pack. For now, I was looking at stability issues. Where is the block of code that computes the vector norm? Specifically, I want to ensure that Eigen can handle massively varying ranges of numbers so that the program does not square numbers which are too big or too small. I am hoping that Eigen does not compute the squaredNorm as a simple sum of squares without some appropriate scaling. My preference is Blue's algorithm which groups vector elements rather than the continuous scaling employed by *LASSQ of LAPACK or the highly convoluted *NRM2 of the original BLAS Level1. Many Thanks - D. P.S. Can somebody explain the usage of topic icons. | 
|   Moderator   
 | 
							Both the squaredNorm() and norm() functions are tailored for high performance and therefore they simply compute a sum of square. For stability we have a stableNorm() functions which currently is implemented using a home made hypot function: 
 Our implementation of the hypot function is in Eigen/src/Core/MathFunctions.h: 
 Of course if you have something better to propose, there is no problem to change the implementation of stableNorm() for it. | 
|   Registered Member   
 | 
							Thanks for the quick reply. Do I need a special version? I cannot even find the string 'stableNorm' in 'eigen' anywhere? I cannot find the body of squared Norm anywhere either or does it materialize from expressions. I was looking to see how easy it would be to integrate James L Blue's algorithm from ACM TOMS V4, No 1, pp15-23. I also use the same algorithm to protect the computation of Given's rotations. I am curious why you rolled your own 'hypot'. In what environments did you note the problems? Did you find the IEEE 754 routines in  the *nix/*BSD problems. The hypot routine is too heavy handed for my taste. In your disabled 'Householder.h' code, you refer to the problem of relative element sizes which is what I need to address. Thanks - Damian | 
|   Registered Member   
 | 
							stableNorm() is only in the development version (the default branch) that will become 2.1. squaredNorm() and norm() are defined in Eigen/src/Core/Dot.h The givens rotations and householder transformations stuff will eventually be redone as we rewrite the corresponding algorithms. But that takes time, as we first need to nail the situation with diagonal/bidiagonal/tridiagonal matrices (that is happening in the fork eigen2_special_matrices). Be patient, that'll be in 2.1. 
								Join us on Eigen's IRC channel: #eigen on irc.freenode.net
 Have a serious interest in Eigen? Then join the mailing list! | 
|   Registered Member   
 | 
							Thanks for that tip. Actually what I wanted was in MathFunctions.h. I had totally missed it previously while trying to sift through lines and lines of grep output on an 80-column window. Maybe I need a nice new 24 inch screen on which to view Eigen. To achieve the robustness I want, I need to evaluate one of three branches of an if/then/else-if/else construct within a single pass through a loop. It does not look easy. But then my hand-rolled library which is a mixture of LAPACK and LINPACK and other stuff is too messy. Time to think a bit more. Thanks again - Damian | 
|   Moderator   
 | 
							In the devel branch I added a blueNorm() function which is basically a copy/paste of the code of the paper you pointed to us. In order to compare it with the current stableNorm() I built a vector of 10000000 elements all having the same value (in order to be able to compute a reference value). With float, if there is no underflow or overflow, the most accurate solution is clearly the basic norm() function (this is probably because of the vectorization). The stableNorm() function is pretty bad. With double the situation is quite different: the stableNorm() is clearly the better alternative, especially with under or overflow. | 
|   Moderator   
 | 
							now, let's compare the relative performance: float: std norm = 0.0358682 stable norm = 1.0819 blue's norm = 0.0592282 double: std norm = 0.0193479 stable norm = 0.632721 blue's norm = 0.0483868 the values correspond to the time in seconds to compute the norm of a vector of 2e7 elements. So the blue's norm might be a good compromise. | 
|   Registered Member   
 | 
							You mentioned, 
 Thanks for entertaining the idea. I find that hypot is too slow in any situation and that Blue's method was as accurate as hypot. What accuracy did you see with Blue's method even with under or overflow, or even none? There are other tricks with scaling as used in SNRM2 and SLASSQ of BLAS/LAPACK but the goto-riddled SNRM2 defies translation to C++ and SLASSQ is still quite slow, especially in a vector filled with monotonically increasing elements. I prefer to think that a robust solution is never a compromise. While one can often guarantee than no element will overflow in a matrix once creates directly, it is often harder to guarantee that very small, but still quite important numbers will not exist, especially if one is looking at naturally singular/unstable problems such as structural buckling/plasticity. They can often occur with numerically less rigorous finite element models that, in the absence of nobody having developed something better, are the only way to get answers to practical problems. And, while it is only my 2c contribution, I would be tempted to use the robust method with the default name 'norm' and then force people to use something called 'fastnorm' for the one which was less robust. You probably may need to do one Blue step to robustly compute an elementary Given's rotation in that Householder code you noted that you are working on at the moment. While LAPACK does not use Blue's method for their NORM, they effectively reinvented it, but never credited him, for its use within SLARTG as a way to overcome some bad behaviour in SROTG. Thanks again - Damian P.S. How do I postpone a posting and then come back to it? I can Save it but not retreive it. Also, I tried to register for the mailing list and my Firefox goes nowhere, i.e. it does nothing. Is the mailing list closed at the moment? | 
|   Registered Member   
 | 
							Ignore my comment about the mailing list. Brain fade problem on my part. Also, with respect to a stable computer of the Euclidean norm, while I am a fan of Blue's original method, I am really curious why you used the 'hypot' method of Moler instead of the Hammarling's continuous scaling method used in LAPACK which is more recent than the idea of Moler. - Damian | 
|   Moderator   
 | 
							if you want to try it yourself, I've commited in the devel branch a benchmark of various norm implementations including the lapack's algorithm and an SSE implementation of Blue's norm: cd pathtoeigen/bench g++ -msse2 -mfpmath=sse bench_norm.cpp -O2 -DNDEBUG -I.. -o bench_norm && ./bench_norm | 
|   Moderator   
 | 
							Slso note that another criteria is the use of the SSE instruction set versus the FPU. It is now safe to assume that by default, the SSE instructions are uses, even is the code is not vectorized. However, for this kind of computation, using the FPU yields much more accurate results. So one question is whether stableNorm (regardless of its implementation) should use the FPU or not ? This can be enforced by casting the matrix to "long double". Currently one can achieve that by writing: v.cast<long double>().stableNorm() in practice, for x86/x86_64 CPU the cast has no effect but disabling the use the SSE instr. in favor of the old FPU. edit: a problem of the lapack solution is the use of a div which is very very slow (3 times slower than blues's norm) | 
|   Moderator   
 | 
							ok, so finally I've also tried the basic two passes algorithm: Scalar s = v.cwise().abs().maxCoeff(); return s*(v/s).norm(); Thanks to the vectorization it is actually faster than other "stable norm" algorithm, especially when the data are in-cache. So in order to optimize it for large vectors, I've implemented a block version which is in-between this naive algorithm and the lapack's continuous scaling one. The result looks very nice (cheap and accurate), so I'm going to make it the default stableNorm()  edit: well it is fast only with vectorization, otherwise Blue's algorithm is still much faster, so I'm still unsure... | 
|   Registered Member   
 | 
							Your idea of first finding the maximum coefficient and then scaling by this number is close to what Ed Anderson did and documented very thoroughly in his 2002 publication about performance changes to LAPACK-3. He was much more careful about the safety of dividing by the maximum coefficient when that number is small. Unfortunately, these changes, called LAPACK-3E did not get rolled into the main LAPACK branch and have stagnated. I will send you the LAWN reference later but a lot of changes were driven by, but not restricted to, vectorization, an important issue for his employer at the time, CRAY Research. The performance gains he saw reflect exactly what you see. It was a very thorough document. Unfortunately, while he referenced Blue's paper, he did not read Blue's original paper until very recently when I brought it to his attention. Then he too noted the same issues as you are seeing. There is also a variant of Blue's method called ENORM which is used in MINPACK. As in Blue's original, it uses the raw-number in the mid-range. But it uses continuous scaling in the high and low ranges which means it needs less machine constants than standard Blue. I have not done any tests on this one because I try and avoid those slooow divisions. But it has no cache sensitivity as you have noted with the Anderson pre-scaling version. While cache-sensitivity is not such a big issue on say a hot Xeon with 6MB of cache shared between each 2-cores, on an AMD with only 512Kb or 1MB of cache, it is far more crippling which is why I avoid it at all costs as the vector sizes I work with often fall outside 1MB. On a machine without SSE2, such as the one I use for my own development until it gets replaced shortly, the optimized (register) accumulation will always use the extended, i.e. 40 and 80 bit, float and double, x86, or is it 8087-style, arithmetic. You get an automated handling of the underflow issues so you can avoid one of the comparisons which gains you a bit. It does nothing for the numbers which overflow when squared. But all the error analyses of, and logic behind, the LINPACK and LAPACK algorithms since the 1980s assume that registers have the same precision as numbers in memory. And while I love this concept of some extended precision, and X86s are prolific, multi-core SPARC and other CPUs are in use on many systems. So trying to mix and match is a bit dangerous if you want maximum portability. By the way, I normally use '-03' because I find that, unlike '-O2', it, well 4.whatever gcc, computes numbers like const double SQRTEPSILON = sqrt(DBL_EPSILON) at compile time. Not sure if that helps 'eigen'. - Damian | 
|   Registered Member   
 | 
							Just for the sake of completeness, there is a tiny bug in Blue's original implementation where overflow is not avoided. To fix, just scale the upper threshold by the vector length. This is fixed in the implementations done shortly thereafter, i.e.  ENORM of MINPACK and SNRM2 in BLAS/LINPACK and it is also used in SLASSQ of LAPACK-3E. In theory you should scale the lower limit by the vector length but you can ignore it as it falls out in the wash when sums are recombined at the end. There is a further 'improvement' to the original which is only implemented in SNRM2. At the expense of one extra comparison per loop, you can avoid the computations of sum of squares in the lowest range once any vector element has been seen in the highest range because you know that the sum of the small components will never be used at recombination time. Unless you are feeling particularly bored, I suggest you ignore it as it does not buy you a lot even in the best case. It just complicates your life especially if you are trying to vectorise things. - Damian | 
Registered users: Bing [Bot], Google [Bot], Sogou [Bot]
 
		 
		 
		 
		