Registered Member
|
I am stuck with a G5 Mac at work and would *love* to find an Xcode+Eigen project somewhere that has AltiVec enabled successfully. So far, I have not been able to get everything right although, without AltiVec, I have no problems. Then, it is straightforward Objective-C++ (Cocoa).
Does anyone know of any sample code online that I could peruse? Thanks in advance. N.B. I am using Eigen v2.0.11 if that matters. |
Registered Member
|
What CPU and compiler (version) are you using?
What specific problems do you have, e.g. compilation errors? Crashes?
Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list! |
Registered Member
|
I am using GCC 4.2.
All I was looking for was an Xcode example so that I could see the various build settings in that context. I am not getting crashes. Also, there was no mention of the Accelerate framework which I found confusing. |
Registered Member
|
GCC 4.2 sounds good; it's the minimum GCC version for vectorization.
Can you run this test program:
If you get "not detected" can you pass the -maltivec option to GCC and retry? There should be an option in Xcode for doing that.
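The test program itself is not reproduced in this thread. A minimal sketch of such a check, assuming it only inspects preprocessor macros, could look like the following. `__ALTIVEC__` is the macro GCC defines when AltiVec code generation is on, and `EIGEN_VECTORIZE_ALTIVEC` is the macro Eigen defines once it has decided to use its AltiVec code path; the two function names are hypothetical.

```cpp
#include <string>

// In a real project, include Eigen's Core header here so that Eigen gets a
// chance to define EIGEN_VECTORIZE_ALTIVEC, e.g.:
//   #include "../eigen-2.0.11/Eigen/Core"

// Hypothetical helper: reports whether the compiler itself targets AltiVec.
std::string compiler_altivec_status() {
#if defined(__ALTIVEC__)
    return "altivec detected";
#else
    return "altivec not detected";
#endif
}

// Hypothetical helper: reports whether Eigen switched on its AltiVec path.
// Without the Eigen include above, this always reports "not enabled".
std::string eigen_altivec_status() {
#if defined(EIGEN_VECTORIZE_ALTIVEC)
    return "altivec enabled";
#else
    return "altivec not enabled";
#endif
}
```

Printing both strings from `main` with and without `-maltivec` shows whether the flag reaches the compiler at all.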
Registered Member
|
I tried two Xcode projects: a C++ tool and a Cocoa-document app. In both cases, I used GCC 4.2 (not the default) and built Universal binaries targeting Leopard with the following headers (since I never actually *installed* Eigen).
    #include <iostream>
    #include "../eigen-2.0.11/Eigen/Core"
I got
    altivec detected altivec enabled
both times without any additional compiler flags needed. I guess what I'm missing is some example specifying all the alignment stuff with a simple matrix computation. Then I can follow that pattern. [Sorry to be such an Eigen newbie.]
Registered Member
|
Eigen takes care of all the alignment stuff for you.
It looks as if AltiVec is perfectly enabled for you right now. In order to check the ASM yourself, see this: http://eigen.tuxfamily.org/index.php?ti ... #Using_GCC
Registered Member
|
Since I got the aforementioned "AltiVec enabled" message, I wrote a function that should use AltiVec. It has one line, viz.,
    u = v + 3*w;
where all three variables are Vector4d. The assembly code does not show any evidence of AltiVec being used. If I enable AltiVec extensions in Xcode, the assembly does not change. If I try to NOT vectorize by defining EIGEN_DONT_VECTORIZE, the only things that change in the disassembly are some offsets, which suggests that only alignment is being affected. Xcode does not have a -maltivec option (it is ignored) but does have a -faltivec option. However, it has no effect in this case. In addition, the debugger window showing the vector registers does not change when the line above executes.
Here is the disassembly of my test function. It looks like just PPC assembly to me.
0x00001d18 <+0000> nop
0x00001d1c <+0004> nop
0x00001d20 <+0008> nop
0x00001d24 <+0012> nop
0x00001d28 <+0016> nop
0x00001d2c <+0020> mflr r0
0x00001d30 <+0024> stmw r29,-12(r1)
0x00001d34 <+0028> stw r0,8(r1)
0x00001d38 <+0032> stwu r1,-112(r1)
0x00001d3c <+0036> mr r30,r1
0x00001d40 <+0040> bcl- 20,4*cr7+so,0x1d44 <_Z3fooRN5Eigen6MatrixIdLi4ELi1ELi2ELi4ELi1EEES2_S2_+44>
0x00001d44 <+0044> mflr r31
0x00001d48 <+0048> stw r3,136(r30)
0x00001d4c <+0052> stw r4,140(r30)
0x00001d50 <+0056> stw r5,144(r30)
0x00001d54 <+0060> lwz r29,140(r30)
0x00001d58 <+0064> addis r2,r31,0
0x00001d5c <+0068> addi r2,r2,13540
0x00001d60 <+0072> lfd f0,0(r2)
0x00001d64 <+0076> stfd f0,56(r30)
0x00001d68 <+0080> lwz r2,144(r30)
0x00001d6c <+0084> addi r0,r30,64
0x00001d70 <+0088> mr r3,r0
0x00001d74 <+0092> addi r0,r30,56
0x00001d78 <+0096> mr r4,r0
0x00001d7c <+0100> mr r5,r2
0x00001d80 <+0104> bl 0x49bc <dyld_stub__ZN5EigenmlERKdRKNS_10MatrixBaseINS_6MatrixIdLi4ELi1ELi2ELi4ELi1EEEEE>
0x00001d84 <+0108> addi r2,r30,64
0x00001d88 <+0112> addi r0,r30,76
0x00001d8c <+0116> mr r3,r0
0x00001d90 <+0120> mr r4,r29
0x00001d94 <+0124> mr r5,r2
0x00001d98 <+0128> bl 0x39f4 <_ZNK5Eigen10MatrixBaseINS_6MatrixIdLi4ELi1ELi2ELi4ELi1EEEEplINS_12CwiseUnaryOpINS_21ei_scalar_multiple_opIdEES2_EEEEKNS_13CwiseBinaryOpINS_16ei_scalar_sum_opIdEES2_T_EERKNS0_ISC_EE>
0x00001d9c <+0132> addi r0,r30,76
0x00001da0 <+0136> lwz r3,136(r30)
0x00001da4 <+0140> mr r4,r0
0x00001da8 <+0144> bl 0x3ea0 <_ZN5Eigen6MatrixIdLi4ELi1ELi2ELi4ELi1EEaSINS_13CwiseBinaryOpINS_16ei_scalar_sum_opIdEES1_NS_12CwiseUnaryOpINS_21ei_scalar_multiple_opIdEES1_EEEEEERS1_RKNS_10MatrixBaseIT_EE>
0x00001dac <+0148> lwz r0,136(r30)
0x00001db0 <+0152> addis r2,r31,0
0x00001db4 <+0156> lwz r3,17956(r2)
0x00001db8 <+0160> mr r4,r0
0x00001dbc <+0164> bl 0x49ac <dyld_stub__ZN5EigenlsINS_6MatrixIdLi4ELi1ELi2ELi4ELi1EEEEERSoS3_RKNS_10MatrixBaseIT_EE>
0x00001dc0 <+0168> lwz r1,0(r1)
0x00001dc4 <+0172> lwz r0,8(r1)
0x00001dc8 <+0176> mtlr r0
0x00001dcc <+0180> lmw r29,-12(r1)
0x00001dd0 <+0184> blr
Registered Member
|
OK, just 2 things:
1. EIGEN_DONT_VECTORIZE doesn't affect alignment at all; we align in both cases. EIGEN_DONT_ALIGN would disable alignment.
2. Can you enable optimization (-O2)? This will make the assembly much shorter and easier to read.
Registered Member
|
Here is the function at source code level:
void foo(Vector4d &u, Vector4d &v, Vector4d &w)
{
    u = v + 3*w;
    cout << u;
}
Here is the disassembly with -O2 (there is also -O3). I see only regular registers and floating-point registers.
0x000025b0 <+0000> mflr r0
0x000025b4 <+0004> lfd f0,0(r4)
0x000025b8 <+0008> lfd f13,0(r5)
0x000025bc <+0012> bcl- 20,4*cr7+so,0x25c0 <_Z3fooRN5Eigen6MatrixIdLi4ELi1ELi2ELi4ELi1EEES2_S2_+16>
0x000025c0 <+0016> mr r9,r4
0x000025c4 <+0020> mr r4,r3
0x000025c8 <+0024> mflr r10
0x000025cc <+0028> nop
0x000025d0 <+0032> nop
0x000025d4 <+0036> mtlr r0
0x000025d8 <+0040> nop
0x000025dc <+0044> nop
0x000025e0 <+0048> nop
0x000025e4 <+0052> addis r2,r10,0
0x000025e8 <+0056> lfs f12,6116(r2)
0x000025ec <+0060> mr r2,r3
0x000025f0 <+0064> addis r3,r10,0
0x000025f4 <+0068> lwz r3,7624(r3)
0x000025f8 <+0072> fmadd f13,f13,f12,f0
0x000025fc <+0076> stfd f13,0(r2)
0x00002600 <+0080> lfd f13,8(r9)
0x00002604 <+0084> lfd f0,8(r5)
0x00002608 <+0088> fmadd f0,f0,f12,f13
0x0000260c <+0092> stfd f0,8(r2)
0x00002610 <+0096> lfd f0,16(r9)
0x00002614 <+0100> lfd f13,16(r5)
0x00002618 <+0104> fmadd f13,f13,f12,f0
0x0000261c <+0108> stfd f13,16(r2)
0x00002620 <+0112> lfd f0,24(r5)
0x00002624 <+0116> lfd f13,24(r9)
0x00002628 <+0120> fmadd f0,f0,f12,f13
0x0000262c <+0124> stfd f0,24(r2)
0x00002630 <+0128> b 0x376c <dyld_stub__ZN5EigenlsINS_6MatrixIdLi4ELi1ELi2ELi4ELi1EEEEERSoS3_RKNS_10MatrixBaseIT_EE>
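Read back as C++, the -O2 listing does four independent scalar multiply-adds (the four `fmadd` instructions), one per element, using only the floating-point registers. A rough equivalent is sketched below; `Vec4d` and `foo_scalar` are hypothetical stand-ins for illustration, not Eigen types.

```cpp
#include <cstddef>

// Hypothetical stand-in for a fixed-size vector of four doubles.
struct Vec4d {
    double data[4];
};

// Element-wise u = v + 3*w, written the way the scalar disassembly reads:
// each iteration is one multiply-add, w[i] * 3 + v[i], which the PowerPC
// compiler can emit as a single fmadd. No vector registers are involved.
void foo_scalar(Vec4d& u, const Vec4d& v, const Vec4d& w) {
    const double k = 3.0;
    for (std::size_t i = 0; i < 4; ++i) {
        u.data[i] = w.data[i] * k + v.data[i];
    }
}
```

With the loop fully unrolled, this matches the four load/load/fmadd/store groups at offsets 0, 8, 16 and 24 in the listing.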
Registered Member
|
Indeed, this code is not vectorized :/
I'm not an AltiVec expert, but this is very strange. Can you try just "u=v+w" and see if that works?
Registered Member
|
Deleting the matrix output just deletes the last line of the disassembly above and changes the penultimate line from b to blr. Also, as noted, manually enabling -faltivec, in the Xcode build options, has no effect.
I do not know how Eigen interacts with the extended GCC 4.2 that ships with Macs these days. However, I dumped the following from the Apple-supplied man page for gcc.
*****
-faltivec
    This flag is provided for compatibility with Metrowerks CodeWarrior and MrC compilers as well as previous Apple versions of GCC. It causes the -mpim-altivec option to be turned on.
-maltivec
-mno-altivec
    Generate code that uses (does not use) AltiVec instructions, and also enable the use of built-in functions that allow more direct access to the AltiVec instruction set. You may also need to set -mabi=altivec to adjust the current ABI with AltiVec ABI enhancements.
-mpim-altivec
-mno-pim-altivec
    Enable (or disable) built-in compiler support for the syntactic extensions as well as operations and predicates defined in the Motorola AltiVec Technology Programming Interface Manual (PIM). This includes the recognition of "vector" and "pixel" as (context-dependent) keywords, the definition of built-in functions such as "vec_add", and the use of parenthesized comma expression as AltiVec literals. Note that unlike the option -maltivec, the extension does not require the inclusion of any special header files; if "<altivec.h>" is included, a warning will be issued and the contents of the header will be ignored. The preprocessor shall provide an "__APPLE_ALTIVEC__" manifest constant when -mpim-altivec is specified. (APPLE ONLY) In addition, the -mpim-altivec option disables the inlining of functions containing AltiVec instructions into functions that do not make use of the vector unit. Certain other optimizations, such as inline vectorization of "memset" and "memcpy" calls, are also disabled. These adjustments make it possible to compile programs whose use of AltiVec instructions is preceded by a run-time check for the presence of AltiVec functionality, and that can therefore be made to run on G3 processors. Note that all of these optimizations may be re-enabled by supplying the -maltivec option, or an -mcpu option specifying a processor that supports AltiVec instructions.
*****
I tried turning off -faltivec and using -maltivec instead. The result was the same: no AltiVec code. In all of this, I started with a clean project template. There is nothing special that I did to it.
Registered Member
|
Just discovered that, if I change my Vector4d variables to Vector4f, then I *do* get AltiVec to work. Here is the disassembly:
*****
0x00002650 <+0000> mfvrsave r0
0x00002654 <+0004> nop
0x00002658 <+0008> nop
0x0000265c <+0012> nop
0x00002660 <+0016> stw r0,-8(r1)
0x00002664 <+0020> oris r0,r0,49164
0x00002668 <+0024> nop
0x0000266c <+0028> nop
0x00002670 <+0032> mtvrsave r0
0x00002674 <+0036> mflr r0
0x00002678 <+0040> lvx v12,r0,r5
0x0000267c <+0044> lvx v13,r0,r4
0x00002680 <+0048> bcl- 20,4*cr7+so,0x2684 <_Z3fooRN5Eigen6MatrixIfLi4ELi1ELi2ELi4ELi1EEES2_S2_+52>
0x00002684 <+0052> lwz r12,-8(r1)
0x00002688 <+0056> mflr r10
0x0000268c <+0060> mtlr r0
0x00002690 <+0064> addis r2,r10,0
0x00002694 <+0068> lfs f0,2152(r2)
0x00002698 <+0072> addi r2,r1,-48
0x0000269c <+0076> stfs f0,-48(r1)
0x000026a0 <+0080> lvx v0,r0,r2
0x000026a4 <+0084> vspltw v1,v0,0
0x000026a8 <+0088> vspltisw v0,0
0x000026ac <+0092> vmaddfp v0,v12,v1,v0
0x000026b0 <+0096> vaddfp v0,v0,v13
0x000026b4 <+0100> stvx v0,r0,r3
0x000026b8 <+0104> mtvrsave r12
0x000026bc <+0108> blr
*****
I see the vector registers changing as well. Sadly, I really need doubles for my app.
Registered Member
|
Oh, right!
AltiVec only supports floats, not doubles, if I remember correctly. At the very least I can tell you that _our_ AltiVec support is only for floats, as you can see in the file Eigen/src/Core/arch/AltiVec/PacketMath.h
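The float/double difference comes down to how many elements of a type the library packs into one 128-bit AltiVec register. The sketch below illustrates the idea with a hypothetical trait, `altivec_lanes`; it is not Eigen's actual internal trait, only an assumption-laden illustration of why `Vector4f` is vectorized and `Vector4d` falls back to scalar code.

```cpp
#include <cstddef>

// An AltiVec vector register is 128 bits wide, i.e. 16 bytes.
const std::size_t kAltiVecRegisterBytes = 16;

// Hypothetical trait: number of elements processed per vector instruction.
// The generic case is 1, meaning "no packet type, use the scalar path".
template <typename Scalar>
struct altivec_lanes {
    static const std::size_t value = 1;
};

// Single-precision floats have a packet type: four of them fill a register.
template <>
struct altivec_lanes<float> {
    static const std::size_t value = kAltiVecRegisterBytes / sizeof(float);
};

// Hypothetical helper: true when a type would take the vectorized path.
template <typename Scalar>
bool altivec_vectorizable() {
    return altivec_lanes<Scalar>::value > 1;
}
```

Under these assumptions `altivec_vectorizable<float>()` is true and `altivec_vectorizable<double>()` is false, which mirrors the `vmaddfp` listing for `Vector4f` and the `fmadd` listing for `Vector4d`.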
Registered Member
|
Eigen doesn't use the Accelerate Framework (as far as I can see). Wouldn't this be a problem when targeting a universal binary?
|
Registered Member
|
FWIW, the Apple docs state that their vectorized BLAS library, optimized using ATLAS, *does* fully implement BLAS so (presumably) it can handle doubles and complex numbers as well. I have not tried to confirm this since I do not know how to penetrate their dylib in the Xcode debugger (gdb).
This could be something to keep in mind when you are doing your comparison tests for Eigen vs. other libraries. |
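For anyone who wants to try that comparison, the same u = v + 3*w update can be expressed with two standard BLAS level-1 calls, `cblas_dcopy` and `cblas_daxpy`, both of which Accelerate exposes. This is only a sketch: `USE_ACCELERATE` is a hypothetical opt-in macro (define it and link with `-framework Accelerate` on a Mac), `axpy_into` is a made-up helper name, and nothing here confirms how the doubles are processed inside the library.

```cpp
#ifdef USE_ACCELERATE  // hypothetical opt-in macro, not an Apple or Eigen flag
#include <Accelerate/Accelerate.h>
#endif

// Computes u = v + alpha * w for n doubles (u must not alias v or w).
void axpy_into(double* u, const double* v, const double* w,
               double alpha, int n) {
#ifdef USE_ACCELERATE
    cblas_dcopy(n, v, 1, u, 1);         // u = v
    cblas_daxpy(n, alpha, w, 1, u, 1);  // u = u + alpha * w
#else
    // Portable fallback so the sketch also builds without Accelerate.
    for (int i = 0; i < n; ++i) {
        u[i] = v[i] + alpha * w[i];
    }
#endif
}
```

Timing this against the Eigen expression on larger vectors would be a more telling test than a 4-element case, where call overhead is likely to dominate.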