First, let me give some background so that my aim is clearer; maybe someone can advise me on how to do this even better.

I work in machine learning research, and one of the bottlenecks we have found so far is deriving models' gradients, Hessians, etc. by hand. So I recently decided to write an autodiff package. Such things exist in Theano and in Torch (for Lua), but out of my own interest I wanted mine to use Eigen (I don't have a GPU, and so far Eigen + MKL has always been the best HPC option for me). I understand, very vaguely, what Eigen does at a high level, but I'm not experienced enough with C++ templates to grasp exactly how it is achieved (I get the basic idea, but understanding all of it combined down to the nuts and bolts would take a course).

For ML I'm doing something similar: optimising a compute graph, but dynamically. You build a computation tree, specify your inputs and outputs, and then I give you back a function that computes the outputs (using Eigen). I have two choices for how to achieve this:

1. Code generation. This is relatively easy and should produce the highest-performing code, since both Eigen's magic and the compiler work to make the computation optimal.
2. Generate an evaluator dynamically. This would be a struct that knows all the nodes of the tree, has a lambda for each operator, has "registers" for each variable (i.e. it holds the required variables internally, based on the graph, to be populated) and knows what to return.

The problem I'm considering is that evaluating the compute tree this way would involve recursive calls of lambdas. For example, the expression tanh(tanh(W*b)+c) would in fact be a call to 4 lambdas, each performing a single operation. Before I spend time on it, I was wondering: would the lambdas force a temporary to be generated on each return, or could the compiler potentially inline them so that Eigen's lazy evaluation can kick in? Also, if someone can think of a better way of doing this, please advise!
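A minimal sketch of option 2, under stated assumptions: a tiny `Mat` struct stands in for `Eigen::MatrixXd` so it compiles with the standard library alone, and the names `Mat`, `Op`, `Node`, `Evaluator` and `make_example` are hypothetical. Each operator is a type-erased `std::function` lambda that reads input registers and returns a concrete matrix; the four nodes of tanh(tanh(W*b)+c) are stored in topological order and run in one pass.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for Eigen::MatrixXd (row-major, dense).
struct Mat {
    std::size_t rows = 0, cols = 0;
    std::vector<double> data;
    Mat() = default;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double operator()(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Operator lambda: reads the registers named in `in`, returns a fresh Mat.
// Returning a concrete type means every node materializes its result.
using Op = std::function<Mat(const std::vector<Mat>&, const std::vector<int>&)>;

Mat matmul(const Mat& a, const Mat& b) {
    Mat out(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t j = 0; j < b.cols; ++j)
                out(i, j) += a(i, k) * b(k, j);
    return out;
}

const Op kMatMul = [](const std::vector<Mat>& r, const std::vector<int>& in) {
    return matmul(r[in[0]], r[in[1]]);
};
const Op kAdd = [](const std::vector<Mat>& r, const std::vector<int>& in) {
    Mat out = r[in[0]];  // copy = one temporary per node
    for (std::size_t i = 0; i < out.data.size(); ++i) out.data[i] += r[in[1]].data[i];
    return out;
};
const Op kTanh = [](const std::vector<Mat>& r, const std::vector<int>& in) {
    Mat out = r[in[0]];
    for (double& x : out.data) x = std::tanh(x);
    return out;
};

// A node applies `op` to the registers in `inputs` and writes register `output`.
struct Node {
    Op op;
    std::vector<int> inputs;
    int output;
};

// Registers hold graph inputs and intermediates; nodes are topologically sorted.
struct Evaluator {
    std::vector<Mat> registers;
    std::vector<Node> nodes;
    int result = 0;

    Mat run() {
        for (const Node& n : nodes)
            registers[n.output] = n.op(registers, n.inputs);
        return registers[result];
    }
};

// Evaluator for tanh(tanh(W*b) + c).
// Registers: 0=W, 1=b, 2=c, 3=W*b, 4=tanh(W*b), 5=sum, 6=tanh(sum).
Evaluator make_example(const Mat& W, const Mat& b, const Mat& c) {
    Evaluator ev;
    ev.registers = {W, b, c, Mat(), Mat(), Mat(), Mat()};
    ev.nodes = {
        {kMatMul, {0, 1}, 3},
        {kTanh, {3}, 4},
        {kAdd, {4, 2}, 5},
        {kTanh, {5}, 6},
    };
    ev.result = 6;
    return ev;
}
```

In this design every lambda returns a fully evaluated matrix, so each of the four nodes writes a temporary into its register. Even if the compiler inlined the calls, the concrete return type means lazy evaluation could not fuse operations across node boundaries; the same would hold for Eigen lambdas declared to return `Eigen::MatrixXd`.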
Currently I'm doing simple code generation, as this even allows porting to other languages and potentially, later on, to CUDA code.
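A sketch of the simple code-generation route (option 1), assuming a hypothetical `Expr` tree type and helper names (`var_node`, `matmul_node`, `add_node`, `tanh_node`, `emit`, `emit_function`). It prints the whole tree as a single Eigen statement; the emitted text uses Eigen's `.array().tanh().matrix()` for the element-wise tanh, and the generated source is meant to be compiled separately.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical expression-tree node: a named input (op == "var")
// or an operator applied to child nodes.
struct Expr {
    std::string op;    // "var", "matmul", "add" or "tanh"
    std::string name;  // only used when op == "var"
    std::vector<std::shared_ptr<Expr>> args;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr var_node(const std::string& n) { return std::make_shared<Expr>(Expr{"var", n, {}}); }
ExprPtr matmul_node(ExprPtr a, ExprPtr b) { return std::make_shared<Expr>(Expr{"matmul", "", {a, b}}); }
ExprPtr add_node(ExprPtr a, ExprPtr b) { return std::make_shared<Expr>(Expr{"add", "", {a, b}}); }
ExprPtr tanh_node(ExprPtr a) { return std::make_shared<Expr>(Expr{"tanh", "", {a}}); }

// Recursively prints one Eigen expression for the whole tree.
std::string emit(const ExprPtr& e) {
    if (e->op == "var") return e->name;
    if (e->op == "matmul") return "(" + emit(e->args[0]) + " * " + emit(e->args[1]) + ")";
    if (e->op == "add") return "(" + emit(e->args[0]) + " + " + emit(e->args[1]) + ")";
    if (e->op == "tanh") return emit(e->args[0]) + ".array().tanh().matrix()";
    return "/* unknown op */";
}

// Wraps the expression in a function taking each input as const Eigen::MatrixXd&.
std::string emit_function(const std::string& fname,
                          const std::vector<std::string>& inputs,
                          const ExprPtr& body) {
    std::string src = "Eigen::MatrixXd " + fname + "(";
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        if (i) src += ", ";
        src += "const Eigen::MatrixXd& " + inputs[i];
    }
    src += ") {\n  return " + emit(body) + ";\n}\n";
    return src;
}
```

For tanh(tanh(W*b)+c) this emits `((W * b).array().tanh().matrix() + c).array().tanh().matrix()`. Because the whole tree becomes one statement, Eigen's expression templates and the compiler see all of it at once, which is what makes this route attractive; the string-emitting design is also what keeps porting to other targets open.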