Registered Member
I have been trying to use the Eigen Tensor class to implement a convolutional neural network in C++.
Currently I am trying to beat an existing implementation in OpenCV (which I assumed must be suboptimal compared to Eigen), but my Eigen implementation is slower. Could you give me some hints as to why it could be slower? Pseudo code:
This set of operations is done around 10^4 times, so the total reaches a second, whereas using cv::Mat and std::map I reach only 500 ms. Any clue as to the cause of the bottleneck? Could it be something with eval()? I let auto build a tensor expression, hoping it would optimize itself on its own ^^'.
UPDATE: If I assign every sub-result to an Eigen::Tensor, I get: cost 1: 0.28 ms, cost 2: 0.37 ms, cost 3: 0.8 ms, cost 4: 0.06 ms
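For reference, a minimal sketch of the difference between keeping an auto-typed tensor expression and materializing sub-results into an Eigen::Tensor. The tensors a, b, c, their 2D shape, and the function name are hypothetical stand-ins for the actual operands, not the original code:

Code: Select all
#include <unsupported/Eigen/CXX11/Tensor>

void sketch(const Eigen::Tensor<float, 2>& a,
            const Eigen::Tensor<float, 2>& b,
            const Eigen::Tensor<float, 2>& c) {
  // With auto, nothing is computed here: expr is only an expression tree,
  // and every later use of expr re-evaluates the whole tree.
  auto expr = a * b + c;

  // Assigning to an Eigen::Tensor materializes the result once,
  // so later uses read stored values instead of recomputing them.
  Eigen::Tensor<float, 2> sub = a * b;       // evaluated and stored once
  Eigen::Tensor<float, 2> result = sub + c;  // reuses the stored sub-result

  // Equivalent: force evaluation of a sub-expression in place with .eval().
  Eigen::Tensor<float, 2> result2 = (a * b).eval() + c;
}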
Moderator
What are the dimensions and sizes of your tensors? Compiler version and options? Also make sure you declare your intermediate tensors outside the loops.
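To illustrate the point about intermediate tensors, here is a minimal sketch assuming a hypothetical per-filter loop with a 64x64 input, 5x5 kernels, and a 60x60 valid-convolution output; the names filter_loop, input, and kernels are assumptions:

Code: Select all
#include <cstddef>
#include <vector>
#include <unsupported/Eigen/CXX11/Tensor>

void filter_loop(const Eigen::Tensor<float, 2>& input,                   // 64x64
                 const std::vector<Eigen::Tensor<float, 2>>& kernels) {  // 5x5 each
  const Eigen::array<ptrdiff_t, 2> conv_dims({0, 1});

  // Declared once, outside the loop: the 60x60 buffer is allocated a single time.
  Eigen::Tensor<float, 2> conv_out(60, 60);

  for (std::size_t k = 0; k < kernels.size(); ++k) {
    // Assigning into the pre-sized tensor reuses its storage; declaring
    // conv_out inside the loop would allocate and free it on every iteration.
    conv_out = input.convolve(kernels[k], conv_dims);
    // ... use conv_out ...
  }
}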
Registered Member
The first set of operations performs a convolution of 128 sets of 3 kernels on the 3 input channels of the image.
I define the tensor variables (intermediate variables) in the scope of the first loop (over the 128 sets of kernels). Nothing unnecessary is assigned in the input-channel loop (I tried assigning the Eigen::array variables outside the loop, but it made no difference). The slices of the tensors are 64x64 for the input and 5x5 for the kernels. The operations defined in the loop cost around 1 ms per iteration, for a total of 0.145794 s. Compilation is done with clang++ with the flags -std=c++0x -msse2 -O3 (I also tried -msse3, but no difference). The C++ code:
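As a point of reference, a minimal sketch of the structure described above (3 input channels of 64x64, 128 sets of 3 kernels of 5x5, valid convolution accumulated over channels). The variable names, layout, and per-channel accumulation scheme are assumptions, not the original code:

Code: Select all
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  using Eigen::Tensor;

  Tensor<float, 3> input(3, 64, 64);       // 3 input channels of 64x64
  Tensor<float, 4> kernels(128, 3, 5, 5);  // 128 sets of 3 kernels of 5x5
  input.setRandom();
  kernels.setRandom();

  const Eigen::array<ptrdiff_t, 2> conv_dims({0, 1});
  Tensor<float, 2> acc(60, 60);            // 64 - 5 + 1 = 60 (valid convolution)
  Tensor<float, 3> output(128, 60, 60);

  for (int f = 0; f < 128; ++f) {          // loop over the 128 kernel sets
    acc.setZero();
    for (int c = 0; c < 3; ++c) {          // loop over the 3 input channels
      Tensor<float, 2> in_c = input.chip(c, 0);               // 64x64 channel
      Tensor<float, 2> k_fc = kernels.chip(f, 0).chip(c, 0);  // 5x5 kernel
      acc += in_c.convolve(k_fc, conv_dims);                  // accumulate per channel
    }
    output.chip(f, 0) = acc;               // store the feature map for filter f
  }
  return 0;
}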