This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Tensor operation optimisation

kenette
Registered Member · Posts: 2 · Karma: 0

Tensor operation optimisation

Wed Sep 23, 2015 4:58 pm
I have been trying to use the Eigen Tensor class to implement a convolutional neural network in C++.

Currently I am trying to beat an existing implementation based on OpenCV (which should be suboptimal compared to Eigen), but my implementation with Eigen is slower. Could you give me some hints on why it could be slower?

Pseudo code

Code:
Eigen::Tensor final_maps
for o in kernel_tensor_dim_0:
    Eigen::Tensor output_maps
    for i in image_tensor_dim_0:
        K = extract_kernel(i)                    // dt_1 = 0.17 ms (with inits)
        I = extract_image(i)                     // dt_2 = 0.01 ms
        convolution_map = I * K                  // dt_3 = 0.01 ms
        output_maps.slice(i) = convolution_map   // dt_4 = 1.6 ms
    final_maps(o) = output_maps.sum(dim_0)



This set of operations is performed around 10^4 times, so the total runtime exceeds a second, whereas with cv::Mat and std::map I reach only 500 ms.
Any clue as to the cause of the bottleneck?
Could it be something with eval()? I let auto build a tensor expression, hoping it would optimize itself. ^^'

UPDATE:

If I assign every sub-result to an Eigen::Tensor, I get:
cost 1: 0.28 ms
cost 2: 0.37 ms
cost 3: 0.8 ms
cost 4: 0.06 ms
ggael
Moderator · Posts: 3447 · Karma: 19

Re: Tensor operation optimisation

Sat Sep 26, 2015 9:20 pm
What are the dimensions and sizes of your tensors? Compiler version and options? Also make sure you declare your intermediate tensors outside the loops.
kenette
Registered Member · Posts: 2 · Karma: 0

Re: Tensor operation optimisation

Mon Sep 28, 2015 11:44 am
The first set of operations performs a convolution of 128 sets of 3 kernels over the 3 input channels of the image.

I define the tensor variables (the intermediates) in the scope of the first loop (over the 128 sets of kernels).
Nothing unnecessary is assigned in the input-channel loop (I tried moving the Eigen::array variables outside the loop, but it made no difference).
The slices of the tensors are 64x64 for the input and 5x5 for the kernels.

The operations defined in the loop cost around 1 ms per iteration, for a total of 0.145794 s.

Compilation is done with clang++ and the flags -std=c++0x -msse2 -O3 (I also tried -msse3, with no difference).


The C++ code

Code:
// For a given set of kernels o
for (int j = 0; j < input.dimension(0); j++) {

    // Extract the kernel for this channel
    Eigen::array<int, 4> koffsets = {j, 0, 0, o};
    Eigen::array<int, 4> kextents = {1, int(current_conv_filter.dimension(1)), int(current_conv_filter.dimension(2)), 1};
    Eigen::array<Eigen::DenseIndex, 2> one_dim({int(current_conv_filter.dimension(1)), int(current_conv_filter.dimension(2))});
    kernel = current_conv_filter.slice(koffsets, kextents).reshape(one_dim);

    // Extract the corresponding input channel
    Eigen::array<int, 3> ioffsets = {j, 0, 0};
    Eigen::array<int, 3> iextents = {1, int(input.dimension(1)), int(input.dimension(2))};
    Eigen::array<Eigen::DenseIndex, 2> two_dim({int(input.dimension(1)), int(input.dimension(2))});
    map = input.slice(ioffsets, iextents).reshape(two_dim);

    // Do the convolution
    Eigen::array<ptrdiff_t, 2> dims({0, 1});
    convolved_map = map.convolve(kernel, dims);

    // Combine onto the current output map
    Eigen::array<int, 3> size_img = {1, int(current_output_map.dimension(1)), int(current_output_map.dimension(2))};
    current_output_map.slice(ioffsets, size_img) = convolved_map;
}

