Registered Member
|
Hi, I wrote this small function for doing convolution (basically, it returns the valid part of the convolved matrix).
The results are correct, at least according to my tests. So to get the gradient of an image using a Sobel filter, I can now do the following:
However, I am seeing unexpected behavior when doing the convolution with a separable kernel. As mentioned in the above articles, a Sobel filter is separable (rank 1), so I should see a speedup if I do two 1D convolutions instead:
But instead I see an almost 2x slowdown compared to the previous single convolve call. Is there something wrong with my function?
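The code from the post is not reproduced in this thread, so the following is only a minimal sketch of the two variants being compared, not the poster's actual function. It assumes a plain row-major float matrix; `Mat`, `convolveValid`, `sobelX2D` and `sobelXSeparable` are hypothetical names chosen for illustration. The kernel is applied without flipping (strictly speaking a cross-correlation), which is the usual convention for image filters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major float matrix (hypothetical helper, not from the original post).
struct Mat {
    std::size_t rows, cols;
    std::vector<float> data;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// "Valid" filtering: only positions where the kernel fits entirely inside the
// input are computed, so the output shrinks by (kernel size - 1) per dimension.
Mat convolveValid(const Mat& in, const Mat& k) {
    assert(k.rows <= in.rows && k.cols <= in.cols);
    Mat out(in.rows - k.rows + 1, in.cols - k.cols + 1);
    for (std::size_t i = 0; i < out.rows; ++i) {
        for (std::size_t j = 0; j < out.cols; ++j) {
            float acc = 0.0f;
            for (std::size_t a = 0; a < k.rows; ++a)
                for (std::size_t b = 0; b < k.cols; ++b)
                    acc += in(i + a, j + b) * k(a, b);
            out(i, j) = acc;
        }
    }
    return out;
}

// Single pass with the full 3x3 Sobel-x kernel (9 multiply-adds per output).
Mat sobelX2D(const Mat& img) {
    Mat k(3, 3);
    k.data = {-1.0f, 0.0f, 1.0f,
              -2.0f, 0.0f, 2.0f,
              -1.0f, 0.0f, 1.0f};
    return convolveValid(img, k);
}

// Two passes: the 3x3 kernel is the outer product of [1 2 1]^T and [-1 0 1]
// (3 + 3 multiply-adds per output, but an intermediate image is written and re-read).
Mat sobelXSeparable(const Mat& img) {
    Mat col(3, 1);
    col.data = {1.0f, 2.0f, 1.0f};
    Mat row(1, 3);
    row.data = {-1.0f, 0.0f, 1.0f};
    return convolveValid(convolveValid(img, col), row);
}
```

Both functions return the same (rows - 2) x (cols - 2) result. For a 3x3 kernel the arithmetic saving is only 9 versus 6 multiply-adds per output, while the separable version allocates and traverses a full intermediate image, so a slowdown is plausible.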
Moderator
|
Make sure you benchmarked with compiler optimizations ON. If you did, my guess is that your application is memory bound, so you do not see the benefit of slightly fewer floating point operations but instead pay the cost of two passes over memory. Try an image small enough to fit in L1 cache. If that helps, then you could process the big image block by block.
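One possible way to act on the block-by-block suggestion, sketched under assumptions: treat each output row as a block, so the result of the vertical pass lives in a one-row buffer that typically stays in cache instead of a full intermediate image. The function name `sobelXFused` and the row-major `std::vector<float>` layout are hypothetical, not taken from the original code, and whether this is actually faster has to be measured on the target machine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fused separable Sobel-x on a row-major image of size rows x cols.
// Returns the valid (rows - 2) x (cols - 2) result, also row-major.
std::vector<float> sobelXFused(const std::vector<float>& img,
                               std::size_t rows, std::size_t cols) {
    assert(rows >= 3 && cols >= 3 && img.size() == rows * cols);
    const std::size_t outRows = rows - 2;
    const std::size_t outCols = cols - 2;
    std::vector<float> out(outRows * outCols);
    std::vector<float> tmp(cols);  // one row of the vertical pass, reused for every output row

    for (std::size_t i = 0; i < outRows; ++i) {
        const float* r0 = &img[i * cols];
        const float* r1 = r0 + cols;
        const float* r2 = r1 + cols;

        // Vertical pass with [1 2 1]^T over three consecutive input rows.
        for (std::size_t j = 0; j < cols; ++j)
            tmp[j] = r0[j] + 2.0f * r1[j] + r2[j];

        // Horizontal pass with [-1 0 1] on the buffered row.
        float* o = &out[i * outCols];
        for (std::size_t j = 0; j < outCols; ++j)
            o[j] = tmp[j + 2] - tmp[j];
    }
    return out;
}
```

The input is still read roughly three times (each row contributes to three output rows), but neighbouring rows are likely to be in cache already, and the intermediate never goes out to main memory.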