
How to get better 'find similar' results

digiKam looks like it could be a useful tool, but I've downloaded it and started running it purely for one feature: the 'Find Similar' tool. I knew not to expect too much, because image recognition is likely a difficult thing to get right. However, the results I'm getting are pretty confusing, and I'd appreciate an explanation of how to use the tool in case I'm doing it wrong, and of why it fails under the circumstances I'm testing, where I would have thought it would work.

I'm running this on Mac OS. The reason I'm testing it is to help a friend with an unusual situation. A friend of theirs acted as photographer for a photoshoot and took a lot of pictures. The photographer then gave my friend a folder of RAW image files and a folder of JPEGs which the photographer had personally edited in photo editing software for style. Thing is, the photographer's editing work is... idiosyncratic, to say the least. My friend spent a couple of evenings going through the 1000-odd edited JPEGs to find the ones they wanted to keep, but they want to go back to the RAW files and edit those themselves. Trouble is, the photographer used a different naming scheme for the edited images than the one the camera gave the RAW files. Going through the 1000 or so photos to make the shortlist was pretty time consuming, and going through all of them again to visually identify the same photos in unedited form seems daunting.

I stumbled upon digiKam as a piece of software with the ability to search for similar images, and I'm testing it on my machine before recommending it. I copied an image from one folder into another folder. Both folders sit inside a parent folder, which is what I imported into digiKam's database on install. In the second folder, I edited the duplicate in an implausible way to make it very different from the original: extremely green, underexposed, desaturated and so on. I then tried the find similar function and nothing happened.

I tried a more subtle edit, just pushing the tint a bit green and nothing else, and hey presto, find similar worked. Then I tried a happy medium between the two, with a few more parameters changed but only a little, and nada, no results. I tried a version scaled disproportionately so it was stretched but otherwise unchanged, and still nothing. I tried a normal, proportional scaling to 50% with no other changes at all, so the image was identical, just smaller. Still nothing.

I tried adjusting the similarity range parameter, but for some reason a lot of typed values get rejected. You can only reach them by pressing the up and down arrows one click at a time, and a click only does anything if you wait a second after it. I was lowering the minimum level of similarity required, hoping this would broaden the range of images picked up in the search. Doing this, I would expect to see a larger number of results, including all my image variants plus completely different images that share some particular similarity under whatever criterion is used.

Weirdly, it kept finding nothing but my first, subtle green edit, click after click, with each percentage point I took off the minimum similarity. Then, somewhere in the low 40s, it quite suddenly found about 5 "similar" images. None of them were my variants, just other images with presumably some element of similarity. I understand that the term "similar" covers a few things here, so some results will be images that are similar in colour, say, even if they are not the same image. But how is it that literally the same image with a few colour changes or a resolution change isn't in the results, when other images that differ in colour, brightness, size, shape and arrangement within the frame are picked up? Even my variants with no colour change, just a size change, are apparently not similar at the lowest threshold allowed. That's weird. I could go all the way down to 40% before it wouldn't let me go any further, and I never saw any of my image variants other than that first, slightly greenish one.
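
To spell out what I expected from lowering the minimum similarity, here is a toy sketch in Python. The file names and scores are made up purely for illustration, `find_similar` is a hypothetical helper of my own, and I'm not claiming this is how digiKam computes or filters similarity. It only shows the behaviour I assumed a threshold would have.

```python
# Toy illustration only: the file names and scores are invented, and this
# is NOT a description of how digiKam actually measures similarity.

def find_similar(scores, min_similarity):
    """Return the names whose score (0-100) is at least min_similarity."""
    return sorted(name for name, score in scores.items()
                  if score >= min_similarity)

# Hypothetical similarity scores of each file against the original image.
scores = {
    "subtle_green.jpg": 95,
    "scaled_50_percent.jpg": 90,
    "stretched.jpg": 80,
    "heavy_green_underexposed.jpg": 60,
    "unrelated_photo.jpg": 42,
}

strict = find_similar(scores, 90)
loose = find_similar(scores, 40)
print(strict)
print(loose)

# My expectation: anything found at a strict threshold is still found
# at a looser one, so lowering the threshold can only add results.
assert set(strict) <= set(loose)
```

In other words, I assumed my edited variants would show up well before unrelated images did as the threshold dropped, which is the opposite of what I'm seeing.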

In the real-world use case, where the images will have been cropped and heavily colour corrected, I doubt this will work. But is there anything I can do settings-wise to help it?

