This forum has been archived. All content is frozen. Please use KDE Discuss instead.

how to tell nepomuk to search for whole words

wes33
Registered Member
Posts
103
Karma
1
I find in my searches that the results include
all strings that my search term is a sub-string
of. For example, looking for the name Savan I
get returns of savannah and savant.

How do you tell nepomuk (in dolphin search
for example) to just return complete words?

thanks,
trueg
KDE Developer
Posts
13
Karma
0
That is a good point: you can't. The search API will always add a wildcard to the query like so: "foo" > "foo*". I thought this was more in sync with what most users expected. Maybe I was wrong and should change that so that the user needs to search for "foo*" to also match "foobar".

Opinions?
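
To make the two behaviours concrete, here is a minimal Python sketch of the difference between an implicit trailing wildcard and whole-word matching. It is not Nepomuk's query code; the function names `matches` and `matches_with_implicit_wildcard` are hypothetical and exist only for illustration, and a trailing `*` is treated as a simple prefix match.

```python
import re


def matches(term: str, text: str) -> list[str]:
    """Return the words in `text` matched by `term`.

    A trailing '*' means "prefix match"; otherwise only whole words match.
    """
    words = re.findall(r"\w+", text.lower())
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return [w for w in words if w.startswith(prefix)]
    return [w for w in words if w == term]


def matches_with_implicit_wildcard(term: str, text: str) -> list[str]:
    """Mimic the behaviour described above: "foo" is silently turned into "foo*"."""
    if not term.endswith("*"):
        term += "*"
    return matches(term, text)


text = "Savan visited Savannah with a savant"
print(matches_with_implicit_wildcard("Savan", text))  # prefix matching
print(matches("Savan", text))                         # whole words only
print(matches("Savan*", text))                        # explicit wildcard
```

With whole-word matching as the default, the user keeps the broader behaviour by typing the `*` explicitly.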
wes33
Registered Member
Posts
103
Karma
1
this is kde - it should be a configurable option, but
IMO whole word (no wildcards) should be the default
Howl
Registered Member
Posts
55
Karma
0
wes33 wrote:this is kde - it should be a configurable option, but
IMO whole word (no wildcards) should be the default

I don't agree. But I think there should be the possibility to search whole words for example with "".
Kryten2X4B
Registered Member
Posts
911
Karma
4
Howl wrote:I don't agree. But I think there should be the possibility to search whole words for example with "".


I agree with that one, and it seems to be what people expect of search engines (local or otherwise). At least the people I know... whether their behavior is typical or not I do not know, but I have noticed that when they want a precise hit (or a more precise one at least) they always add quotation marks around the query.


OpenSUSE 11.4, 64-bit with KDE 4.6.4
Proud to be a member of KDE forums since 2008-Oct.
wes33
Registered Member
Posts
103
Karma
1
the "" option might work but it's more often used
to search for phrases - many search options have a
tick box for "whole words"

I really don't see why anyone (except a developer
looking for substrings) would want to search for
"revolve" and get back "revolver" :)
trueg
KDE Developer
Posts
13
Karma
0
Here is a patch for kdelibs 4.4.1 which removes the default wildcard. Now it needs to be manually specified: http://pastebin.ca/1828330
Please test.
wes33
Registered Member
Posts
103
Karma
1
trueg wrote:Here is a patch for kdelibs 4.4.1 which removes the default wildcard. Now it needs to be manually specified: http://pastebin.ca/1828330
Please test.


Thanks - I am eager to try it. But so far as I can tell, that patch is incomplete and when I try to apply it I get this:

Code:
patching file comparisonterm.cpp
patching file literalterm_p.h
patching file literalterm.cpp
patch unexpectedly ends in middle of line
Hunk #2 succeeded at 76 with fuzz 1.
trueg
KDE Developer
Posts
13
Karma
0
Should be OK anyway. If it compiles you are fine. The changes in comparisonterm.cpp are the important ones.
wes33
Registered Member
Posts
103
Karma
1
OK - kdelibs builds and nepomuk seems to be working
perfectly. Only whole words are returned unless I
manually add a wildcard.

It also seems to me that the patched version searches
about twice as fast (even with wildcard) and returns
fewer (maybe none) false results.

Unfortunately, the memory leak problem is still BAD (not
that your patch would have had anything to do with it).
After a few test searches, nepomukservices is already up
to 1gb and growing. :(

(P.S. - is there a command line way to stop and restart
nepomuk safely; if that could be done every half hour or
so the memory leak could be contained)
wes33
Registered Member
Posts
103
Karma
1
I'm not sure this patch is safe - it may be coincidence
but since applying it the nepomuk service has been very
unstable.

The first symptom was that it started pruning the index.
Entries were disappearing before my eyes in the dolphin
search results (really!).

I deleted the repository but it has failed to build a new
index (it starts but never finishes).

In addition, nepomuk seems to hang and will not respond in
systemsettings.
trueg
KDE Developer
Posts
13
Karma
0
The patch cannot have had any effect on the rest of the system.

About restarting Nepomuk: there is a DBus interface which you can use:
qdbus org.kde.NepomukServer /nepomukserver org.kde.NepomukServer.enableNepomuk true/false

Which service leaks mem exactly? The nepomukqueryservice one?

About the index: does the strigi service crash? Maybe disable it in the systemsettings (only the file indexer) and then start it manually on the konsole to see its behaviour:

nepomukservicestub nepomukstrigiservice
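
For the idea of restarting Nepomuk at intervals, a rough Python sketch that wraps the qdbus call quoted above could look like this. The qdbus arguments are taken verbatim from this post; the helper names, the 5-second pause between disabling and re-enabling, and the assumption that toggling the service this way actually frees the leaked memory are all illustrative guesses, not tested behaviour.

```python
import subprocess
import time


def enable_nepomuk_command(enable: bool) -> list[str]:
    """Build the qdbus call from the post above (argument is true or false)."""
    return [
        "qdbus",
        "org.kde.NepomukServer",
        "/nepomukserver",
        "org.kde.NepomukServer.enableNepomuk",
        "true" if enable else "false",
    ]


def restart_nepomuk(run=subprocess.run, sleep=time.sleep, pause_seconds=5.0):
    """Disable Nepomuk, wait briefly, then enable it again.

    `run` and `sleep` are injectable so the function can be tried out
    without a running KDE session. The pause length is an assumption.
    """
    run(enable_nepomuk_command(False), check=True)
    sleep(pause_seconds)
    run(enable_nepomuk_command(True), check=True)
```

Calling `restart_nepomuk()` from a loop with `time.sleep(30 * 60)` between calls, or from a cron job, would give the half-hourly restart asked about earlier in the thread.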
wes33
Registered Member
Posts
103
Karma
1
thanks for the reply

the memory leak is the same old nepomukqueryservice one

my problem this morning is that after deleting the index
and restarting the service the index is extremely incomplete.

It lists about 2000 files in a directory with 11000.

I notice at this very minute it has gone into the state
where it shrinks the index - in the last few minutes it's
gone from reporting 2000 files down to 1500. Why why why
does it start doing this ???

I've used your commands (thanks) so I can see messages
from strigi but there is nothing very illuminating.

Is there a way to force strigi to use FullSpeed - a message
says : Nepomuk::IndexScheduler::setIndexingSpeed: 1 ReducedSpeed

I'd like to see if I can get a proper index built - down to
1100 files indexed now as I type ... oops, now down to 1000
and shrinking ... now 900. I wonder what happens when it
gets down to 0?

It's also weird that while the number of indexed files is
shrinking the nepomuk tray icon information window reports
that indexing is idle.

Why is this so hard? :-/
trueg
KDE Developer
Posts
13
Karma
0
There is a patch at https://bugs.kde.org/show_bug.cgi?id=226895 which adds some debugging output that would be useful. Could you apply that one?
wes33
Registered Member
Posts
103
Karma
1
after patching nepomuk and rebuilding I now get an unending
stream of messages like this as the index shrinks:

Code:
nepomukstrigiservice(7724)/nepomuk (strigi service) Nepomuk::IndexScheduler::removeOldAndUnwantedEntries: Removing  QUrl( "file:///usr/cmydocs/banks_html/philpics/james.jpg" )


Of course, these are files I want to be indexed.

The last time it cycled down to zero files in the index it started
rebuilding and almost got the entire 11,000 files or so, but then
it started shrinking the index again ... which is where I am now.
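
To get an overview of which folders the scheduler is pruning, the debug stream could be summarised with a small Python sketch like the one below. The regular expression is inferred from the single log line quoted above, so the exact message format is an assumption, and `removed_directories` is a hypothetical helper name.

```python
import posixpath
import re
from collections import Counter
from urllib.parse import unquote, urlparse

# Pattern inferred from the one sample line in this post; other Nepomuk
# versions may format the message differently.
REMOVING = re.compile(
    r'removeOldAndUnwantedEntries: Removing\s+QUrl\(\s*"([^"]+)"\s*\)'
)


def removed_directories(log_lines):
    """Count "Removing QUrl(...)" messages per parent directory."""
    counts = Counter()
    for line in log_lines:
        match = REMOVING.search(line)
        if match:
            path = unquote(urlparse(match.group(1)).path)
            counts[posixpath.dirname(path)] += 1
    return counts


sample = [
    'nepomukstrigiservice(7724)/nepomuk (strigi service) '
    'Nepomuk::IndexScheduler::removeOldAndUnwantedEntries: Removing  '
    'QUrl( "file:///usr/cmydocs/banks_html/philpics/james.jpg" )',
]
for directory, count in removed_directories(sample).items():
    print(count, directory)
```

If the removed files cluster in folders that are supposed to be indexed, that would point at the folder include/exclude configuration rather than at the files themselves.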

