Transitioning to KDE's next generation semantic search

einar (Administrator)
For the upcoming 4.13 release, the search framework used by KDE applications has been extensively reworked, leading to much better performance and a lower memory footprint.

The change is deep and partially backwards-incompatible, but the developers have provided tools for the transition. This topic explains how to deal with the change and what to expect from the new framework.

Migrating existing data to the new framework

1. Files

A special utility has been provided to convert your existing data (files and tags) to the new storage format. To use it, ensure that Nepomuk is enabled and running. To do so, check that $KDEHOME/share/config/nepomukserverrc contains "Start Nepomuk=true", then issue

Code: Select all
nepomukctl status

to ensure that the legacy Nepomuk server is running. If not, use

Code: Select all
nepomukctl start

to start it.

Afterwards, open a terminal and run

Code: Select all
nepomukbaloomigrator

This will copy over all the information on files and tags to the new system, and will also take care of disabling the old running programs for you.
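
The configuration check from the first step can also be scripted. What follows is only a minimal sketch: the helper name nepomuk_enabled is made up for illustration, and the fallback to ~/.kde when $KDEHOME is unset is an assumption that varies between distributions.

```shell
# Sketch: report whether a nepomukserverrc-style file enables Nepomuk.
# The helper name nepomuk_enabled is hypothetical, not a KDE tool.
nepomuk_enabled() {
    rc_file=$1
    # -x matches the whole line, -F treats the pattern as a fixed string,
    # -q suppresses output so that only the exit status is used.
    [ -f "$rc_file" ] && grep -qxF 'Start Nepomuk=true' "$rc_file"
}

# Assumption: fall back to ~/.kde when KDEHOME is not set.
rc="${KDEHOME:-$HOME/.kde}/share/config/nepomukserverrc"
if nepomuk_enabled "$rc"; then
    echo "Nepomuk is enabled in $rc"
else
    echo "Nepomuk is not enabled in $rc (or the file is missing)"
fi
```

If the check passes, continue with nepomukctl status as described in that step.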

2. Emails

The storage format for emails is incompatible with the previous version, so they will be automatically re-indexed. This should not be an issue, as indexing is several orders of magnitude faster than before.

Questions and answers

Q. How do I pause indexing?
A. Indexing is paused when on battery (for laptops). Otherwise it always runs if there are new files to index.

Q. I don't want the indexer to touch folder $FOLDER. What should I do?
A. Go to System Settings -> Desktop Search and add that folder to the list of folders that should not be indexed.

Q. How do I disable indexing?
A. Indexing is always enabled if files should appear in search results. You may exclude folders from search results in System Settings -> Desktop Search.

Q. How do I search with the new backend?
A. Everything should work as before. You can search using Dolphin's search feature, through KRunner, look for recent files with timeline:// and tags with tags://.

Q. I have disabled indexing. Can I still perform searches on what is already indexed?
A. Yes. Search works regardless of whether indexing is running (assuming that some data has been indexed).

Known issues

  • Linking files to activities is currently broken, because it has not yet been ported to the new system.
  • Some applications (Amarok, digiKam) still depend on the legacy functionality, and some distributions (for example openSUSE) have already disabled it. However, bear in mind that disabling this legacy functionality does not impair normal usage in any way.


As search is part of the recent 4.13 beta, please post messages with issues and discussion on the new search framework to the 4.13 beta forum.


"Violence is the last refuge of the incompetent."
Plasma FAQ maintainer - Plasma programming with Python

aseigo (KDE Developer)
einar wrote:Q. How do I pause indexing?
A. Indexing is paused when on battery (for laptops). Otherwise it always runs if there are new files to index.


There is no *UI* to do it, but there is a D-Bus command:

qdbus org.kde.baloo.file /indexer suspend
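
For repeated use, the call quoted above could be wrapped in a small helper. This is only a sketch: the function name baloo_indexer is made up, and "resume" is assumed to be the counterpart of "suspend"; listing the available methods with "qdbus org.kde.baloo.file /indexer" first would confirm that.

```shell
# Sketch: thin wrapper around the D-Bus call quoted above.
# "resume" is an assumed counterpart of "suspend"; verify it on your system.
baloo_indexer() {
    case "$1" in
        suspend|resume) ;;
        *) echo "usage: baloo_indexer suspend|resume" >&2; return 2 ;;
    esac
    # Fail cleanly when qdbus is not installed.
    if ! command -v qdbus >/dev/null 2>&1; then
        echo "qdbus not found in PATH" >&2
        return 127
    fi
    qdbus org.kde.baloo.file /indexer "$1"
}
```

With that in place, "baloo_indexer suspend" pauses indexing and, if the assumption holds, "baloo_indexer resume" continues it.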


aseigo, proud to be a member of KDE forums since 2008-Oct.

private_lock (Registered Member)
Hello!

Is there already some documentation on the new syntax of search-terms available? The F1-online-help still explains the old syntax where *foo* == foo

I noticed that I cannot get Dolphin to find a file unless I know the beginning of a word in its filename. For example, to find a specific photo, I have to type the filename from the start: P0000123 instead of just the significant digits, as in 123 or *123 or *123* (none of those work).

Is there some escape character for the space? Or can I somehow mark a string as one token by surrounding it in 'quotes' or ""? So far, every space breaks the search term into single-word tokens which are combined by OR, so typing more words will ADD more results to the list instead of narrowing it down, which I consider counter-intuitive. But the special word "AND", all uppercase with leading and trailing whitespace, will give the intersection of two searches.

So far: "s week" won't find "this week" but "second week" which is confusing.

Kind regards
private_lock

PS: KDE 4.13.0 on Kubuntu 14.04beta2

vHanda (KDE Developer)
private_lock wrote:Hello!

Is there already some documentation on the new syntax of search-terms available? The F1-online-help still explains the old syntax where *foo* == foo


Unfortunately, no. There is currently no documentation, but that's probably because so few things are supported.

private_lock wrote:I noticed that I cannot get Dolphin to find a file unless I know the beginning of a word in its filename. For example, to find a specific photo, I have to type the filename from the start: P0000123 instead of just the significant digits, as in 123 or *123 or *123* (none of those work).


Yup. You're quite right. In the current release you can only search for a file when you know how its name starts. There is a bug report regarding this, and I'll try to get to it for 4.13.1. It's not trivial.

private_lock wrote:Is there some escape character for the space? Or can I somehow mark a string as one token by surrounding it in 'quotes' or ""? So far, every space breaks the search term into single-word tokens which are combined by OR, so typing more words will ADD more results to the list instead of narrowing it down, which I consider counter-intuitive. But the special word "AND", all uppercase with leading and trailing whitespace, will give the intersection of two searches.

So far: "s week" won't find "this week" but "second week" which is confusing.


By default all terms are ANDed. I don't think I've exposed some way to make it OR. We try to expand each word based on the starting characters. Not based on ending or middle, that would just be too expensive. I can add explicit wildcard support if required though, but it won't be as fast.

Also, feel free to use the `baloosearch` tool to try out queries.
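
The matching rule described in this reply (every term must match, and each term is expanded only from the start of a word) can be illustrated with a few lines of shell. This is not Baloo's code, only a toy model: the function name matches_all_prefixes is made up, and splitting words on every non-alphanumeric character is an assumption.

```shell
# Toy model of "all terms are ANDed, each term is a word prefix".
# Not Baloo's implementation; word splitting on non-alphanumerics is assumed.
matches_all_prefixes() {
    # $1 = text to test (e.g. a file name), remaining arguments = search terms
    text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    shift
    for term in "$@"; do
        term=$(printf '%s' "$term" | tr '[:upper:]' '[:lower:]')
        found=no
        # Replace everything that is not a letter or digit with a space,
        # then let the shell split the result into words.
        for word in $(printf '%s' "$text" | tr -c '[:alnum:]' ' '); do
            case "$word" in
                "$term"*) found=yes; break ;;
            esac
        done
        # A single term without a matching word rejects the text (AND).
        [ "$found" = yes ] || return 1
    done
    return 0
}

matches_all_prefixes "second week" s week && echo "second week: match"
matches_all_prefixes "this week" s week || echo "this week: no match"
matches_all_prefixes "P0000123.jpg" 123 || echo "P0000123.jpg: no match for 123"
```

Under this model "s week" finds "second week" but not "this week", and "123" does not find "P0000123.jpg", which is the behaviour reported earlier in the thread.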

private_lock (Registered Member)
Dear vHanda ... well the thing is ... just look at my screenshots.

Image

I searched for "b " (Bee-Space) and "b c " (Bee-Space-Cee-Space) and got a clearly OR'ed result on my home-directory. Omitting the trailing space will add thousands of other files that contain a word beginning with the letter in question.

Really, I appreciate the work you put into performance. Just thinking about running a search over my whole home directory was out of the question with the previous search incarnation. It drove me to even investigate the cryptic command-line parameters of find ... yes, very powerful, but I can't even get the simplest search right the first time. It took me days to work out what -prune does. And still I prefer to castrate the find result by piping it through grep.

There is another strange observation: if I enable the tree-structured details view and get a folder as a result, so far so good. But I can only unfold it if none of the files inside were themselves matched by the search term. Otherwise those files are shown next to that folder and the folder itself appears to be empty. I'd like the folder to always unfold irrespective of further (partial) search results inside. (Folder Bad_religion contains 5 additional songs with space instead of underscore)

Image

Keep up the good work
private_lock

seal20 (Registered Member)
Sorry, I should perhaps have opened a new thread; feel free to move this post if required.

Regarding file association with activities, I understand that the functionality will be broken if I upgrade to Baloo, or else I need to run both Baloo and Nepomuk at the same time, which may slow down the system (cf. the wiki page; my PC is not so new...). The association of files with activities is an important part of my workflow and I don't want to lose it. What are my options? Not migrating to Baloo yet seems the most direct, but I cannot find an issue to track to see when I will be able to upgrade; a pointer would be nice. If I upgrade now, will the association be lost during the migration, or will it reappear once the functionality is restored?

Thanks in advance, and looking forward to a better semantic desktop.

vHanda (KDE Developer)
private_lock wrote:Dear vHanda ... well the thing is ... just look at my screenshots.

Image

I searched for "b " (Bee-Space) and "b c " (Bee-Space-Cee-Space) and got a clearly OR'ed result on my home-directory. Omitting the trailing space will add thousands of other files that contain a word beginning with the letter in question.

Really, I appreciate the work you put into performance. Just thinking about running a search over my whole home directory was out of the question with the previous search incarnation. It drove me to even investigate the cryptic command-line parameters of find ... yes, very powerful, but I can't even get the simplest search right the first time. It took me days to work out what -prune does. And still I prefer to castrate the find result by piping it through grep.

There is another strange observation: if I enable the tree-structured details view and get a folder as a result, so far so good. But I can only unfold it if none of the files inside were themselves matched by the search term. Otherwise those files are shown next to that folder and the folder itself appears to be empty. I'd like the folder to always unfold irrespective of further (partial) search results inside. (Folder Bad_religion contains 5 additional songs with space instead of underscore)

Image

Keep up the good work
private_lock


Whoa. You're quite right. This seems incorrect. Would you mind trying to follow the steps mentioned over here - http://community.kde.org/Baloo/Debuggin ... ch_results

I'm not sure if this is an issue with Dolphin's Baloo integration or Baloo.

vHanda (KDE Developer)
seal20 wrote:Sorry, I should perhaps have opened a new thread; feel free to move this post if required.

Regarding file association with activities, I understand that the functionality will be broken if I upgrade to Baloo, or else I need to run both Baloo and Nepomuk at the same time, which may slow down the system (cf. the wiki page; my PC is not so new...). The association of files with activities is an important part of my workflow and I don't want to lose it. What are my options? Not migrating to Baloo yet seems the most direct, but I cannot find an issue to track to see when I will be able to upgrade; a pointer would be nice. If I upgrade now, will the association be lost during the migration, or will it reappear once the functionality is restored?

Thanks in advance, and looking forward to a better semantic desktop.


Yes. Activity linking for files will be broken. You can run both Baloo and Nepomuk at the same time; however, you may need to manually edit your nepomukserverrc file to enable Nepomuk after the transition. Baloo automatically migrates the data and then switches off Nepomuk.
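
The manual edit mentioned here might look like the following sketch. It assumes that the migration left a line reading exactly "Start Nepomuk=false" in the file, which has not been verified; enable_nepomuk is a made-up helper name, and the function deliberately does nothing else if that line is absent.

```shell
# Sketch: flip an existing "Start Nepomuk=false" line to "true".
# enable_nepomuk is a hypothetical helper; it does not add the key if absent.
enable_nepomuk() {
    rc_file=$1
    [ -f "$rc_file" ] || return 1
    # Keep a backup copy next to the original before touching it.
    cp "$rc_file" "$rc_file.bak" || return 1
    # Read from the backup and rewrite the original (portable; avoids sed -i).
    sed 's/^Start Nepomuk=false$/Start Nepomuk=true/' "$rc_file.bak" > "$rc_file"
}
```

It would be called with the path to nepomukserverrc, followed by nepomukctl start to bring the legacy server back up.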

Also, if you're only running Nepomuk for activities, you may want to enable light mode - http://vhanda.in/blog/2012/08/nepomuk-lite-mode/

I'm not sure about when activities information will be ported. It depends on the maintainer of activities.

seal20 (Registered Member)
Thanks for the fast answer and the advice.
I thought of another workaround that will allow me to completely disable Nepomuk. I just have to tag the files and folders related to the activity with a specific tag. Then in the folder view widget settings I specify a folder with a path like this: tags:/Activity and it should give me the same results as with Nepomuk.

In fact this gives me an idea for a feature request: to associate activities with tags (or tags with activities) so that all files tagged with the specified tags will be automatically associated with an activity.. ??? . Need to think of a simpler way to express this..... :-\

Thanks again. I will now upgrade.

private_lock (Registered Member)
vHanda wrote:
private_lock wrote:Image

I searched for "b " (Bee-Space) and "b c " (Bee-Space-Cee-Space) and got a clearly OR'ed result on my home-directory. Omitting the trailing space will add thousands of other files that contain a word beginning with the letter in question.

private_lock


Whoa. You're quite right. This seems incorrect. Would you mind trying to follow the steps mentioned over here - http://community.kde.org/Baloo/Debuggin ... ch_results

I'm not sure if this is an issue with Dolphin's Baloo integration or Baloo.


I don't want to waste your time with dumb questions ... erm ... how do I pass my search term to the baloosearch command-line tool?

Any of these lines:
baloosearch 'b '
baloosearch "b "
baloosearch b\
will output files/folders like:
/home/holger/Musik/o/Oldie´s/Blues Brothers
which clearly don't feature a single-letter word B anywhere. So effectively I cannot reproduce my initial result on the command line (but still can in Dolphin).

baloosearch b c
This seems to correctly AND-connect the two search terms, as this will result in:
/home/holger/Musik/c/Cher/Cher-Believe.mp3
/home/holger/Bilder/Webshots-Convert/Vancouver Skyline, British Columbia, Canada.jpg
/home/holger/Bilder/Webshots-Convert/Lone Cypress, Pebble Beach, California.jpg
and the like.
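
Part of the difference between these invocations comes from the shell rather than from the search tool: quoting decides how many arguments a program receives and whether the trailing space survives. The sketch below only demonstrates that; show_args is a made-up stand-in that prints its arguments, and what baloosearch then does with them is not covered.

```shell
# Print how many arguments were received, each wrapped in brackets.
# show_args is a hypothetical stand-in for any command-line tool.
show_args() {
    printf '%d argument(s):' "$#"
    for arg in "$@"; do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

show_args 'b '    # 1 argument(s): [b ]
show_args "b "    # 1 argument(s): [b ]
show_args b c     # 2 argument(s): [b] [c]
```

So the quoted forms hand over one term with a trailing space, while the unquoted "b c" hands over two separate terms.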

So my guess would be that Dolphin issues two independent search requests (e.g. in parallel for speedup) and incorrectly shows the combined list as the union of the individual results instead of the intersection.

Any comment on the other issue of folder contents?

private_lock wrote:There is another strange observation: if I enable the tree-structured details view and get a folder as a result, so far so good. But I can only unfold it if none of the files inside were themselves matched by the search term. Otherwise those files are shown next to that folder and the folder itself appears to be empty. I'd like the folder to always unfold irrespective of further (partial) search results inside. (Folder Bad_religion contains 5 additional songs with space instead of underscore)

Image

Keep up the good work
private_lock

thorstent (Registered Member)
Hi everyone,

I really like the new feature and it works well for my emails.

Q. I don't want the indexer to touch folder $FOLDER. What should I do?
A. Go to System Settings -> Desktop Search and add that folder to the list of folders that should not be indexed.


What is not clear is how to add a folder to the indexing. My home directory consists of symlinks and they point to mounted drives. They are however not included in the indexing.

Thanks, Thorsten

einar (Administrator)
There is an advanced, third-party control panel being developed that will probably do what you want. It's available in the wild but has not had a formal release yet.



bcooksley (Administrator)
The advanced control panel sources can be found at https://gitorious.org/baloo-kcmadv if you're interested in self-compiling.


KDE Sysadmin

thorstent (Registered Member)
bcooksley wrote:The advanced control panel sources can be found at https://gitorious.org/baloo-kcmadv if you're interested in self-compiling.


Thanks, I installed that and now the indexing is progressing. :)

jackyalcine (Registered Member)
Would it be possible for a plasmoid to report the status of indexing? And also provide a bit of control?
I'm not sure if that's a TODO or in the pipeline.
