Baloo: loops when indexing the last few files

Moviuro
Registered Member
Hi,

Baloo never stops running on my laptop while on AC power. I could let it run a whole night over my mere 10 GB of data and it still would not stop:
Code:
moviuro@psychoticdelirium ~ % balooctl status
Baloo File Indexer is running
Indexed 5693 / 5698 files
Failed to index 1 files
File IDs: 18804
moviuro@psychoticdelirium ~ % balooctl status
Baloo File Indexer is running
Indexed 5694 / 5698 files
Failed to index 1 files
File IDs: 18804
moviuro@psychoticdelirium ~ % balooctl status
balooctl(26412): Could not obtain lock for Xapian Database. This is bad
Baloo File Indexer is running
Indexed 5694 / 5698 files
Failed to index 1 files
File IDs: 18804
moviuro@psychoticdelirium ~ % balooctl status
Baloo File Indexer is running
Indexed 5691 / 5698 files
Failed to index 1 files
File IDs: 18804
moviuro@psychoticdelirium ~ % balooctl status
Baloo File Indexer is running
Indexed 5692 / 5698 files
Failed to index 1 files
File IDs: 18804

I would gladly give more information, but balooctl is not documented (see here), so I do not know what else to collect.

Cheers!

EDIT: I dealt with file ID "18804" (a .ttf file). Still the same issue:
Code:
moviuro@psychoticdelirium ~ % balooctl status
Baloo File Indexer is running
Indexed 5695 / 5697 files
Failed to index 0 files
moviuro@psychoticdelirium ~ % balooctl status
Baloo File Indexer is running
Indexed 5695 / 5697 files
Failed to index 0 files
moviuro@psychoticdelirium ~ % balooctl status
Baloo File Indexer is running
Indexed 5693 / 5697 files
Failed to index 0 files
moviuro@psychoticdelirium ~ % balooctl status
balooctl(1798): Could not obtain lock for Xapian Database. This is bad
Baloo File Indexer is running
Indexed 5692 / 5697 files
Failed to index 0 files
moviuro@psychoticdelirium ~ % balooctl status
Baloo File Indexer is running
Indexed 5690 / 5697 files
Failed to index 0 files
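
The status outputs above show the indexed count going backwards (5695, then 5693, 5692, 5690). A minimal sketch of how that could be logged automatically rather than by re-running the command by hand. It assumes the "Indexed N / M files" line format shown above; `indexed_count` and `watch_baloo` are hypothetical helper names, not part of Baloo.

```shell
# Print the "indexed" number from `balooctl status` output read on stdin.
# Assumes the line format "Indexed N / M files" seen in this thread.
indexed_count() {
    sed -n 's|^Indexed \([0-9][0-9]*\) / .*|\1|p'
}

# Poll balooctl every INTERVAL seconds (default 10) and print a line
# whenever the indexed count is lower than on the previous poll.
watch_baloo() {
    interval="${1:-10}"
    prev=""
    while :; do
        cur="$(balooctl status 2>/dev/null | indexed_count)"
        if [ -n "$prev" ] && [ -n "$cur" ] && [ "$cur" -lt "$prev" ]; then
            printf '%s indexed count dropped: %s -> %s\n' "$(date -u)" "$prev" "$cur"
        fi
        if [ -n "$cur" ]; then
            prev="$cur"
        fi
        sleep "$interval"
    done
}
```

Running `watch_baloo 5 >> /tmp/baloo_watch.log` in a terminal would then leave a timestamped record of each drop.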


KDE fan since 2008,
Using ArchLinux (almost) only to get the latest KDE ASAP!
bcooksley
Administrator
Could you please observe Baloo in operation using top, htop or System Activity/KSysguard and see the commands that Baloo is invoking to index the files? I suspect the indexer is crashing.

You could also try tailing ~/.xsession-errors to see if anything useful is output there.


KDE Sysadmin
Moviuro
Registered Member
bcooksley wrote:You could also try tailing ~/.xsession-errors to see if anything useful is output there.

After dealing with an "unindexable" file, I ran
Code:
balooctl stop

Then I unplugged the laptop, put it to sleep, woke it up, and ran
Code:
balooctl start

After plugging the laptop back in, everything is back in order:
Code:
% balooctl status
Baloo File Indexer is running
Indexed 5697 / 5697 files
Failed to index 0 files

So yeah, back in order, though I will not be able to reproduce the problem.

(Shouldn't it cry somewhere visible when it crashes/fails instead of having me look into it?)


Moviuro
Registered Member
Back to the first situation, except this time it doesn't fail to index any file. See the screenshot for an I/O graph: the point where I/O first drops and stays at 0 is right after I stopped Baloo.
http://wstaw.org/m/2014/07/16/plasma-desktopB17149.png
Also, it uses a tremendous amount of CPU for nothing, since it just loops over the last few files forever.

Also, it is quite fortunate actually that it only happens when on AC. Else, KDE would indeed be the energy-hungry monster many people (wrongly?) believe it is.


bcooksley
Administrator
Can you use "lsof" to determine which files it is attempting to continuously index?


Moviuro
Registered Member
bcooksley wrote:Can you use "lsof" to determine which files it is attempting to continuously index?

Here is the output of
Code:
% while true; do lsof|grep baloo_fi >> /tmp/baloo_fi; date -u >> /tmp/baloo_fi; done

http://ix.io/dt4

If you can make sense of it, please do. Also, I can't understand why there is absolutely no usable tool for Baloo. I mean, why should I have to mine some trash data and hope to find something useful?
Don't get me wrong: indexing is promising and is a sweet feature. However, finding the "current file being indexed" should not require me to use a third-party tool, cross-validate my data and check some other configs too. This must be integrated into balooctl. Otherwise, it really is just an on/off switch.
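
For anyone repeating the lsof experiment, a gentler variant could sample once per second and count how often each path shows up, so the files being re-opened in a loop stand out. This is only a sketch: it assumes lsof's machine-readable `-Fn` output (one path per line, prefixed with `n`) and that `-c baloo_file` matches the indexer processes on your system. `count_open_paths` and `sample_baloo_files` are hypothetical helper names.

```shell
# Summarise `lsof -Fn` output read on stdin: count how often each path
# appears, most frequent first. In -F output every file name is on its
# own line, prefixed with the letter "n".
count_open_paths() {
    sed -n 's/^n//p' | sort | uniq -c | sort -rn
}

# Take SAMPLES snapshots (default 30), one per second, of the files held
# open by processes whose command name starts with "baloo_file".
sample_baloo_files() {
    samples="${1:-30}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        lsof -c baloo_file -Fn 2>/dev/null
        sleep 1
        i=$((i + 1))
    done | count_open_paths
}
```

Paths under your home directory with a high count in the output of `sample_baloo_files 60 | head` are the likely candidates.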


vHanda
KDE Developer
Hi. This may be an interesting read - https://community.kde.org/Baloo/Debugging

Please provide the following information -
* Is the baloo_file_extractor process running? If it is please check the file numbers it is running on and run balooshow on them.
* What do you mean by there is no usable tool for baloo? Use Dolphin or KRunner or Milou or command line tools such as baloosearch. What exactly are you missing?

In general, yes, I agree we could improve our tooling for detecting which files are being indexed. I've been thinking about adding a debug mode, which users can enable, that will inform them about the average time and IO usage per file type. If you're on 5.0, `balooctl fileStatistics` does reveal information about which kinds of files are consuming the most space in the index.
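
The first check above (find the file numbers baloo_file_extractor is running on, then run balooshow on them) might be scripted roughly as follows. This is a sketch under assumptions: that the file numbers appear as plain numeric arguments on the extractor's command line, and that balooshow accepts one ID per invocation. `extract_file_ids` and `show_extractor_files` are hypothetical helper names.

```shell
# Print every purely numeric argument found in command lines read on
# stdin, one per line. Assumption: these are the Baloo file IDs.
extract_file_ids() {
    tr -s ' \t' '\n\n' | grep -E '^[0-9]+$'
}

# Look up the running extractor's arguments and pass each ID to balooshow.
# `ps -C` (procps) and `xargs -r` (GNU) are Linux-specific.
show_extractor_files() {
    ps -C baloo_file_extractor -o args= \
        | extract_file_ids \
        | xargs -r -n 1 balooshow
}
```

If the extractor is not running at that moment, `show_extractor_files` prints nothing; calling it a few times in a row is usually enough to catch it.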
Moviuro
Registered Member
vHanda wrote:* Is the baloo_file_extractor process running? If it is please check the file numbers [in ksysguard] it is running on and run balooshow on them.
Finally something that makes sense and is easy!
It is running on my IRC logs (pretty awful because they change a lot). I am unsure about what to do now: should I disable indexing the logs...?
vHanda wrote:* What do you mean by there is no usable tool for baloo? Use Dolphin or KRunner or Milou or command line tools such as baloosearch. What exactly are you missing?
I meant a configuration utility: the one in systemsettings is only an "opt-out" selection tool, balooctl is an on/off switch, and balooshow has only one use.


vHanda
KDE Developer
Moviuro wrote:It is running on my IRC logs (pretty awful because they change a lot). I am unsure about what to do now: should I disable indexing the logs...?

What are the filenames of the IRC logs? With Baloo we have had problems with very high IO when indexing files with a large number of words. With 13.1, we explicitly disabled indexing all files with the mimetype 'text/plain' unless the name ends with '.txt'. This is a temporary solution until we fix the high IO usage.

Moviuro wrote:I meant a configuration utility: the one in systemsettings is only an "opt-out" selection tool, balooctl is an on/off switch, and balooshow has only one use.

As a fun exercise, could you please tell me which tools you would want out of Baloo? Feel free to let your imagination run wild.
Moviuro
Registered Member
vHanda wrote:What are the filenames of the IRC logs?
Code:
<konversation server's name>_<channel>.log
E.g.:
freenode_znc_#btrfs.log

1 file has > 1M words and 7 have > 500k.

vHanda wrote:As a fun exercise, could you please tell me what all tools you would want out of Baloo? Feel free to let your imagination run wild.
There should be a single GUI tool to:
  • Globally turn On/Off baloo/indexing;
  • Pause/resume switch;
  • Choose either opt-in or opt-out strategy;
  • Select MIME types/folders following the previously chosen strategy;
  • Current baloo status: current file, file number, file name, file statistics;
  • A rough searchbox to make sure indexing is working and producing results.


vHanda
KDE Developer
Hmm. The file should not be getting indexed if it ends with '.log'. What is its mimetype?
Moviuro
Registered Member
Code:
% file logs/freenode_znc_\#btrfs.log
logs/freenode_znc_#btrfs.log: UTF-8 Unicode text, with very long lines


vHanda
KDE Developer
What does `file --mime-type <fileUrl>` report?
Moviuro
Registered Member
Code:
logs/freenode_znc_#btrfs.log: text/plain


vHanda
KDE Developer
This makes no sense. If it is "text/plain" it should not be indexed unless it ends with a .txt. What version of KDE are you on?
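
The rule as described in this thread (a 'text/plain' file is skipped unless its name ends with '.txt') can be written down as a small check and run over a directory to list the files that should not have been indexed. This is a sketch of the rule as stated here, not Baloo's actual code, and it says nothing about how Baloo treats other mimetypes. `would_index` and `list_should_be_skipped` are hypothetical helper names.

```shell
# would_index NAME MIMETYPE
# Exit 0 if the plain-text rule from this thread would allow indexing,
# exit 1 if the file would be skipped. Other mimetypes are not restricted
# by this rule, so they pass.
would_index() {
    wi_name="$1"
    wi_mime="$2"
    if [ "$wi_mime" != "text/plain" ]; then
        return 0
    fi
    case "$wi_name" in
        *.txt) return 0 ;;
        *)     return 1 ;;
    esac
}

# Print every regular file directly under DIR (default: current directory)
# that the rule says should be skipped, using file(1) for the mimetype.
list_should_be_skipped() {
    dir="${1:-.}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        mime="$(file --mime-type -b "$f")"
        would_index "$f" "$mime" || printf '%s\n' "$f"
    done
}
```

With the values reported above, `would_index 'freenode_znc_#btrfs.log' text/plain` returns 1, so if that log is still being picked up, the installed version presumably does not apply the rule.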

