Nepomuk keeps indexing the same files

Minio
Thu Sep 05, 2013 12:46 pm
Hi

I recently updated KDE SC from 4.8.4 to 4.10.5 (this is what Debian provides at the moment). I knew that the Nepomuk indexer backend had changed in this release, so I decided to start with a fresh copy of the database (that is, I removed ~/.kde/share/apps/nepomuk/repository/ while the user was logged out of the KDE session).

The initial scan went great and finished in no time.
Then, when the machine was idle, Nepomuk started indexing the content of files. And this is where the problems begin.

From what I can tell, Nepomuk picked a few seemingly random files and keeps indexing them over and over again.
Here is the command I used to figure out which files Nepomuk is interested in and how many times it tried to index each one:
Code: Select all
grep '/usr/bin/nepomukindexer' ~/.xsession-errors |sed -e 's:.*"\(.*\)":\1:' |sort | uniq| while read line; do echo -n "$line: "; grep -c "$line" ~/.xsession-errors; done
/home/minio: 17478
/home/minio/ksiazki: 12766
/home/minio/ksiazki/: 11173
/home/minio/ksiazki/Aleister Crowley - The Sword of Song.pdf: 1593
/home/minio/ksiazki/Encrypting a Linux_Windows dual-boot system — Sysnet Documentation 0.0.1.mht: 1592
/home/minio/ksiazki/How Politics wreck your math skills.pdf: 3
/home/minio/ksiazki/instrukcja aparatu Fujifilm Finepix Z35-pl_01.pdf: 1593
/home/minio/ksiazki/Keith Curtis - SoftwareWars - why open source is better.pdf: 1596
/home/minio/ksiazki/licencjat-od-Jacka.pdf: 1593
/home/minio/ksiazki/pralka-Candy-CTD1365-instrukcja-obslugi.pdf: 1593
/home/minio/ksiazki/Wimmer - Napisz prace dyplomowa.pdf: 1592
/home/minio/nepo.sh: 5
/home/minio/Przepis na LO/zrobione/tabele-przestawne.html: 3
/home/minio/teksty/CV/Adecco/oferta.mht: 3
/home/minio/teksty/CV/Adecco/Zalewski - CV-en.odt: 3
/home/minio/teksty/CV/Adecco/Zalewski - CV-en.pdf: 3
/home/minio/teksty/CV/Adecco/Zalewski - CV.odt: 3
/home/minio/teksty/CV/Adecco/Zalewski - CV.pdf: 7
/home/minio/teksty/CV/Adecco/Zalewski - CV-pl.pdf: 3
/home/minio/teksty/CV/ASTEK/oferta.mht: 3
/home/minio/teksty/CV/ASTEK/Zalewski - CV.odt: 3
/home/minio/teksty/CV/ASTEK/Zalewski - CV.pdf: 3
/home/minio/teksty/CV/ASTEK/Zalewski - list motywacyjny.odt: 6
/home/minio/teksty/CV/ASTEK/Zalewski - list motywacyjny.pdf: 14
/home/minio/teksty/CV/Grafton - analityk 9-26-24300/oferta.mht: 3
/home/minio/teksty/CV/Grafton - analityk 9-26-24300/Zalewski - CV.odt: 7
/home/minio/teksty/CV/Grafton - analityk 9-26-24300/Zalewski - list motywacyjny.odt: 7
/home/minio/teksty/CV/PMT marketing system/oferta.mht: 3
/home/minio/teksty/CV/PMT marketing system/Zalewski - CV.odt: 3
/home/minio/teksty/CV/PMT marketing system/Zalewski - CV.pdf: 7
/home/minio/website: 1593

As you can see, seven of the eight listed files in the /home/minio/ksiazki/ directory have been indexed almost 1600 times each.
This directory contains 97 files, 86 of which are PDFs.
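A note on the counts above: `grep -c "$line"` counts every log line that contains the path as a substring (and treats the path as a regular expression), so the totals for /home/minio and /home/minio/ksiazki include every file beneath them. A minimal sketch of an alternative that counts each extracted path exactly, assuming the log lines match the sed expression used above; `count_indexed_paths` is a made-up helper name:

```shell
# Count how often each quoted path appears on nepomukindexer log lines.
# count_indexed_paths is a hypothetical helper; the log path defaults to
# ~/.xsession-errors, as in the command above.
count_indexed_paths() {
    log="${1:-$HOME/.xsession-errors}"
    grep '/usr/bin/nepomukindexer' "$log" \
        | sed -e 's:.*"\(.*\)":\1:' \
        | sort \
        | uniq -c \
        | sort -rn
}
```

Calling `count_indexed_paths` with no argument reads ~/.xsession-errors and prints one line per distinct path, most frequent first.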

What makes things worse, the content of these files never actually reaches the database. After pointing Dolphin at this directory, I open the search bar, set it to search in file content, and enter a string that certainly exists in one of the files picked up by Nepomuk. I get zero results.

I followed the instructions at KDE UserBase and turned debug mode on. Then I ran:
Code: Select all
/usr/bin/nepomukindexer /home/minio/ksiazki/licencjat-od-Jacka.pdf
nepomukindexer(17608)/kdecore (KSycoca) KSycocaPrivate::openDatabase: Trying to open ksycoca from "/var/tmp/kdecache-minio/ksycoca4"
nepomukindexer(17608)/nepomuk (library) Nepomuk2::ResourceManagerPrivate::_k_storageServiceInitialized: Nepomuk Storage service up and initialized.
nepomukindexer(17608)/nepomuk (library) Nepomuk2::MainModel::Private::init: Connecting to local socket "/tmp/ksocket-minio/nepomuk-socket"
nepomukindexer(17608)/nepomuk (library) Nepomuk2::MainModel::Private::init: Connected :)
nepomukindexer(17608)/nepomuk (strigi service) Nepomuk2::Indexer::indexFile:  QUrl( "nepomuk:/res/806b695c-bf92-4816-9e46-3c81d377d7e5" )  "application/pdf"
nepomukindexer(17608)/nepomuk (strigi service) Nepomuk2::Indexer::fileIndex: Saving plain text content
nepomukindexer(17608)/nepomuk (strigi service) Nepomuk2::Indexer::fileIndex: Updating indexing level

But there is no ~/.kde/share/data/nepomuk/file-indexer-error.log file that I could look into.

The only similar thing I have found on the web is this Launchpad bug report. The discussion there did not reach any conclusion.

Related config files:
Code: Select all
$ cat ~/.kde/share/config/nepomukbackuprc
[Backup]
backup day=6
backup frequency=weekly
backup time=10:10:00
max backups=3


Code: Select all
cat ~/.kde/share/config/nepomukserverrc
[$Version]
update_info=nepomukstrigiservice-migrate.upd:nepomukstrigiservice-migrate

[Basic Settings]
Start Nepomuk=true

[Service-nepomukfileindexer]
autostart=true

[main Settings]
Maximum memory=130
Storage Dir[$e]=$HOME/.kde/share/apps/nepomuk/repository/main/
Used Soprano Backend=virtuosobackend


Code: Select all
cat ~/.kde/share/config/nepomukstrigirc
[Device-filex://98b85d82b85d5fb4]
exclude folders[$e]=/
folders[$e]=
mount path=/mnt/zewnetrzny

[General]
exclude filters=dvd-file,geniso-path-list,*.iso,*.orig,conftest,.xsession-errors*,*.tmp,.svn,.histfile.*,.git,litmain.sh,CMakeFiles,*.pc,*.la,CMakeCache.txt,*.rej,config.status,lost+found,confstat,_darcs,CVS,*.part,*.po,*~,*.moc,*.vm*,core-dumps
exclude filters version=2
exclude folders[$e]=$HOME/sources,$HOME/obrazy/smplayer_screenshots,$HOME/obrazy/icons,$HOME/teksty/socjologia/muzyka cyfrowa w Polsce - 2012,$HOME/ksiazki/LibreOffice/OOo Forum - backup,$HOME/teksty/socjologia/magisterka/posty,$HOME/public_html,$HOME/obrazy/sygs,$HOME/Przepis na LO/site-content,$HOME/Przepis na LO/bazy-danych,$HOME/teksty/socjologia/magisterka/analiza-dyskursu,$HOME/skrypty,$HOME/website,$HOME/filmy,$HOME/torrenty,$HOME/obrazy/Webcam
exclude mimetypes=audio/*,video/*,text/css,text/x-c++src,text/x-c++hdr,text/x-csrc,text/x-chdr,text/x-python,text/x-assembly,text/x-java,text/x-objsrc,text/x-ruby,text/x-scheme,text/x-pascal,text/x-yacc,text/x-sed,text/x-haskell,text/asp,application/x-awk,application/x-cgi,application/x-csh,application/x-java,application/x-javascript,application/x-perl,application/x-php,application/x-python,application/x-sh,application/x-tex
first run=true
folders[$e]=$HOME
index hidden folders=false
index newly mounted=false
debug mode=true

[RemovableMedia]
ask user=false
index newly mounted=false

[general]
legacyCleaning=false
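
The long comma-separated values in nepomukstrigirc are hard to scan by eye. A small sketch that prints one entry per line for a given key, treating `key` and `key[$e]` as the same key; `list_config_values` is a hypothetical helper, not part of Nepomuk, and it does not distinguish between config sections:

```shell
# Print the comma-separated values of a config key, one per line.
# list_config_values is a made-up helper; the file defaults to the
# nepomukstrigirc path shown above.
list_config_values() {
    key="$1"
    file="${2:-$HOME/.kde/share/config/nepomukstrigirc}"
    awk -F= -v key="$key" '
        {
            name = $1
            sub(/\[\$e\]$/, "", name)      # drop an optional [$e] suffix
        }
        name == key {
            sub(/^[^=]*=/, "")             # keep only the value part
            n = split($0, parts, ",")
            for (i = 1; i <= n; i++)
                if (parts[i] != "") print parts[i]
        }
    ' "$file"
}
```

For example, `list_config_values "exclude folders"` lists the excluded folders from every section that defines that key.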


The question is: what do I do now?
If this is a known issue, how can I work around it?
If it is not, should I report a bug in the KDE Bugzilla? How do I get meaningful information for the developers?


Best regards
Mirosław Zalewski
bcooksley
When you manually index the files, does this succeed (i.e. exit with a return code of 0), or does the indexer silently crash?
The behaviour you are reporting certainly sounds like the Nepomuk indexer is crashing while trying to index these files.


KDE Sysadmin
Minio
How do I check that?

What I did:
Code: Select all
$ qdbus org.kde.nepomuk.services.nepomukfileindexer /nepomukfileindexer org.kde.nepomuk.FileIndexer.indexFile '/home/minio/ksiazki/licencjat-od-Jacka.pdf'

$ /usr/bin/nepomukindexer /home/minio/ksiazki/licencjat-od-Jacka.pdf
nepomukindexer(22127)/kdecore (KSycoca) KSycocaPrivate::openDatabase: Trying to open ksycoca from "/var/tmp/kdecache-minio/ksycoca4"
nepomukindexer(22127)/nepomuk (library) Nepomuk2::ResourceManagerPrivate::_k_storageServiceInitialized: Nepomuk Storage service up and initialized.
nepomukindexer(22127)/nepomuk (library) Nepomuk2::MainModel::Private::init: Connecting to local socket "/tmp/ksocket-minio/nepomuk-socket"
nepomukindexer(22127)/nepomuk (library) Nepomuk2::MainModel::Private::init: Connected :)
nepomukindexer(22127)/nepomuk (strigi service) Nepomuk2::Indexer::indexFile:  QUrl( "nepomuk:/res/806b695c-bf92-4816-9e46-3c81d377d7e5" )  "application/pdf"
nepomukindexer(22127)/nepomuk (strigi service) Nepomuk2::Indexer::fileIndex: Saving plain text content
nepomukindexer(22127)/nepomuk (strigi service) Nepomuk2::Indexer::fileIndex: Updating indexing level
$ echo $?
0
$

So qdbus returned no message and /usr/bin/nepomukindexer finished with status 0, which would indicate that everything is working fine.
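
The manual check above can be repeated over several files to see whether the indexer exits with a non-zero status for any of them. A minimal sketch, assuming that a crash would show up as a non-zero exit status; `check_indexer_exit` is a hypothetical helper, and the `INDEXER` variable is an invented override for the indexer command:

```shell
# Run the indexer on each given file and print "<exit status> <file>".
# Returns non-zero if any run failed. check_indexer_exit and INDEXER
# are made-up names for this sketch.
check_indexer_exit() {
    indexer="${INDEXER:-/usr/bin/nepomukindexer}"
    failed=0
    for f in "$@"; do
        "$indexer" "$f" >/dev/null 2>&1
        status=$?
        echo "$status $f"
        [ "$status" -eq 0 ] || failed=1
    done
    return "$failed"
}
```

For example, `check_indexer_exit /home/minio/ksiazki/*.pdf`. In the shell, a process killed by a signal is reported with a status above 128 (128 plus the signal number), which would distinguish a crash from an ordinary error exit.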

I still can't find files by content.
Command:
Code: Select all
qdbus org.kde.nepomuk.services.nepomukfileindexer /nepomukfileindexer org.kde.nepomuk.FileIndexer.indexedFiles

returns 0 and, if I understand its purpose, it should tell me how many files Nepomuk has successfully indexed (so, by dividing it by totalFiles, I can check what percentage of files Nepomuk has indexed).
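
The percentage mentioned above could be computed as follows. This is only a sketch: it assumes that `totalFiles` is exposed on the same D-Bus interface as `indexedFiles` (the thread does not show that call), and `indexed_percent` is a made-up helper name:

```shell
# Print indexed/total as a whole-number percentage; guards against total = 0.
# indexed_percent is a hypothetical helper.
indexed_percent() {
    indexed="$1"
    total="$2"
    if [ "$total" -eq 0 ]; then
        echo "n/a"
        return 1
    fi
    echo "$(( indexed * 100 / total ))%"
}

# Possible usage, under the assumption stated above:
#   svc=org.kde.nepomuk.services.nepomukfileindexer
#   indexed_percent \
#       "$(qdbus $svc /nepomukfileindexer org.kde.nepomuk.FileIndexer.indexedFiles)" \
#       "$(qdbus $svc /nepomukfileindexer org.kde.nepomuk.FileIndexer.totalFiles)"
```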

Pointing /usr/bin/nepomukindexer at other files (I have tried different PDF, JPG and TXT files) does not result in them ending up in the Nepomuk index.


Best regards
Mirosław Zalewski
bcooksley
You checked things fine there - this is rather unusual, as it should index the file names at the very least.
Can you try to reproduce this under a new user, to eliminate any user account specific issues?


Minio
I have created a new system user and copied some files (JPG, TXT, PDF) to its home directory.

After leaving the computer idle for about 30 minutes:
1. indexedFiles shows 0.
2. Analysis of ~/.xsession-errors shows that Nepomuk still tries to index some files again and again (of course, this is a different set).
3. In Dolphin, searching for a word that occurs in one of the selected TXT files shows results.

I then tried to run nepomukindexer on this particular file in my usual user's home directory. Searching for the file's content in Dolphin does not show results.

So it looks like Nepomuk can index the content of a text file, but for whatever reason this does not work in my usual setup.
Other than that, it still can't index the content of PDFs and keeps indexing the same files.
I should probably ask the Debian folks, but any comments from the KDE community are appreciated.


Best regards
Mirosław Zalewski
bcooksley
Under your usual user, can you try starting Nepomuk by hand to see if it mentions anything of relevance as to why it may not be able to perform properly?

You can run the following command to shut down the currently running instance of Nepomuk:
Code: Select all
qdbus org.kde.NepomukServer /nepomukserver quit

It may take a few moments (30 or so seconds) for Nepomuk to complete shutting down, even after this command completes.

This command can be used to start Nepomuk up:
Code: Select all
nepomukserver &


Minio
I ran the commands as you advised. The only potentially worrying messages I have found are:
Code: Select all
[/usr/bin/nepomukservicestub] nepomukstorage(6112)/nepomuk (storage service) Nepomuk2::VirtuosoInferenceModel::updateOntologyGraphs: Need to update ontology graph group

Code: Select all
[/usr/bin/nepomukservicestub] nepomukfilewatch(6177)/nepomuk (filewatch service) KInotify::Private::addWatch: Failed to create watch for "/mnt/zewnetrzny/gabriela/Projekty/pentor1/Nowy folder/Gabriela1/archiwum_internetowe/pracaPL/www.praca.pl/img/firma/templates/654961"
[/usr/bin/nepomukservicestub] nepomukfilewatch(6177)/nepomuk (filewatch service) KInotify::Private::addWatch: User limit reached. Please raise the inotify user watch limit.

The second message is especially confusing, as /mnt/zewnetrzny (the root of the directory in question) is my external hard drive, and I have told Nepomuk to ignore external devices in its configuration pane.
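
The "User limit reached" message refers to the kernel's per-user inotify watch limit. A minimal sketch for inspecting it on Linux: /proc/sys/fs/inotify/max_user_watches and the fs.inotify.max_user_watches sysctl key are standard, while the helper name and the value 524288 are only illustrative. Whether raising the limit has any bearing on the re-indexing problem is not established in this thread.

```shell
# Print the current per-user inotify watch limit. The file path can be
# overridden, which is mainly useful for testing (hypothetical helper).
inotify_watch_limit() {
    cat "${1:-/proc/sys/fs/inotify/max_user_watches}"
}

# To raise the limit until the next reboot (needs root; the value is
# only an example):
#   sysctl fs.inotify.max_user_watches=524288
```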

Other than that, it is the usual stuff: services starting, Nepomuk picking up files, identifying their MIME types, and reporting that the plain text export was successful.
The full log can be found here: http://pastebin.com/RCAu6bFW (the file is 180K in size, so I did not paste it on the forum).


Best regards
Mirosław Zalewski
bcooksley
Can you check to see if there are any symlinks in an indexed location pointing to the external media?
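
One way to run this check from a terminal is sketched below. `find_links_into` is a hypothetical helper, and it relies on `readlink -f` from GNU coreutils to resolve each link to an absolute path; the external drive should be mounted while it runs, otherwise targets on it may not resolve.

```shell
# List symlinks under a directory whose resolved target lies on or below
# a given mount point. find_links_into is a made-up helper name.
find_links_into() {
    dir="$1"
    mount="${2%/}"                      # strip a trailing slash, if any
    find "$dir" -type l 2>/dev/null | while IFS= read -r link; do
        # Skip links that cannot be resolved (e.g. broken links).
        target=$(readlink -f "$link") || continue
        case "$target" in
            "$mount"|"$mount"/*) echo "$link -> $target" ;;
        esac
    done
}
```

For example, `find_links_into "$HOME" /mnt/zewnetrzny` would search the whole home directory, hidden directories included.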


Minio
No, there are no symlinks in the directory in question that point to the external device.

To be precise, there are only two symlinks pointing to the external HDD in my entire $HOME:
One is right in the root (that is, ~/). It is not mentioned in the "Custom directories" dialog of the Nepomuk configuration module at all, so I believe Nepomuk simply ignores it.
The second is in ~/.wine/ and, as far as I know, Nepomuk does not index hidden directories by default.


Best regards
Mirosław Zalewski
Minio
KDE 4.11 was uploaded to Debian recently and I upgraded yesterday. The problem no longer exists: all my files have been indexed and I can find them by content in Dolphin.

So, whatever the cause was, it is now gone.


Best regards
Mirosław Zalewski

