Nepomuk 4.10.2 inconsistent content index results for PDF?

ntruhan
All,
I am running the latest Nepomuk (4.10.2) and am seeing inconsistent search results.
When I search for a keyword such as Transformer, I get a list of matching files similar to this:
file:///home/ntruhan/Documents/Transformer.txt
file:///home/ntruhan/Documents/transformer config 2.JPG
file:///home/ntruhan/Documents/transformer config.JPG
file:///home/ntruhan/Documents/transformer install 1.JPG
file:///home/ntruhan/Documents/transformer install 2.JPG
file:///home/ntruhan/Documents/transformer install 3.JPG
file:///home/ntruhan/Documents/transformer install 4.JPG
file:///home/ntruhan/Documents/Optimize the Operating Environment for Transformer (old).pdf
file:///home/ntruhan/Documents/Optimize the Operating Environment for Transformer.docx
file:///home/ntruhan/Documents/Optimize the Operating Environment for Transformer.pdf
file:///home/ntruhan/Documents/Transformer in Production.doc
....

But when I search by content, I get a much smaller result set. I didn't think the new Nepomuk indexed DOC/DOCX etc. yet, so I wouldn't expect those, but I was under the impression it did index PDFs. I ask because I get the following result:
file:///home/ntruhan/Documents/Transformer cmplst.txt
file:///home/ntruhan/Documents/Transformer Install Images.odt
file:///home/ntruhan/Documents/transformer config 2.JPG
file:///home/ntruhan/Documents/transformer config.JPG
file:///home/ntruhan/Documents/transformer install 1.JPG
file:///home/ntruhan/Documents/transformer install 2.JPG
file:///home/ntruhan/Documents/transformer install 3.JPG
file:///home/ntruhan/Documents/transformer install 4.JPG
file:///home/ntruhan/Documents/Optimize the Operating Environment for Transformer (old).pdf
file:///home/ntruhan/Documents/Transformer in Production.doc

The one I am concerned about is:
file:///home/ntruhan/Documents/Optimize the Operating Environment for Transformer.pdf

It is just a newer version of the (old) one, and it contains the word Transformer.
I have 32,206 files indexed in Nepomuk. After upgrading, I deleted my index, re-created it, and let it rebuild.

Any idea why it would index the contents of one PDF and not another PDF that I know contains the same keyword?
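
To see exactly which files drop out of the content search, the two result lists can be compared mechanically. This is only a sketch: it assumes each result list has been saved to a text file, one URL per line, and the function name missing_from and the file names in the usage line are made up for illustration.

```shell
# Print every line of the first list that does not appear in the second list.
# -F: treat patterns as fixed strings, -x: match whole lines only,
# -v: invert the match, -f: read the patterns from a file.
missing_from() {
    grep -Fxv -f "$2" "$1"
}

# Hypothetical usage: files found by the keyword search but not by the content search.
# missing_from name-results.txt content-results.txt
```

Whole-line matching matters here, because "Transformer.pdf" would otherwise also match "Transformer (old).pdf"-style neighbours by substring.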

Thank you.
Ignacio Serantes
You can use nepomukshow to inspect the stored data:
Code:
nepomukshow "/home/ntruhan/Documents/Optimize the Operating Environment for Transformer.pdf"
and compare the output with that of other PDFs.

If the PDF is not properly indexed, you can force a reindex with the following command:
Code:
nepomukindexer "/home/ntruhan/Documents/Optimize the Operating Environment for Transformer.pdf"


If search still fails then the problem is in the search and not in the stored data.
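
If several PDFs are affected, the same single-file reindex can be repeated over a directory. A minimal sketch, assuming nepomukindexer accepts one file path per call as in the command above; the function name reindex_pdfs and the optional second argument (a stand-in command for dry runs) are assumptions for illustration, not part of Nepomuk.

```shell
# Run the indexer once for every *.pdf file below a directory.
# $1: directory to scan
# $2: command to run per file (defaults to nepomukindexer; pass "echo" for a dry run)
reindex_pdfs() {
    dir=$1
    indexer=${2:-nepomukindexer}
    # find passes each path as a single argument, so names with spaces are safe.
    find "$dir" -type f -name '*.pdf' -exec "$indexer" {} \;
}

# Dry run, only lists what would be reindexed:
# reindex_pdfs /home/ntruhan/Documents echo
# Real run:
# reindex_pdfs /home/ntruhan/Documents
```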


Ignacio Serantes, proud to be a member of KDE forums since 2008-Nov.
ntruhan
OK. I think I found the problem...
Code:
nepomukindexer "/home/ntruhan/Documents/Optimize the Operating Environment for Transformer.pdf"
nepomukindexer(11193)/kdecore (KSycoca) KSycocaPrivate::openDatabase: Trying to open ksycoca from "/var/tmp/kdecache-ntruhan/ksycoca4"
nepomukindexer(11193)/nepomuk (strigi service): Could not create Extractor:  "nepomukpopplerextractor"
nepomukindexer(11193)/nepomuk (strigi service): "Cannot load library /opt/kde4/lib/kde4/nepomukpopplerextractor.so: (libpoppler.so.35: cannot open shared object file: No such file or directory)"
nepomukindexer(11193)/nepomuk (library) Nepomuk2::ResourceManagerPrivate::_k_storageServiceInitialized: Nepomuk Storage service up and initialized.
nepomukindexer(11193)/nepomuk (library) Nepomuk2::MainModel::Private::init: Connecting to local socket "/tmp/ksocket-ntruhan/nepomuk-socket"
nepomukindexer(11193)/nepomuk (library) Nepomuk2::MainModel::Private::init: Connected :)
nepomukindexer(11193)/nepomuk (strigi service) Nepomuk2::Indexer::indexFile:  QUrl( "nepomuk:/res/0a0c7167-41d6-497b-9d75-39cddb3ec2b2" )  "application/pdf"
nepomukindexer(11193)/nepomuk (strigi service) Nepomuk2::Indexer::fileIndex: Updating indexing level

My libpoppler.so.35 exists in /opt/kde4/lib, but nepomukpopplerextractor.so doesn't seem to find it at run time, although it was found at compile time. In the cmake output I see:
-- Found PopplerQt4: /opt/kde4/include/poppler/qt4

My KDE4 is not installed in the standard location. While the other libraries in that location have been linked automatically and show up in ldconfig -p, the latest poppler there has not. I am going to add the lib path to my LD_LIBRARY_PATH and see how it works then.
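
A rough sketch of how this could be checked and worked around from a shell. It assumes the usual ldd output format, where an unresolved dependency is printed as "name => not found"; the helper names missing_libs and prepend_path_entry are made up for illustration.

```shell
# Print the names of libraries that ldd reports as "not found".
# Reads ldd-style output on stdin, so it can be tried without the real plugin.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Print a colon-separated search path with a directory prepended,
# unless that directory is already one of its entries.
# $1: current path value (may be empty), $2: directory to add
prepend_path_entry() {
    current=$1
    dir=$2
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;
        *) printf '%s\n' "$dir${current:+:$current}" ;;
    esac
}

# Usage with the paths from the error message above:
# ldd /opt/kde4/lib/kde4/nepomukpopplerextractor.so | missing_libs
# export LD_LIBRARY_PATH=$(prepend_path_entry "${LD_LIBRARY_PATH:-}" /opt/kde4/lib)
```

LD_LIBRARY_PATH only affects processes started from an environment where it is set, so the Nepomuk services would need to be restarted from such an environment for the change to matter.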

Thank you for the help.
Ignacio Serantes
Great. Please don't forget to mark your entry as solved.


molecule-eye
Nepomuk doesn't seem to search the contents of my PDFs at all. And for some reason it searches the contents of my text files, even though I have not told it to index them. I'm running KDE 4.10.2 on a clean install of Kubuntu 13.04 (not upgraded from a previous version with an older KDE).
bcooksley
@molecule-eye: Can you provide the output of the nepomukindexer command run against one of your PDF files, to see if it mentions why the indexing is not succeeding? Your system might be missing the PDF indexer, or be unable to use it.


System Settings and Device Actions KCM maintainer
molecule-eye
How would I do this? nepomukshow isn't a recognized command. I also notice that I have always had audio files unchecked from indexing, and yet when I do a contents search I get mp3 files as results. Is this a Kubuntu/Ubuntu-specific issue?

On a side note, my other Kubuntu 13.04 system, upgraded from 12.10, searches the contents of PDFs just fine and seems to be working normally. (The same folders and file types are set to be indexed on both machines.)
Ignacio Serantes
The nepomukindexer program is part of nepomuk-core and is mandatory for file indexing, so if you don't have it installed you should check your installation.
nepomukshow is not required for indexing, but it is useful for testing purposes.


molecule-eye
Sorry, I have no idea how to obtain nepomukshow. I've installed the dev and debugging tools for nepomuk but it's not in there. Is it available for download somewhere? Google isn't helping.
google01103
You might have to download it from git and compile it: http://techbase.kde.org/Projects/Nepomuk/NepomukShow


OpenSuse 13.1 x64, KDE 4.12.x

Ignacio Serantes
google01103 wrote:might have to dl from git and compile http://techbase.kde.org/Projects/Nepomuk/NepomukShow

nepomukshow is now part of the nepomuk-core repository.


google01103

Mon May 20, 2013 11:27 am
I cannot find it in openSUSE 12.3; I searched for it using the package manager.

FYI, the contents of the nepomuk-core package:
Code:
/usr/bin/nepomuk2-rcgen
/usr/bin/nepomukbackup
/usr/bin/nepomukcleaner
/usr/bin/nepomukindexer
/usr/bin/nepomukserver
/usr/bin/nepomukservicestub
/usr/lib64/kde4/nepomukexiv2extractor.so
/usr/lib64/kde4/nepomukfileindexer.so
/usr/lib64/kde4/nepomukfilewatch.so
/usr/lib64/kde4/nepomukplaintextextractor.so
/usr/lib64/kde4/nepomukpopplerextractor.so
/usr/lib64/kde4/nepomukstorage.so
/usr/lib64/kde4/nepomuktaglibextractor.so
/usr/lib64/libkdeinit4_nepomukserver.so
/usr/lib64/libnepomukcommon.so
/usr/lib64/libnepomukcore.so.4
/usr/lib64/libnepomukcore.so.4.10.3
/usr/lib64/libnepomukextractor.so
/usr/share/applications/kde4/nepomukbackup.desktop
/usr/share/applications/kde4/nepomukcleaner.desktop
/usr/share/autostart/nepomukserver.desktop
/usr/share/dbus-1/interfaces/org.kde.NepomukServer.xml
/usr/share/dbus-1/interfaces/org.kde.nepomuk.BackupManager.xml
/usr/share/dbus-1/interfaces/org.kde.nepomuk.DataManagement.xml
/usr/share/dbus-1/interfaces/org.kde.nepomuk.FileIndexer.xml
/usr/share/dbus-1/interfaces/org.kde.nepomuk.OntologyManager.xml
/usr/share/dbus-1/interfaces/org.kde.nepomuk.Query.xml
/usr/share/dbus-1/interfaces/org.kde.nepomuk.QueryService.xml
/usr/share/dbus-1/interfaces/org.kde.nepomuk.ResourceWatcher.xml
/usr/share/dbus-1/interfaces/org.kde.nepomuk.ResourceWatcherConnection.xml
/usr/share/dbus-1/interfaces/org.kde.nepomuk.ServiceControl.xml
/usr/share/dbus-1/interfaces/org.kde.nepomuk.ServiceManager.xml
/usr/share/dbus-1/interfaces/org.kde.nepomuk.Storage.xml
/usr/share/kde4/apps/fileindexerservice
/usr/share/kde4/apps/fileindexerservice/nepomukfileindexer.notifyrc
/usr/share/kde4/apps/nepomukfilewatch
/usr/share/kde4/apps/nepomukfilewatch/nepomukfilewatch.notifyrc
/usr/share/kde4/apps/nepomukstorage
/usr/share/kde4/apps/nepomukstorage/nepomukstorage.notifyrc
/usr/share/kde4/services/nepomukactivitiesservice.desktop
/usr/share/kde4/services/nepomukbackupsync.desktop
/usr/share/kde4/services/nepomukexiv2extractor.desktop
/usr/share/kde4/services/nepomukfileindexer.desktop
/usr/share/kde4/services/nepomukfilewatch.desktop
/usr/share/kde4/services/nepomukontologyloader.desktop
/usr/share/kde4/services/nepomukplaintextextractor.desktop
/usr/share/kde4/services/nepomukpopplerextractor.desktop
/usr/share/kde4/services/nepomukqueryservice.desktop
/usr/share/kde4/services/nepomukremovablestorageservice.desktop
/usr/share/kde4/services/nepomukstorage.desktop
/usr/share/kde4/services/nepomukstrigiservice.desktop
/usr/share/kde4/services/nepomuktaglibextractor.desktop
/usr/share/kde4/servicetypes/nepomukextractor.desktop
/usr/share/kde4/servicetypes/nepomukservice.desktop
/usr/share/ontology
/usr/share/ontology/kde
/usr/share/ontology/kde/kext.ontology
/usr/share/ontology/kde/kext.trig
/usr/share/ontology/kde/kuvo.ontology
/usr/share/ontology/kde/kuvo.trig
/usr/share/ontology/kde/nrio.ontology
/usr/share/ontology/kde/nrio.trig
/usr/share/pixmaps/nepomuk.png


molecule-eye
It's not part of nepomuk-core in Kubuntu either. I'll try downloading from git and compiling it myself.
bcooksley
Please note that I mentioned the nepomukindexer command, not the nepomukshow command.
Running nepomukindexer might give valuable output explaining why the file(s) are not being indexed.


Ignacio Serantes
Then for some reason packagers are not including this file, but it is part of nepomuk-core.

To obtain output from nepomukindexer, you must first enable debug output for nepomuk (strigi service) using kdebugdialog, and use the --data parameter.
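
Once debug output is enabled, the lines worth looking for are the extractor failures. A minimal sketch, assuming the messages look like the ones ntruhan posted earlier in this thread ("Could not create Extractor", "Cannot load library"); other failures may be worded differently, and the function name extractor_errors is made up for illustration.

```shell
# Keep only the debug lines that report an extractor plugin failing to load.
# Reads the indexer's output on stdin.
extractor_errors() {
    grep -E 'Could not create Extractor|Cannot load library'
}

# Usage: the debug messages were shown on the terminal in the earlier post,
# so stderr is merged into the pipe here; the path is a placeholder.
# nepomukindexer "/path/to/some.pdf" 2>&1 | extractor_errors
```

No output from the filter does not prove the PDF extractor works; it only means neither of these two messages appeared.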

