
Strigi Indexing of pdf and other content not working

molecule-eye
Registered Member
Posts
402
Karma
0
OS
I've had strigi disabled because I use Recoll, but I thought I would give strigi a whirl. I selected a single subfolder of my home folder for indexing, one containing just pdf and djvu files. For some reason strigi indexed the contents of my home folder instead, and none of the pdf or djvu content (I'm not sure the latter is even possible).

I noticed neither strigiclient nor strigidaemon was installed, so I installed them along with strigi-utils (the package name under Kubuntu 10.10). I edited the folders to be indexed in strigiclient, but the changes never stuck, so I edited daemon.conf manually, which worked. But when I clicked "start indexing", even after making sure the daemon was running, it wouldn't do anything to the pdf/djvu folder. In fact it says -1 files are indexed!

From what I understand it should be possible to index pdf contents using strigi. I'm running Kubuntu 10.10 with KDE 4.6 RC1. Any way to get strigi to do what it should? And why has it indexed the wrong folders anyway?
sylvainsjc
Registered Member
Posts
22
Karma
0
OS
Hi,
If you're talking about text contained in PDF files, it works for me:

- on Mandriva 2010.0 or 2010.1 with KDE 4.3 or 4.4
- on LinuxMint 9 KDE with KDE 4.4.5
- on Kubuntu 10.10 with KDE 4.5.4
- on Mandriva Cooker with KDE 4.6 RC1

If you're talking about metadata contained in PDF files, the Strigi team is currently working on it and the feature should be available shortly.
molecule-eye
Registered Member
Posts
402
Karma
0
OS
I meant the embedded searchable text layer, not the metadata.

After restarting with strigidaemon and strigiclient installed, dolphin is now searching within pdfs, and I no longer get a warning at login about strigi not starting due to a missing strigidaemon. But it looks like strigi and nepomuk are keeping separate databases. Strigiclient says indexing is stopped, yet the indexed file count and database size keep climbing and "top" shows the daemon hogging the cpu, while nepomuk says strigi is idle. There are two large database files, one in the nepomuk folder and one in the strigi folder.

Also, there's no way for me to stop the daemon, either from strigiclient or from the nepomuk settings. There used to be a tray icon for this, and I see a bug report has been filed about its absence in KDE 4.6.

Strigi still seems like it's in quite a mess in 4.6, at least in Kubuntu.
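
For anyone who wants to put numbers on the "two databases" observation, here is a minimal Python sketch that totals the on-disk size of each store. The two paths are assumptions about a typical KDE 4 layout and are not confirmed anywhere in this thread, so adjust them to wherever the databases actually live on your system.

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so a link is not counted as the file it points to.
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file vanished or is unreadable; leave it out of the total.
                pass
    return total


if __name__ == "__main__":
    # Assumed locations (hypothetical, not taken from this thread).
    candidates = ["~/.strigi", "~/.kde/share/apps/nepomuk"]
    for candidate in candidates:
        full_path = os.path.expanduser(candidate)
        if os.path.isdir(full_path):
            size_mib = dir_size_bytes(full_path) / (1024 * 1024)
            print(f"{full_path}: {size_mib:.1f} MiB")
        else:
            print(f"{full_path}: not found")
```

Running it twice a few minutes apart would show whether either store is still growing while the settings claim the indexer is idle.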
pumrel
Registered Member
Posts
8
Karma
0
OS
I second this. It's really mysterious how strigi and nepomuk cooperate.
Nepomuk says it has a database of 2 GB. Although I have Strigi enabled in settings I have recently found out that strigidaemon wasn't running.
I started it manually and ran strigiclient, which showed the size of the strigi database as zero??? What the?
I started the indexing and the count kept increasing.
First off, why are there two separate databases, and why wasn't strigidaemon started automatically?
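
A quick way to confirm whether strigidaemon is actually running, rather than trusting the settings dialog, is to look through the process list. This is only a sketch: it assumes a Linux system with /proc mounted and assumes the process's command name is exactly "strigidaemon". The helper name `process_running` is made up for this example.

```python
import glob


def process_running(name):
    """Return True if any process's command name equals name (Linux /proc only)."""
    for comm_path in glob.glob("/proc/[0-9]*/comm"):
        try:
            with open(comm_path) as comm_file:
                # The kernel truncates comm to 15 characters;
                # "strigidaemon" is short enough to match in full.
                if comm_file.read().strip() == name:
                    return True
        except OSError:
            # The process exited between listing and reading; ignore it.
            continue
    return False


if __name__ == "__main__":
    state = "running" if process_running("strigidaemon") else "not running"
    print(f"strigidaemon is {state}")
```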
bcooksley
Administrator
Posts
19765
Karma
87
OS
strigidaemon and strigiclient are completely separate from Nepomuk. Nepomuk simply uses the data extraction component of Strigi, then stores the extracted data itself.


KDE Sysadmin
einar
Administrator
Posts
3402
Karma
7
OS
In fact, in 4.7 and beyond, indexing individual files is handled by a new program, "nepomukindexer".


"Violence is the last refuge of the incompetent."
Plasma FAQ maintainer - Plasma programming with Python

