Strigi Indexing of pdf and other content not working

molecule-eye (Registered Member), Wed Jan 05, 2011 1:58 am
I've had Strigi disabled because I use Recoll, but I thought I would give Strigi a whirl. I set it to index one subfolder of my home folder, containing just PDF and DjVu files, but for some reason Strigi indexed the contents of my home folder, though none of the PDF or DjVu content (not sure if the latter is even possible). I noticed neither strigiclient nor strigidaemon were installed, so I installed them along with strigi-utils (the package name under Kubuntu 10.10). I edited the folders to be indexed in strigiclient, but the changes never stuck, so I edited daemon.conf manually, which worked. But when I clicked "start indexing", even after making sure the daemon was running, it wouldn't do anything to the PDF/DjVu folder. In fact it says -1 file is indexed! From what I understand, it should be possible to index PDF contents using Strigi. I'm running Kubuntu 10.10 with KDE 4.6 RC1. Is there any way to get Strigi to do what it should? And why has it indexed the wrong folders anyway?

sylvainsjc (Registered Member), Wed Jan 05, 2011 7:10 am
Hi,
If you're talking about the text contained in PDF files, it works for me:
- on Mandriva 2010.0 or 2010.1 with KDE 4.3 or 4.4
- on Linux Mint 9 KDE with KDE 4.4.5
- on Kubuntu 10.10 with KDE 4.5.4
- on Mandriva Cooker with KDE 4.6 RC1
If you're talking about the metadata contained in PDF files, the Strigi team is currently working on it and the feature should be available shortly.

molecule-eye (Registered Member), Wed Jan 05, 2011 10:45 am
I meant the embedded searchable text layer, not the metadata. After restarting, with strigidaemon and strigiclient installed, Dolphin is now searching within PDFs and I'm no longer getting a warning at login about Strigi not starting due to a missing strigidaemon. But it looks like Strigi and Nepomuk are keeping separate databases. Strigiclient shows indexing in progress even though Nepomuk says Strigi is idle: although strigiclient says "stopped", the number of indexed files and the database size keep climbing, and "top" shows the daemon hogging the CPU. There are two large database files, one in the Nepomuk folder and one in the Strigi folder. Also, there's no way for me to stop the daemon, either from strigiclient or from the Nepomuk settings. There used to be a tray icon for this, and I see a bug report has been filed about it being missing in KDE 4.6. Strigi still seems to be in quite a mess in 4.6, at least in Kubuntu.
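
To check the "two large database files" observation, a minimal Python sketch can total the on-disk size of each index folder. The two paths below are assumptions for illustration and are not given anywhere in this thread. `dir_size_bytes` and `report` are hypothetical helper names. Adjust the paths to wherever your installation keeps its Strigi and Nepomuk data.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def report(index_dirs):
    """Return (path, size in MiB) pairs for the directories that exist."""
    return [(d, dir_size_bytes(d) / 2**20) for d in index_dirs if os.path.isdir(d)]


if __name__ == "__main__":
    # ASSUMPTION: both locations are guesses, not taken from the thread.
    candidates = [
        os.path.expanduser("~/.strigi"),
        os.path.expanduser("~/.kde/share/apps/nepomuk"),
    ]
    for path, mib in report(candidates):
        print(f"{path}: {mib:.1f} MiB")
```

Running it twice a few minutes apart would show whether either store is still growing while the settings dialog reports the indexer as idle.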

pumrel (Registered Member), Sun Sep 04, 2011 10:32 am
I second this. It's really mysterious how Strigi and Nepomuk cooperate. Nepomuk says it has a database of 2 GB. Although I have Strigi enabled in the settings, I recently found out that strigidaemon wasn't running. I started it manually and ran strigiclient, which showed that Strigi's database size was zero??? What the? I started the indexing and the count kept increasing. First off, why are there two separate databases, and why wasn't strigidaemon started automatically?
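
One way to verify whether strigidaemon is actually running, rather than relying on what the settings dialog reports, is to look for it in the process table. The sketch below assumes a Linux system that exposes `/proc/<pid>/comm`. `find_pids` is a hypothetical helper name, and the `proc_root` parameter exists only so the function can be pointed at a different directory.

```python
import os


def find_pids(process_name, proc_root="/proc"):
    """Return sorted PIDs whose command name equals process_name.

    Note: the kernel truncates names in 'comm' to 15 characters, so this
    only works as-is for short names such as 'strigidaemon'.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == process_name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return sorted(pids)


if __name__ == "__main__":
    pids = find_pids("strigidaemon")
    if pids:
        print("strigidaemon is running, PID(s):", pids)
    else:
        print("strigidaemon is not running")
```

An empty result while desktop search is enabled would match the situation described above, where the daemon had to be started by hand.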

bcooksley (Administrator, KDE Sysadmin), Tue Sep 06, 2011 9:12 am
strigidaemon and strigiclient are something completely different: Nepomuk simply uses the data extraction component of Strigi, then stores the extracted data itself.

einar (Administrator, Plasma FAQ maintainer), Sun Sep 11, 2011 11:33 am
In fact, in 4.7 and beyond, the indexing of single files is the realm of a new program, "nepomukindexer".