
Nepomuk in KDE 4.10.5 doesn't index doc and djvu files

Aleksey_R
Greetings!
After upgrading from openSUSE 12.2 to openSUSE 12.3 I found that Nepomuk doesn't index doc (MS Word) and djvu files. In KDE 4.8 all indexing was working OK. Does anyone have the same problem in KDE 4.10?

I entered "application/msword" and some other MIME types in nepomuk indexing settings, but that didn't help.

Best regards,
Aleksey.
metzman
Nepomuk is now using its own indexers and not relying on strigi.
Unfortunately, they're not all written yet.
Take a look at this blog post by Vishesh Handa: http://vhanda.in/blog/2013/05/we-need-more-indexers/

Additionally, you might want to take a look at one of his other posts concerning the change to 4.11: http://vhanda.in/blog/2013/04/the-nepomuk-migration/
Be aware that after this migration the database can no longer be used by pre-4.11 versions, so make sure you keep a backup in case you need to revert.
Aleksey_R
Thank you very much for the explanation.
Honestly, it's a little bit unexpected :-(
Well, let's wait until these indexers are written.

Best regards,
Aleksey.
metzman
Hi Aleksey,
I thought I'd follow up on this, mainly for my own curiosity ;) -- Like yourself, I use openSUSE 12.3, but with KDE 4.11.1, and Nepomuk still does not index DjVu files. I don't use or have any real MS *.doc files to try.

However, taking a simple LibreOffice (.odt) file, I saved it as 'Microsoft Word 97/2000/XP/2003' format. (Therefore the following test may not be correct.)
Code:
paul@Orion-1:~/Temporary$ nepomukindexer MS-Word.doc
paul@Orion-1:~/Temporary$ nepomukshow --plainText MS-Word.doc
<nepomuk:/res/c0353762-dbdd-46cf-a8ab-c86d70479f3f>
        rdf:type            nfo:PaginatedTextDocument
        rdf:type            nie:InformationElement
        rdf:type            nfo:FileDataObject
        nao:created         2013-09-24T09:08:19.57Z
        nao:lastModified    2013-09-24T09:08:19.57Z
        nie:lastModified    2013-09-24T09:05:58Z
        nie:url             file:///home/paul/Temporary/MS-Word.doc
        nie:mimeType        application/msword
        nie:created         2013-09-24T09:06:56Z
        nfo:fileSize        15360
        nfo:fileName        MS-Word.doc
        kext:indexingLevel  2
paul@Orion-1:~/Temporary$ nepomukindexer --clear MS-Word.doc

Forcing nepomukindexer to index the file and looking at the result shows that, although it has indexed at level 2, there is no content.
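
The "no content" check above can be automated with a small shell sketch. It assumes that extracted text, when an indexer produces it, would show up in the nepomukshow output under the nie:plainTextContent property. That property name comes from the NIE ontology and is my assumption; it does not appear in the output above. The function name has_indexed_content is hypothetical.

```shell
# Read nepomukshow-style output on stdin and report whether a text
# content property is present.
# ASSUMPTION: extracted text is stored as nie:plainTextContent.
has_indexed_content() {
    if grep -q 'nie:plainTextContent'; then
        echo "content found"
    else
        echo "no content"
    fi
}
```

It would be used by piping the command from the session above into it, for example: nepomukshow --plainText MS-Word.doc | has_indexed_content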

Looking through the various review requests for Nepomuk (https://git.reviewboard.kde.org/groups/nepomuk/), several indexers are in progress, albeit not for *.doc.

It might be worth contacting Vishesh to ask what plans there are to support .doc. You can contact him via his blog (http://vhanda.in/blog/); alternatively, his e-mail address is publicly available on the KDE bug tracking site. (This post, for example: https://bugs.kde.org/show_bug.cgi?id=293641#c22 )

Aleksey_R wrote: I entered "application/msword" and some other MIME types in nepomuk indexing settings, but that didn't help.

If you meant Settings => Desktop Search => Indexing => Advanced - 'Advanced File Filtering', you need to remove any entries you added. Those entries are used to exclude files from indexing, not to include them.


As an aside, I've found 4.11 to be a great improvement: queries are executed much faster, the way initial file indexing is done is much better (IMO), and virtuoso-t now seems 'tamed'. So far I've not had it using 100% CPU whilst stuck in some infinite loop.

