Registered Member
I have an idea for an application to automatically categorise and tag documents based on their contents.
To do this I need a frequency distribution of the words in each document. I have played around with the Nepomuk examples and have a few clues about the tagging and RDF storage. I can't find much information on a per-document word list, though; nepsak and nepoogle don't appear to show one, so maybe it's not stored in Virtuoso? Is a word list stored somewhere (e.g. an inverted vector index)? How does the full-text search in Dolphin do its thing? Do I need to produce this list myself using libstreamanalyzer? I'd prefer not to do a second indexing pass.
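For what it's worth, the frequency-distribution step itself is straightforward once you have the plain text of a document. A minimal sketch in Python, assuming the text has already been extracted (the tokenisation regex is a simplification, not what any Nepomuk component actually does):

```python
import re
from collections import Counter

def word_frequencies(text: str) -> Counter:
    """Lowercase the text, split it into words, and count occurrences."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Example: the two most common words and their counts.
freq = word_frequencies("The cat sat on the mat. The mat was flat.")
print(freq.most_common(2))  # → [('the', 3), ('mat', 2)]
```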
Administrator
Given the type of question this is, you may want to ask the Nepomuk developers directly how you might accomplish this. Please send an email to nepomuk@kde.org.
KDE Sysadmin
Registered Member
Firstly, for completeness: that address is a mailing list, and you can subscribe here: https://mail.kde.org/mailman/listinfo/nepomuk. Note that the list carries fairly heavy traffic about maintaining and shipping the Nepomuk infrastructure. I got several responses (thanks!), and the short answer is that the word lists are internal to Virtuoso (the database that stores the semantic content and answers queries for Nepomuk). The closest thing to what I am after is the Nepomuk property nie:plainTextContent, which holds the text extracted from a file; I'm going to have to post-process that to get what I need. Jörg Ehrichs also posted a URL to the source of an app that runs some queries against Nepomuk for files: http://blog.6bytesmore.com/2011/12/resource-browser.html
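To illustrate the approach above, here is a hedged sketch: build a SPARQL query for a file's nie:plainTextContent, then post-process the returned text into a word list. The nie: namespace URI is the real NIE ontology one, but the file URL is just a placeholder, and actually executing the query against Virtuoso (e.g. via Soprano or the Nepomuk query service) is not shown here:

```python
import re
from collections import Counter

# Real namespace of the NEPOMUK Information Element (NIE) ontology.
NIE = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"

def plain_text_query(file_url: str) -> str:
    """Build a SPARQL query selecting the extracted text of one file."""
    return (
        f"PREFIX nie: <{NIE}> "
        f"SELECT ?text WHERE {{ <{file_url}> nie:plainTextContent ?text . }}"
    )

def post_process(text: str) -> list:
    """Turn the extracted plain text into a (word, count) list, most frequent first."""
    return Counter(re.findall(r"\w+", text.lower())).most_common()

# Hypothetical file URL, for illustration only.
query = plain_text_query("file:///home/user/doc.txt")
```

Running `query` through whatever SPARQL client you use would yield the text that `post_process` then reduces to the frequency list, avoiding a second indexing pass over the file itself.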