
Is PDF Indexing Working?!

CyberAngel

Re: Is PDF Indexing Working?!

Fri Feb 24, 2012 11:45 am
Ignacio Serantes wrote:
CyberAngel wrote:KDE Bug opened here
https://bugs.kde.org/show_bug.cgi?id=294727
In general, when you run nepomukindexer on a file and nepoogle cannot find that file afterwards, the indexing process has failed.

As I wrote, xmlindexer and rdfindexer are good tools for trying to detect problems.

I tried the PDF you attached to the bug report. It is true that nepomukindexer reports an error parsing the file, but the file is indexed and is visible to nepoogle on my system.

On the other hand, after repairing the PDF with pdftk-gui, there is no error when indexing it.

In openSUSE these are the packages and versions I have installed:
  • kdegraphics-strigi-analyzer - 4.8.0-24.1
  • kdesdk4-strigi - 4.8.0-225.1
  • libstrigi0 - 0.7.6-65.3
  • libstrigi0-32bit - 0.7.6-65.3
  • strigi - 0.7.6-65.3
  • strigi-devel - 0.7.6-65.3


How do you recognize errors in xmlindexer and rdfindexer?
I don't see any obvious one.

For the file I uploaded to the bug page, I don't see any text when I run xmlindexer or rdfindexer (so that's an indication). For mln-manual.pdf, everything looks fine, yet it is still impossible to index it.

Something else...
I don't have many strigi packages installed on my system (strigi-daemon, for example, is missing), but I guess I don't need them.
Even strigi-utils, the package that provides xmlindexer and rdfindexer, I had to install manually.

Here is a list of packages and their corresponding versions in my system returned after a search for strigi.

Code:
apt-cache policy $(apt-cache search strigi | awk '{print $1}')
libstreamanalyzer0:
  Installed: 0.7.6-2ubuntu1
  Candidate: 0.7.6-2ubuntu1
  Version table:
 *** 0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
        100 /var/lib/dpkg/status
libstreams0:
  Installed: 0.7.6-2ubuntu1
  Candidate: 0.7.6-2ubuntu1
  Version table:
 *** 0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
        100 /var/lib/dpkg/status
shared-desktop-ontologies:
  Installed: 0.8.1-1~ppa1~oneiric1
  Candidate: 0.8.1-1~ppa1~oneiric1
  Version table:
 *** 0.8.1-1~ppa1~oneiric1 0
        500 http://ppa.launchpad.net/kubuntu-ppa/backports/ubuntu/ oneiric/main amd64 Packages
        100 /var/lib/dpkg/status
     0.7.0-0ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
kdegraphics-strigi-analyzer:
  Installed: 4:4.8.0-0ubuntu1~oneiric1~ppa1
  Candidate: 4:4.8.0-0ubuntu1~oneiric1~ppa1
  Version table:
 *** 4:4.8.0-0ubuntu1~oneiric1~ppa1 0
        500 http://ppa.launchpad.net/kubuntu-ppa/backports/ubuntu/ oneiric/main amd64 Packages
        100 /var/lib/dpkg/status
     4:4.7.3-0ubuntu0.1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric-updates/main amd64 Packages
     4:4.7.1-0ubuntu2 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
kdegraphics-mobipocket:
  Installed: (none)
  Candidate: 4:4.8.0-0ubuntu1~oneiric1~ppa1
  Version table:
     4:4.8.0-0ubuntu1~oneiric1~ppa1 0
        500 http://ppa.launchpad.net/kubuntu-ppa/backports/ubuntu/ oneiric/main amd64 Packages
     4:4.7.3-0ubuntu0.1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric-updates/main amd64 Packages
     4:4.7.1-0ubuntu2 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
kdegraphics-strigi-plugins:
  Installed: (none)
  Candidate: 4:4.8.0-0ubuntu1~oneiric1~ppa1
  Version table:
     4:4.8.0-0ubuntu1~oneiric1~ppa1 0
        500 http://ppa.launchpad.net/kubuntu-ppa/backports/ubuntu/ oneiric/main amd64 Packages
     4:4.7.3-0ubuntu0.1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric-updates/universe amd64 Packages
     4:4.7.1-0ubuntu2 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/universe amd64 Packages
kdepim-strigi-plugins:
  Installed: 4:4.8.0a-0ubuntu1~oneiric1~ppa1
  Candidate: 4:4.8.0a-0ubuntu1~oneiric1~ppa1
  Version table:
 *** 4:4.8.0a-0ubuntu1~oneiric1~ppa1 0
        500 http://ppa.launchpad.net/kubuntu-ppa/backports/ubuntu/ oneiric/main amd64 Packages
        100 /var/lib/dpkg/status
     4:4.7.4+git111222-0ubuntu0.1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric-updates/main amd64 Packages
     4:4.7.4+git111222-0ubuntu0.1~ppa1 0
        500 http://ppa.launchpad.net/kubuntu-ppa/ppa/ubuntu/ oneiric/main amd64 Packages
     4:4.7.2+git111007-0ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
libstreams-dev:
  Installed: (none)
  Candidate: 0.7.6-2ubuntu1
  Version table:
     0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
libstreamanalyzer-dev:
  Installed: (none)
  Candidate: 0.7.6-2ubuntu1
  Version table:
     0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
libsearchclient-dev:
  Installed: (none)
  Candidate: 0.7.6-2ubuntu1
  Version table:
     0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
libstrigihtmlgui-dev:
  Installed: (none)
  Candidate: 0.7.6-2ubuntu1
  Version table:
     0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
libstrigiqtdbusclient-dev:
  Installed: (none)
  Candidate: 0.7.6-2ubuntu1
  Version table:
     0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
kdesdk-strigi-plugins:
  Installed: (none)
  Candidate: 4:4.8.0-0ubuntu1~oneiric1~ppa1
  Version table:
     4:4.8.0-0ubuntu1~oneiric1~ppa1 0
        500 http://ppa.launchpad.net/kubuntu-ppa/backports/ubuntu/ oneiric/main amd64 Packages
     4:4.7.4-0ubuntu0.1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric-updates/main amd64 Packages
     4:4.7.4-0ubuntu0.1~ppa1 0
        500 http://ppa.launchpad.net/kubuntu-ppa/ppa/ubuntu/ oneiric/main amd64 Packages
     4:4.7.1-0ubuntu3 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
libsearchclient0:
  Installed: (none)
  Candidate: 0.7.6-2ubuntu1
  Version table:
     0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
strigi-daemon:
  Installed: (none)
  Candidate: 0.7.6-2ubuntu1
  Version table:
     0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/universe amd64 Packages
libstrigihtmlgui0:
  Installed: (none)
  Candidate: 0.7.6-2ubuntu1
  Version table:
     0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
libstrigiqtdbusclient0:
  Installed: (none)
  Candidate: 0.7.6-2ubuntu1
  Version table:
     0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
strigi-dbg:
  Installed: (none)
  Candidate: 0.7.6-2ubuntu1
  Version table:
     0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/main amd64 Packages
catfish:
  Installed: (none)
  Candidate: 0.3.2-1ubuntu1
  Version table:
     0.3.2-1ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/universe amd64 Packages
strigi-client:
  Installed: (none)
  Candidate: 0.7.6-2ubuntu1
  Version table:
     0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/universe amd64 Packages
strigi-utils:
  Installed: 0.7.6-2ubuntu1
  Candidate: 0.7.6-2ubuntu1
  Version table:
 *** 0.7.6-2ubuntu1 0
        500 http://ftp.uninett.no/ubuntu/ oneiric/universe amd64 Packages
        100 /var/lib/dpkg/status
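
To pull just the installed packages out of a listing like the one above, a small awk filter over the `apt-cache policy` output can help. This is only a minimal sketch: the function name `installed_versions` is made up for illustration, and it assumes the output layout shown above (an unindented `name:` header line followed by an indented `Installed:` line).

```shell
# Print "package version" for every package that is actually installed.
# Reads `apt-cache policy` output on stdin. The function name is hypothetical.
installed_versions() {
    awk '
        # Unindented line: a package header such as "strigi-utils:"
        /^[^ ]/ { sub(/:$/, "", $1); pkg = $1 }
        # Indented "Installed:" line: skip packages reported as (none)
        $1 == "Installed:" && $2 != "(none)" { print pkg, $2 }
    '
}

# Usage (same search as in the post):
# apt-cache policy $(apt-cache search strigi | awk '{print $1}') | installed_versions
```

Applied to the listing above, it would report six installed packages: libstreamanalyzer0, libstreams0, shared-desktop-ontologies, kdegraphics-strigi-analyzer, kdepim-strigi-plugins and strigi-utils.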
Ignacio Serantes

Re: Is PDF Indexing Working?!

Fri Feb 24, 2012 4:18 pm
CyberAngel wrote:How do you recognize errors in xmlindexer and rdfindexer?
I don't see any obvious one.

When there is output, there is no obvious method, but you can compare the results for a PDF that gets indexed with those for a PDF that does not, and try to detect what is happening.
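
One way to make that comparison less tedious is to diff the two outputs. The following is a rough sketch, not a tested recipe: `compare_index_output` is a hypothetical helper, and it assumes the indexer (xmlindexer by default, or whatever `INDEXER` is set to) takes the file path as its only argument and prints its result to the terminal.

```shell
# Diff what an indexer prints for two files, e.g. one PDF that gets
# indexed and one that does not. Helper name and invocation are assumptions.
compare_index_output() {
    indexer=${INDEXER:-xmlindexer}
    good=$(mktemp) && bad=$(mktemp) || return 2
    # Capture stderr as well, in case parse errors are reported there
    "$indexer" "$1" > "$good" 2>&1
    "$indexer" "$2" > "$bad" 2>&1
    diff -u "$good" "$bad"
    status=$?                      # 0 = identical, 1 = outputs differ
    rm -f "$good" "$bad"
    return $status
}

# Usage:
# compare_index_output indexed.pdf not-indexed.pdf
# INDEXER=rdfindexer compare_index_output indexed.pdf not-indexed.pdf
```

Two different documents will always differ in their content, so the interesting part of the diff is which fields or text sections are present for one file and missing for the other.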

To complicate things further, the indexer could be working while adding the resource to Nepomuk fails. Still, at least you can confirm that there is no problem at the indexer level.

CyberAngel wrote:For the file I uploaded to the bug page, I don't see any text when I run xmlindexer or rdfindexer (so that's an indication). For mln-manual.pdf, everything looks fine, yet it is still impossible to index it.

Of course, if there is no text then the indexer is failing. On my system, and with my debug level, I got an "Error in parsing" message when indexing your PDF.

As I explained, after I repaired the PDF with pdftk there is no error parsing the file, so it seems this PDF has some kind of damage. As both the repaired and the original PDF are indexed on my system, I can't confirm whether a repaired PDF will be indexed on yours.

Finally, about your installed versions, I can't help you because I don't use Ubuntu; maybe others can. I can only confirm that in openSUSE 11.3 the PDF is indexed.
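
For anyone who prefers the command line to pdftk-gui, the repair can be sketched roughly as below. `repair_pdf` is a hypothetical helper name, and passing the file path straight to nepomukindexer in the usage line is an assumption. The pdftk call itself is its plain pass-through form, which rewrites the file and fixes the structure where it can.

```shell
# Rewrite a PDF through pdftk so that a damaged file structure gets rebuilt.
# repair_pdf is a hypothetical helper; it refuses to overwrite its own input.
repair_pdf() {
    if [ "$1" = "$2" ]; then
        echo "input and output must be different files" >&2
        return 1
    fi
    pdftk "$1" output "$2"
}

# Usage (file names are placeholders):
# repair_pdf damaged.pdf repaired.pdf && nepomukindexer repaired.pdf
```

Whether the repaired copy then shows up in nepoogle still has to be checked on the affected system.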


Ignacio Serantes, proud to be a member of KDE forums since 2008-Nov.

