This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Search a word in a PDF

Hari Seldon

Search a word in a PDF

Sun Aug 02, 2009 7:17 pm
Hi guys,
I have a very simple question.
I am looking for a way to search for a word in a set of PDFs without opening them. I found that Dolphin's "Find Files/Folders" tool (KFind)
appears to allow searching for text in PDFs in its "Contents" tab, but according to this:
http://docs.kde.org/development/en/kdeb ... range.html
this does not work ...

Now I am trying to write a simple application to perform this task. I think that this functionality would be very useful, at least for me.
My first idea was to use the "pdftotext" command from the "xpdf-utils" Debian package and then use grep.
But I think that this is very inefficient...
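
For reference, the pdftotext-then-search idea could be sketched roughly as follows. This is only a sketch under some assumptions: the text is expected to come from something like `pdftotext file.pdf -` (which writes to standard output and separates pages with form feed characters), the helper names `toLowerAscii` and `pagesContaining` are made up for illustration, and the case folding is ASCII-only.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Lower-case an ASCII string so that the comparison ignores case.
static std::string toLowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Given the full text output of pdftotext, return the 1-based numbers of
// the pages that contain `word`. Pages are assumed to be separated by
// form feed characters ('\f'), and `word` is assumed not to contain one.
std::vector<int> pagesContaining(const std::string &text, const std::string &word)
{
    std::vector<int> pages;
    const std::string haystack = toLowerAscii(text);
    const std::string needle = toLowerAscii(word);
    if (needle.empty())
        return pages;

    std::size_t start = 0;
    int pageNumber = 1;
    while (start <= haystack.size())
    {
        // Find where the current page ends.
        std::size_t end = haystack.find('\f', start);
        if (end == std::string::npos)
            end = haystack.size();

        // The first hit at or after `start` counts only if it lies in this page.
        const std::size_t hit = haystack.find(needle, start);
        if (hit != std::string::npos && hit + needle.size() <= end)
            pages.push_back(pageNumber);

        start = end + 1;
        ++pageNumber;
    }
    return pages;
}
```

This reports page numbers, which plain grep on the converted text would not do, but it still has to convert every PDF to text first.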

Does anyone know a smarter way to do this?

I was also wondering whether there is any explanation or documentation of how Okular searches for words in a PDF.
I checked out the Okular source code to get some ideas, but this kind of reverse engineering is far beyond my abilities.

Thank you in advance
Alec

Re: Search a word in a PDF

Sun Aug 02, 2009 7:38 pm
Okular uses the Poppler library for PDFs.

I haven't used it myself, but you may find something in the documentation.


Get problems solved faster - get reply notifications through Jabber!
Hari Seldon

Re: Search a word in a PDF

Sun Aug 02, 2009 7:53 pm
Thank you Alec,
I'll try to use Poppler.
Alec

Re: Search a word in a PDF

Sun Aug 02, 2009 8:35 pm
I actually decided to make it myself for the fun of it... :<

Here you go:

#include <cstdio>
#include <poppler/qt4/poppler-qt4.h>

int main(int argc, char *argv[])
{
    if(argc <= 2)
    {
        printf("Usage: %s [search] [file]\n", argv[0]);
        return -1;
    }

    QString search(argv[1]);
    QString filename(argv[2]);

    Poppler::Document* document = Poppler::Document::load(filename);
    if (!document || document->isLocked())
    {
        delete document;
        return -1;
    }

    int pageCount = document->numPages();
    bool found = false;

    for (int ix = 0; ix < pageCount; ix++)
    {
        Poppler::Page* pdfPage = document->page(ix);
        if (!pdfPage)   // skip pages that could not be loaded
            continue;
        QRectF area(QPoint(0, 0), pdfPage->pageSize());
        if(pdfPage->search(search, area,
                           Poppler::Page::FromTop,
                           Poppler::Page::CaseInsensitive,
                           Poppler::Page::Rotate0))
        {
            if(!found)
            {
                printf("Found \"%s\" on the following pages:\n",
                        search.toLatin1().data());
                found = true;
            }
            printf("%d\n", ix + 1);
        }

        delete pdfPage;
    }

    if(!found)
        printf("No occurrences found :(\n");

    delete document;   // release the document before exiting
    return 0;
}


Hari Seldon

Re: Search a word in a PDF

Mon Aug 03, 2009 5:10 pm
Works fine.
Nice job, Alec!
But you stole my fun ;-)
evod

Re: Search a word in a PDF

Wed Aug 12, 2009 11:13 am
Can somebody give me a hint on how to compile this sweet little program?

I installed libpoppler-qt4-dev, put the code into main.cpp, called the following:

$ qmake -project
$ qmake
$ make

but it's not that easy; linking fails with "undefined reference to `Poppler::Document::load.."
Hari Seldon

Re: Search a word in a PDF

Wed Aug 12, 2009 5:14 pm
Hi evod,
add this line to your *.pro file:
LIBS += -lpoppler-qt4
I had a similar issue and this fixed the problem. The generated project file simply does not tell the linker to use the Poppler Qt4 library.

Hope this helps.
Bye
evod

Re: Search a word in a PDF

Fri Aug 14, 2009 8:38 am
Thank you Alec and Hari!

I adapted the code to search a complete directory for PDFs. That's quite helpful for writing a diploma thesis with all that literature lying around :)

#include <cstdio>
#include <QtCore>
#include <poppler/qt4/poppler-qt4.h>

int main(int argc, char *argv[])
{
    if(argc <= 2 || argc > 3)
    {
        printf("Usage: %s [search] [file or directory]\n", argv[0]);
        return -1;
    }

    QString search(argv[1]);
    QString filename(argv[2]);
    QStringList files;

    QFileInfo fileinfo(filename);
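    if(fileinfo.isDir())
    {
        // entryList() returns bare file names, so prepend the directory
        QDir dir(filename);
        foreach (const QString &name, dir.entryList(QStringList("*.pdf"), QDir::Files))
            files.append(dir.filePath(name));
    }
    else if(filename.endsWith(".pdf", Qt::CaseInsensitive))
    {
        files.append(filename);
    }
    else
    {
        printf("\"%s\" doesn't seem to be a pdf file?\n", argv[2]);
        return -1;
    }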

    printf("Searching for \"%s\" in \"%s\"...\n", argv[1], argv[2]);

    Poppler::Document* document;
    int pageCount;
    bool found=false, foundInDocument;
    foreach (filename, files)
    {
        document = Poppler::Document::load(filename);
        if (!document || document->isLocked())
        {
            delete document;
            continue;
        }

        pageCount = document->numPages();
        foundInDocument = false;

        for (int ix = 0; ix < pageCount; ix++)
        {
            Poppler::Page* pdfPage = document->page(ix);
            QRectF area(QPoint(0, 0), pdfPage->pageSize());
            if(pdfPage->search(search, area,
                               Poppler::Page::FromTop,
                               Poppler::Page::CaseInsensitive,
                               Poppler::Page::Rotate0))
            {
                if(!foundInDocument)
                {
                    printf("found in \"%s\" on pages: ", filename.toLatin1().data());
                    foundInDocument = true;
                    found = true;
                }
                printf("%d, ", ix + 1);
            }

            delete pdfPage;

        }
        if(foundInDocument)
            printf("\n");

        delete document;   // release each document once it has been searched
    }

    if(!found)
        printf("No occurrences found :(\n");

    return 0;
}
Alec

Re: Search a word in a PDF

Fri Aug 14, 2009 4:40 pm
Thanks! :)

