
Extract Highlighted Text Depending On Annotation Tool Colour

robins
I would like to extract all the words I have highlighted in different colours in my PDF, grouped by colour.
Looking at the XML file, I can see the bounding box of each coloured annotation, but not the "raw text".

In addition to the outer bounding box,
<base flags="0" creationDate="2015-04-15T08:59:51" uniqueName="okular-{80372623-6989-493b-91dd-0d74a22ef129}" author="RS" modifyDate="2015-04-15T08:59:51" color="#ffff00">
<boundary l="0.307844" r="0.764008" b="0.20392" t="0.14419"/>
</base>

Okular seems to detect the individual words as well, since each word gets its own bounding box:
<hl>
<quad dx="0.307971" cx="0.388889" dy="0.17848" bx="0.388889" cy="0.17848" ax="0.307971" by="0.204099" ay="0.204099" feather="1"/>
<quad dx="0.402174" cx="0.671498" dy="0.144321" bx="0.671498" cy="0.144321" ax="0.402174" by="0.170794" ay="0.170794" feather="1"/>
<quad dx="0.399758" cx="0.764493" dy="0.17848" bx="0.764493" cy="0.17848" ax="0.399758" by="0.204099" ay="0.204099" feather="1"/>
</hl>
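A minimal sketch of reading those quads programmatically, in Python with only the standard library. It assumes (not confirmed in this thread) that Okular's XML places each `<annotation>`, holding a `<base>` and an `<hl>`, inside a `<page number="N">` element, and that all coordinates are normalized to 0..1 with y growing downwards (consistent with t < b in the boundary above). `read_highlights` and `quad_to_rect` are hypothetical helper names.

```python
import sys
import xml.etree.ElementTree as ET


def quad_to_rect(quad):
    # A quad stores four corners (a, b, c, d) in normalized 0..1 page
    # coordinates; collapse them to an axis-aligned (left, top, right, bottom).
    # y grows downwards, so the smallest y is the top edge.
    xs = [float(quad.get(k)) for k in ("ax", "bx", "cx", "dx")]
    ys = [float(quad.get(k)) for k in ("ay", "by", "cy", "dy")]
    return (min(xs), min(ys), max(xs), max(ys))


def read_highlights(xml_text):
    """Return a list of (page_number, colour, [rect, ...]) tuples.

    Assumption: <annotation> elements sit inside <page number="N">;
    annotations without an <hl> child are not text highlights and are skipped.
    """
    root = ET.fromstring(xml_text)
    found = []
    for page in root.iter("page"):
        number = int(page.get("number", "0"))
        for ann in page.iter("annotation"):
            base, hl = ann.find("base"), ann.find("hl")
            if base is None or hl is None:
                continue
            rects = [quad_to_rect(q) for q in hl.iter("quad")]
            found.append((number, base.get("color"), rects))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_quads.py <okular-xml-file>
    with open(sys.argv[1], "rb") as f:
        for number, colour, rects in read_highlights(f.read()):
            print(number, colour, rects)
```

Each printed line gives the page, the `color` attribute from `<base>` and one rectangle per quad; the text itself still has to come from the PDF.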

Now I was wondering whether there is a simple command that gives me the colour tag together with the three words (and all the other highlighted text in the document), or
whether I would have to go via "pdftotext -bbox" and look up the bounding boxes in the resulting HTML file to get the content.
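A rough sketch of the pdftotext route, again in Python. `pdftotext -bbox file.pdf file.html` writes an XHTML file with one `<page width=… height=…>` element per page and one `<word xMin=… yMin=… xMax=… yMax=…>` element per word, in points. The script divides by the page size so the boxes are comparable with Okular's 0..1 values, then counts a word as highlighted when its centre falls inside one of the highlight rectangles. That centre test is my own heuristic, not something Okular does; the rectangles are assumed to be (left, top, right, bottom) in the same normalized space, and pdftotext's page order is assumed to match Okular's 0-based page numbers. `read_words` and `words_by_colour` are hypothetical names.

```python
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(tag):
    # pdftotext writes XHTML, so tags arrive as "{namespace}word".
    return tag.rsplit("}", 1)[-1]


def read_words(bbox_xhtml):
    """Map page index (0-based) -> [(text, left, top, right, bottom)],
    with coordinates divided by the page width/height (0..1 range)."""
    root = ET.fromstring(bbox_xhtml)
    pages = {}
    page_elems = [e for e in root.iter() if local(e.tag) == "page"]
    for index, page in enumerate(page_elems):
        w, h = float(page.get("width")), float(page.get("height"))
        pages[index] = [
            (word.text or "",
             float(word.get("xMin")) / w, float(word.get("yMin")) / h,
             float(word.get("xMax")) / w, float(word.get("yMax")) / h)
            for word in page.iter() if local(word.tag) == "word"
        ]
    return pages


def words_by_colour(highlights, pages):
    """highlights: [(page, colour, [(l, t, r, b), ...]), ...].
    A word counts as highlighted if its centre lies inside any rectangle."""
    grouped = defaultdict(list)
    for number, colour, rects in highlights:
        for text, l, t, r, b in pages.get(number, []):
            cx, cy = (l + r) / 2, (t + b) / 2
            if any(x0 <= cx <= x1 and y0 <= cy <= y1
                   for x0, y0, x1, y1 in rects):
                grouped[colour].append(text)
    return dict(grouped)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example input: the three quads from the post, assumed to be on page 0.
    highlights = [(0, "#ffff00", [
        (0.307971, 0.17848, 0.388889, 0.204099),
        (0.402174, 0.144321, 0.671498, 0.170794),
        (0.399758, 0.17848, 0.764493, 0.204099),
    ])]
    with open(sys.argv[1], "rb") as f:
        pages = read_words(f.read())
    for colour, words in words_by_colour(highlights, pages).items():
        print(colour, " ".join(words))
```

Run it as `python match_words.py file.html` after `pdftotext -bbox`. The centre test tolerates slightly misaligned boxes, but a word that is only partly highlighted may be counted or missed depending on where its centre lands.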

Many thanks for your answers,

robin

Probably related to: https://bugs.kde.org/show_bug.cgi?id=321992, especially https://bugs.kde.org/show_bug.cgi?id=321992#c9

