Extract Highlighted Text Depending On Annotation Tool Colour

Wed Apr 15, 2015 10:44 am

I would like to be able to extract all the words I have highlighted in different colours in my PDF.
Looking at the XML file, I can see the bounding box of each coloured entity but not the "raw text".

In addition to having an outer bounding box:
<base flags="0" creationDate="2015-04-15T08:59:51" uniqueName="okular-{80372623-6989-493b-91dd-0d74a22ef129}" author="RS" modifyDate="2015-04-15T08:59:51" color="#ffff00">
<boundary l="0.307844" r="0.764008" b="0.20392" t="0.14419"/>
</base>

Okular seems to be able to detect the individual words as well, since each word has its own bounding box:
<hl>
<quad dx="0.307971" cx="0.388889" dy="0.17848" bx="0.388889" cy="0.17848" ax="0.307971" by="0.204099" ay="0.204099" feather="1"/>
<quad dx="0.402174" cx="0.671498" dy="0.144321" bx="0.671498" cy="0.144321" ax="0.402174" by="0.170794" ay="0.170794" feather="1"/>
<quad dx="0.399758" cx="0.764493" dy="0.17848" bx="0.764493" cy="0.17848" ax="0.399758" by="0.204099" ay="0.204099" feather="1"/>
</hl>
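
To make the structure above concrete, here is a minimal Python sketch that reads the colour and the per-quad rectangles from such a fragment. It assumes that each `<base>`/`<hl>` pair sits inside an enclosing `<annotation>` element (the wrapper is not shown in the snippets above, so that name is an assumption), and the helper names `quad_to_rect` and `read_highlights` are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Sample built from the snippets above. The <annotation> wrapper is an
# assumption about how the real file nests these elements.
SAMPLE = """<annotation>
<base author="RS" color="#ffff00">
<boundary l="0.307844" r="0.764008" b="0.20392" t="0.14419"/>
</base>
<hl>
<quad dx="0.307971" cx="0.388889" dy="0.17848" bx="0.388889" cy="0.17848" ax="0.307971" by="0.204099" ay="0.204099" feather="1"/>
<quad dx="0.402174" cx="0.671498" dy="0.144321" bx="0.671498" cy="0.144321" ax="0.402174" by="0.170794" ay="0.170794" feather="1"/>
<quad dx="0.399758" cx="0.764493" dy="0.17848" bx="0.764493" cy="0.17848" ax="0.399758" by="0.204099" ay="0.204099" feather="1"/>
</hl>
</annotation>"""


def quad_to_rect(quad):
    """Collapse the four corners a..d into (left, top, right, bottom)."""
    xs = [float(quad.get(key)) for key in ("ax", "bx", "cx", "dx")]
    ys = [float(quad.get(key)) for key in ("ay", "by", "cy", "dy")]
    return (min(xs), min(ys), max(xs), max(ys))


def read_highlights(root):
    """Return a list of (colour, [rect, ...]) for every highlight found."""
    result = []
    for annotation in root.iter("annotation"):
        base = annotation.find("base")
        hl = annotation.find("hl")
        if base is None or hl is None:
            continue  # not a highlight annotation
        rects = [quad_to_rect(q) for q in hl.findall("quad")]
        result.append((base.get("color"), rects))
    return result


highlights = read_highlights(ET.fromstring(SAMPLE))
for colour, rects in highlights:
    print(colour, rects)
```

The rectangles come out in the same normalised coordinates as the XML, which still leaves the lookup of the actual text as a separate step.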

Now I am wondering whether there is a simple command that gives me the colour tag and the three words (and all others highlighted within the document), or whether I would have to go the route via "pdftotext -bbox" and do a bounding box lookup on its output to get the content.
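
For reference, the bounding box route could look roughly like the Python sketch below. It rests on several assumptions that would need checking against real files: that "pdftotext -bbox" writes XHTML with `<page width height>` elements containing `<word xMin yMin xMax yMax>` elements in page units, that the Okular quad coordinates are fractions of the page width and height, and that both use a top-left origin. The function names and the tiny sample page are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for "pdftotext -bbox" output; the real file is assumed to
# have the same page/word layout.
SAMPLE_BBOX = """<html xmlns="http://www.w3.org/1999/xhtml">
<body><doc>
<page width="100.000000" height="100.000000">
<word xMin="31.0" yMin="18.0" xMax="38.0" yMax="20.0">alpha</word>
<word xMin="50.0" yMin="50.0" xMax="60.0" yMax="52.0">beta</word>
</page>
</doc></body></html>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def read_pages(xhtml_text):
    """Return [(width, height, [(x0, y0, x1, y1, text), ...]), ...]."""
    pages = []
    for element in ET.fromstring(xhtml_text).iter():
        if local_name(element.tag) != "page":
            continue
        width = float(element.get("width"))
        height = float(element.get("height"))
        words = [
            (float(w.get("xMin")), float(w.get("yMin")),
             float(w.get("xMax")), float(w.get("yMax")), w.text or "")
            for w in element
            if local_name(w.tag) == "word"
        ]
        pages.append((width, height, words))
    return pages


def words_in_rect(page, rect):
    """Words whose centre lies inside a normalised (l, t, r, b) rect."""
    width, height, words = page
    left, top, right, bottom = rect
    hits = []
    for x0, y0, x1, y1, text in words:
        centre_x = (x0 + x1) / 2 / width   # scale to the 0..1 range
        centre_y = (y0 + y1) / 2 / height
        if left <= centre_x <= right and top <= centre_y <= bottom:
            hits.append(text)
    return hits


pages = read_pages(SAMPLE_BBOX)
first_quad = (0.307971, 0.17848, 0.388889, 0.204099)  # first quad above
found = words_in_rect(pages[0], first_quad)
print(found)
```

Testing the centre of each word rather than requiring full containment is a deliberate choice here, because highlight quads and word boxes rarely line up exactly.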

Many thanks for your answers,

robin

probably related to: https://bugs.kde.org/show_bug.cgi?id=321992 especially https://bugs.kde.org/show_bug.cgi?id=321992#c9
