I would like to be able to extract all the words I have highlighted in different colours in my PDF.
Looking at the XML file, I can see the bounding box of the coloured entity but not the "raw text". In addition to having an outer bounding box

<base flags="0" creationDate="2015-04-15T08:59:51" uniqueName="okular-{80372623-6989-493b-91dd-0d74a22ef129}" author="RS" modifyDate="2015-04-15T08:59:51" color="#ffff00">
  <boundary l="0.307844" r="0.764008" b="0.20392" t="0.14419"/>
</base>

Okular seems to be able to detect the individual words as well, as each word has its own bounding box:

<hl>
  <quad dx="0.307971" cx="0.388889" dy="0.17848" bx="0.388889" cy="0.17848" ax="0.307971" by="0.204099" ay="0.204099" feather="1"/>
  <quad dx="0.402174" cx="0.671498" dy="0.144321" bx="0.671498" cy="0.144321" ax="0.402174" by="0.170794" ay="0.170794" feather="1"/>
  <quad dx="0.399758" cx="0.764493" dy="0.17848" bx="0.764493" cy="0.17848" ax="0.399758" by="0.204099" ay="0.204099" feather="1"/>
</hl>

Now I was wondering whether there is some simple command that gives me the colour tag and the three words (and all other highlighted words within the document), or whether I would have to go the route via "pdftotext -bbox" and do a bounding box lookup in its output to get the content.

Many thanks for your answers,
robin

Probably related to: https://bugs.kde.org/show_bug.cgi?id=321992, especially https://bugs.kde.org/show_bug.cgi?id=321992#c9
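For what it's worth, the "pdftotext -bbox" route described above could be sketched roughly as follows. This is only a minimal sketch under several assumptions that the post does not confirm: that the <base> and <hl> elements sit inside an <annotation> element, which in turn sits inside a <page number="..."> element; that Okular's quad coordinates are fractions of the page size measured from the top-left corner; and that the pdftotext output has <page width height> and <word xMin yMin xMax yMax> elements in points. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def read_highlights(okular_xml):
    """Return [(page_number, colour, [(left, top, right, bottom), ...]), ...].

    Assumption: <base> and <hl> are children of an <annotation> element
    that lives somewhere below a <page number="..."> element.
    """
    root = ET.fromstring(okular_xml)
    highlights = []
    for page in root.iter("page"):
        page_no = int(page.get("number", 0))
        for ann in page.iter("annotation"):
            base, hl = ann.find("base"), ann.find("hl")
            if base is None or hl is None:
                continue  # not a highlight annotation
            quads = []
            for q in hl.findall("quad"):
                xs = [float(q.get(k)) for k in ("ax", "bx", "cx", "dx")]
                ys = [float(q.get(k)) for k in ("ay", "by", "cy", "dy")]
                quads.append((min(xs), min(ys), max(xs), max(ys)))
            highlights.append((page_no, base.get("color"), quads))
    return highlights


def read_words(bbox_xhtml):
    """Return {page_index: [(text, left, top, right, bottom), ...]}.

    Word boxes are divided by the page size so that they are comparable
    with the 0..1 coordinates in the Okular file (an assumption).
    """
    root = ET.fromstring(bbox_xhtml)
    pages = [e for e in root.iter() if local(e.tag) == "page"]
    words = {}
    for index, page in enumerate(pages):
        w, h = float(page.get("width")), float(page.get("height"))
        words[index] = [
            (e.text or "",
             float(e.get("xMin")) / w, float(e.get("yMin")) / h,
             float(e.get("xMax")) / w, float(e.get("yMax")) / h)
            for e in page.iter() if local(e.tag) == "word"
        ]
    return words


def highlighted_text(okular_xml, bbox_xhtml):
    """Return [(colour, 'word word ...'), ...], one entry per highlight."""
    words = read_words(bbox_xhtml)
    result = []
    for page_no, colour, quads in read_highlights(okular_xml):
        hits = []
        # Walk the words in pdftotext order so the text keeps its reading order.
        for text, wl, wt, wr, wb in words.get(page_no, []):
            cx, cy = (wl + wr) / 2, (wt + wb) / 2
            # A word counts as highlighted if its centre lies inside any quad.
            if any(l <= cx <= r and t <= cy <= b for l, t, r, b in quads):
                hits.append(text)
        result.append((colour, " ".join(hits)))
    return result
```

To try it, one would run "pdftotext -bbox file.pdf words.html", read both files, and pass their contents to highlighted_text(). Testing the centre of each word rather than requiring full containment is a design choice that tolerates small differences between the two sets of boxes; whether it is tolerant enough for a given document would need checking.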