First, thanks to the makers of Okular. The "select table" feature is something I use all the time - it is, frankly, great.
This suggestion is, at its most basic, an extension of the "select table" tool. Some sort of frontend plugin, maybe. *Disclaimer: I have no idea where to even start coding this myself.*

PDF editing or converting seems to come up a lot. You need to take the PDF and turn it into some editable format. Getting hold of the source file that created the PDF often isn't an option, and even if you could, you don't always have access to InDesign/QuarkXPress/PowerPoint/whatever. LibreOffice has PDF import, but it creates a "layout accurate" version in Draw and seems to lose information about tables and so on, because it just places all the text in frames at the correct location on the page. What I find most useful is to retain the content rather than the layout. Once the content has been edited (or whatever), you can lay it out however you like.

You can get software such as Adobe Acrobat, Nitro Pro or ABBYY FineReader, but each has its own problems (cost, not handling irregularly spaced tables, OCR errors, not being available for Linux and so on). PDFtoHTML doesn't quite do it either. Okular seems to implement most of what is needed already: text selection, image selection (although you don't seem to be able to set the resolution for image snapshots like you can in, say, Adobe Reader; perhaps another bug/feature request) and table selection.

I'll try to illustrate what I mean. The examples show output in .odt format, but the output format could be .doc, HTML or whatever. Starting with the source document:

At its simplest you could do the following:

The blue selection areas are text boxes, the green selection areas are image boxes and the red selection areas are table boxes. Anyone who has used any of the PDF-to-DOC tools before should be familiar with this kind of thing. The number at the upper left of each box shows the order in which it will appear on the page. Boxes can be added or removed, and the ordering can be changed.
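The marking scheme boils down to a small data model: each box has a type (text, image or table), an order number and, optionally, a paragraph style, and export just walks the marked boxes in order. Below is a minimal sketch in Python of that idea, assuming the text, image snapshots and table cells have already been pulled out by the existing selection tools. Everything in it (the `Box` class, the `STYLE_TAGS` mapping, `render_box`, `export_page`) is hypothetical and not part of Okular, and it writes plain HTML rather than .odt only to stay within the standard library.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Optional


@dataclass
class Box:
    """One marked area on a page (hypothetical model, not an Okular API)."""
    kind: str                 # "text" (blue), "image" (green) or "table" (red)
    order: int                # the number shown at the box's upper left
    text: str = ""            # extracted text, for text boxes
    image_path: str = ""      # snapshot file, for image boxes
    rows: List[List[str]] = field(default_factory=list)  # cells, for table boxes
    style: Optional[str] = None  # optional paragraph style, e.g. "Heading 1"


# Assumed mapping from paragraph-style names to HTML tags; unknown styles fall back to <p>.
STYLE_TAGS = {"Heading 1": "h1", "Heading 2": "h2"}


def render_box(box: Box) -> str:
    """Turn one box into an HTML fragment according to its type."""
    if box.kind == "text":
        tag = STYLE_TAGS.get(box.style, "p")
        return f"<{tag}>{escape(box.text)}</{tag}>"
    if box.kind == "image":
        return f'<img src="{escape(box.image_path)}">'
    if box.kind == "table":
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
            for row in box.rows
        )
        return f"<table>{body}</table>"
    raise ValueError(f"unknown box kind: {box.kind}")


def export_page(boxes: List[Box]) -> str:
    """Emit the marked boxes in their chosen order.

    Only marked boxes are passed in, so unmarked regions (such as a footer)
    never reach the output.
    """
    parts = [render_box(b) for b in sorted(boxes, key=lambda b: b.order)]
    return "<html><body>\n" + "\n".join(parts) + "\n</body></html>"


if __name__ == "__main__":
    # Hypothetical page: boxes listed out of order, sorted by their numbers on export.
    page = [
        Box("table", 3, rows=[["Item", "Price"], ["Apple", "1.00"]]),
        Box("text", 1, text="Quarterly report", style="Heading 1"),
        Box("image", 2, image_path="page1-figure.png"),
    ]
    print(export_page(page))
```

Keeping the order number separate from the position on the page is what lets boxes be added, removed or reordered freely; the export step only ever sorts by that number.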
Notice that the footer hasn't been marked, so it is left out of the output. You'd work through the PDF marking the areas to convert, then click "Save as ODT" (or whatever output format you want) and the file is created. Being able to process the whole document would be great, but even being able to do a page at a time would be good.

Going further, as well as marking the order of the boxes, you could also mark the paragraph style. For example, here I have marked the heading to be in the "Heading 1" paragraph style.

Thoughts? Would anyone else find this kind of feature useful?