
PDF import filter

racitup
Registered Member
Wed Sep 29, 2010 11:26 am
Hi all,

I'm trying to convert a PDF with embedded text (not a scanned image) into an editable form.
I've read that KWord used to be able to do this, but from what I've read and seen (I just installed KWord 2.1.2 on Ubuntu 10.04), this functionality was dropped in KOffice 2.

Can someone confirm this?
Does anyone know if this function is being brought back/worked on?

KWord appears to be the only program that could do this so removing support seems like a crazy idea from a USP point of view!

Thanks in advance! ;)
cyrille
Moderator

Re: PDF import filter

Wed Sep 29, 2010 11:35 am
Yes, it was removed from KWord. But there is now a PDF importer for Karbon.


Cyrille Berger
Krita developer and Calligra release coordinator
google01103
Manager

Re: PDF import filter

Wed Sep 29, 2010 1:43 pm
There's also an extension for OpenOffice: http://extensions.services.openoffice.o ... /pdfimport

Of course, it may or may not work well.


OpenSuse Leap 42.1 x64, Plasma 5.x

racitup
Registered Member

Re: PDF import filter

Wed Sep 29, 2010 2:06 pm
cyrille wrote: Yes, it was removed from KWord. But there is now a PDF importer for Karbon.


Wow, thanks for the fast response!

Unfortunately the new importer isn't much use to me. I have just tried it and tried exporting to every supported file format, but they are all drawing formats.

The most promising is the OpenDocument Drawing format, which is essentially XML. But each character (yes, character!) is a separate drawing object. So I would need to write a script to group adjacent characters into paragraphs, and then convert the object type to a document text box or something similar.

Does anyone know a good way of doing this?
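
For the grouping step, here is a minimal sketch rather than a finished tool. It assumes the per-character objects have already been read out of the drawing file as (x, y, character) tuples; parsing the ODG XML itself is not shown, and the names `group_into_lines` and `y_tolerance` are made up for illustration. Characters whose vertical positions fall within a tolerance are treated as one line and then ordered left to right.

```python
from typing import List, Tuple

# Assumed input shape: (x, y, character), with y growing down the page.
Glyph = Tuple[float, float, str]


def group_into_lines(glyphs: List[Glyph], y_tolerance: float = 0.1) -> List[str]:
    """Group per-character objects into lines of text.

    A glyph joins the current line when its y coordinate is within
    y_tolerance of the first glyph on that line; otherwise it starts
    a new line.
    """
    lines: List[List[Glyph]] = []
    # Visit glyphs top to bottom, then left to right.
    for glyph in sorted(glyphs, key=lambda g: (g[1], g[0])):
        if lines and abs(glyph[1] - lines[-1][0][1]) <= y_tolerance:
            lines[-1].append(glyph)
        else:
            lines.append([glyph])
    # Within each line, order by x and join the characters.
    return [
        "".join(ch for _, _, ch in sorted(line, key=lambda g: g[0]))
        for line in lines
    ]


if __name__ == "__main__":
    sample = [(1.0, 2.0, "i"), (0.5, 2.0, "H"), (0.5, 3.0, "o"), (1.0, 3.02, "k")]
    print(group_into_lines(sample))  # ['Hi', 'ok']
```

Merging lines into paragraphs would need a second pass (for example, on the vertical gap between consecutive lines), and word spacing would have to be inferred from horizontal gaps if the importer does not emit space characters.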
racitup
Registered Member

Re: PDF import filter

Wed Sep 29, 2010 2:56 pm
google01103 wrote: there's also an extension for OpenOffice http://extensions.services.openoffice.o ... /pdfimport

of course it may/may not work well


Wow², thanks for the further info!

FYI, I just tried the OO import extension and it worked much better than the Karbon one: it produces one text object per line instead of one per character.

The only promising option OO offers is to save as XHTML. This actually writes a text-based HTML file, which is editable!

The XHTML renders great in IE but not in Firefox.
I suspect I can just copy and paste into a Document to edit.
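
As an alternative to copying and pasting, the text could probably be pulled out of the saved XHTML with a short script. The sketch below uses only Python's standard `html.parser` module and assumes the exported text sits inside `<p>` elements, which has not been checked against the actual OO output; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser
from typing import List


class ParagraphText(HTMLParser):
    """Collect the text content of each <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[str] = []
        self._depth = 0  # how many <p> elements are currently open
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace and skip empty paragraphs.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.paragraphs.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def xhtml_to_paragraphs(markup: str) -> List[str]:
    """Return the non-empty paragraph texts found in the markup."""
    parser = ParagraphText()
    parser.feed(markup)
    parser.close()
    return parser.paragraphs
```

The resulting list of strings can be written to a plain text file or pasted into a word processor; formatting such as fonts and positioning is deliberately discarded.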

This is all a bit of a pain though. Couldn't someone just write an XSLT script to convert ODG to ODT?
Or, even better, just add a converter to OO?

Thanks all!
google01103
Manager

Re: PDF import filter

Wed Sep 29, 2010 4:04 pm
There are other possibilities out there:
http://en.wikipedia.org/wiki/Pdftotext
http://pdfedit.petricek.net/en/index.html

Some are web-based (and some of those require an email address):
http://www.convertpdftoword.net/ (I tried this one, worked for me)
http://www.freepdfconvert.com/
http://www.zamzar.com/
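
For the pdftotext route, a minimal sketch of driving the command-line tool from a script, assuming `pdftotext` is installed and on the PATH. The function names and file names are made up for illustration; `-layout` is the tool's option for keeping the original physical layout as far as it can.

```python
import subprocess
from typing import List


def build_pdftotext_command(pdf_path: str, txt_path: str,
                            keep_layout: bool = True) -> List[str]:
    """Build the argument list for the pdftotext command-line tool."""
    command = ["pdftotext"]
    if keep_layout:
        # Ask pdftotext to preserve the physical page layout where possible.
        command.append("-layout")
    command += [pdf_path, txt_path]
    return command


def pdf_to_text(pdf_path: str, txt_path: str, keep_layout: bool = True) -> None:
    """Run pdftotext; raises CalledProcessError if the tool reports failure."""
    subprocess.run(
        build_pdftotext_command(pdf_path, txt_path, keep_layout),
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical file names; replace with real paths.
    pdf_to_text("input.pdf", "output.txt")
```

This only works for PDFs that contain real text; a scanned page yields an empty or near-empty output file.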



panda84
Moderator

Re: PDF import filter

Wed Sep 29, 2010 4:23 pm
Okular can also export a PDF to text (File → Export As...).


Use the Accept this answer button to mark a thread as solved!
Blog - LUG - KDE - Work
john_hudson
Registered Member

Re: PDF import filter

Sun Oct 03, 2010 8:10 pm
Assuming the text is embedded in the PDF as text, Okular works fine for extracting it. You can also extract images by selecting them and saving them. The one problem you may run into is that high-quality typesetting often uses ligatures. These are legitimate Unicode characters, so they are saved with the text, but they can trip up spell-checkers and programs that don't recognise ligatures as valid characters.
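
If ligatures do cause trouble, they can be expanded back into plain letters after extraction. The sketch below is one way to do it, not something Okular provides: it maps the Latin ligatures from Unicode's Alphabetic Presentation Forms block (U+FB00 to U+FB06) to their component letters and leaves everything else untouched. The function name is made up for illustration.

```python
# Latin ligatures from the Unicode Alphabetic Presentation Forms block.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t ligature, written out with a plain "s"
    "\ufb06": "st",
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_LIGATURE_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace Latin ligature characters with their component letters."""
    return text.translate(_LIGATURE_TABLE)


if __name__ == "__main__":
    print(expand_ligatures("\ufb01nal o\ufb03ce \ufb02ow"))  # final office flow
```

A targeted table is used here instead of full Unicode compatibility normalisation (NFKC), because NFKC also rewrites unrelated characters such as superscripts and full-width forms.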

Text that is embedded in a graphic can be extracted by running the image through OCR software.


John Hudson, proud to be a member of KDE forums since 2008-Oct.

