This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Pandoc cannot parse odt files from Calligra

dcbuist
Registered Member
I have been using the command-line tool pandoc to convert odt files to other formats such as txt and docx. I have noticed that odt files created, modified or saved in Calligra cannot be converted with pandoc. Every time I try, I get error messages like the following:
[buist@DL-7370 ODT]$ pandoc -s -o TestDocument.docx TestDocument1.odt
Couldn't parse odt file.
[buist@DL-7370 ODT]$ pandoc -s -o TestDocument.txt TestDocument1.odt   
Couldn't parse odt file.

The target format seems to be irrelevant. So far I have tried converting to docx and txt, and the result is the same (as the example above shows).

However, this error does not occur with odt files from other applications, such as LibreOffice or pandoc itself. If I transfer files created and saved in Calligra to a different system that has LibreOffice installed, open and save them in LibreOffice, and then transfer them back, pandoc converts them perfectly.

Is there something different about the format of odt files from Calligra that would make them incompatible with pandoc, while odt files from other applications are OK?
vandenoever
Registered Member
That's an interesting problem to debug. ODT files are just zip files, so you can unzip the odt, modify it and zip it again to narrow down the problem.
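As a minimal sketch of that first step (assuming Python 3.9+; the function name `describe_odt` and the script name are hypothetical), the standard-library `zipfile` module can confirm that an .odt file really is a zip archive and list the files it contains, which typically include `mimetype`, `content.xml` and `styles.xml`:

```python
import sys
import zipfile


def describe_odt(path: str) -> list[str]:
    """Print and return the member names of an ODT package (a zip archive)."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a zip archive")
    with zipfile.ZipFile(path) as odt:
        names = odt.namelist()
        # ODF packages normally carry a plain-text "mimetype" entry.
        if "mimetype" in names:
            mimetype = odt.read("mimetype").decode("ascii").strip()
            print(f"mimetype: {mimetype}")
        for name in names:
            print(name)
        return names


if __name__ == "__main__":
    # Usage (hypothetical script name): python describe_odt.py TestDocument1.odt
    for arg in sys.argv[1:]:
        describe_odt(arg)
```

Once the archive is known to be well formed, `content.xml` is the usual place to look, since it holds the document body.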

ODF files created by Calligra usually pass the ODF validator. However, pandoc might make an invalid assumption about the content of ODF files.

It turns out that pandoc expects content.xml to contain an <office:font-face-decls> element. This element is optional and is not always present in ODT files saved by Calligra. Pandoc accepts the Calligra file once an empty <office:font-face-decls/> is added to content.xml.

So this is a bug in pandoc.
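Until pandoc handles the missing element, the manual workaround of adding an empty <office:font-face-decls/> to content.xml can be scripted. The following is a minimal sketch in Python, not a pandoc or Calligra tool: the names `add_font_face_decls` and `patch_odt` are hypothetical, and it assumes content.xml uses the conventional `office:` prefix for the ODF office namespace. It inserts the empty element before `<office:automatic-styles>` (or before `<office:body>` if there are no automatic styles), which is where the ODF schema places it, and writes a patched copy of the file:

```python
import sys
import zipfile

# Hypothetical helpers; assumes the office namespace uses the "office:" prefix.
EMPTY_DECLS = "<office:font-face-decls/>"


def add_font_face_decls(xml: str) -> str:
    """Return content.xml text with an empty <office:font-face-decls/> added if missing."""
    if "<office:font-face-decls" in xml:
        return xml  # element already present, leave the text unchanged
    # ODF orders the children of <office:document-content> as scripts,
    # font-face-decls, automatic-styles, body, so insert before the first
    # of automatic-styles or body that exists.
    for marker in ("<office:automatic-styles", "<office:body"):
        pos = xml.find(marker)
        if pos != -1:
            return xml[:pos] + EMPTY_DECLS + xml[pos:]
    raise ValueError("content.xml has neither <office:automatic-styles> nor <office:body>")


def patch_odt(src: str, dst: str) -> bool:
    """Write a copy of src to dst with content.xml patched; return True if it changed.

    dst must be a different path from src.
    """
    changed = False
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename == "content.xml":
                text = data.decode("utf-8")
                patched = add_font_face_decls(text)
                changed = patched != text
                data = patched.encode("utf-8")
            # Copying entries in their original order and reusing each ZipInfo
            # keeps names and compression methods, so a "mimetype" entry that
            # was first and uncompressed in the input stays that way.
            zout.writestr(info, data)
    return changed


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   python add_font_decls.py TestDocument1.odt TestDocument1-fixed.odt
    if len(sys.argv) == 3:
        if patch_odt(sys.argv[1], sys.argv[2]):
            print("added <office:font-face-decls/> to content.xml")
        else:
            print("content.xml already has <office:font-face-decls>; copied unchanged")
```

The patched copy can then be passed to pandoc as before, e.g. `pandoc -s -o TestDocument.docx TestDocument1-fixed.odt`. Editing the text directly rather than round-tripping it through an XML library avoids having namespace prefixes renamed on output.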
dcbuist
Registered Member
Thanks, vandenoever. I have replicated the issue exactly as you describe: unzipping the odt file, adding <office:font-face-decls/> to content.xml and zipping it up again makes the file parsable by pandoc. I will report this bug to pandoc.
