This forum has been archived. All content is frozen. Please use KDE Discuss instead.

OCR software KDE4.x

caryb

Mon Feb 23, 2009 4:06 am
Hi,
I have been compiling software for the last couple of hours to try to find a replacement for Kooka! Does anyone know of a decent OCR package that will work in KDE 4.2? I have tried xsane; it does work in KDE, but the accuracy is shocking, with only about 20% of words recognized correctly.

Thanks Cary

BTW I did see KolourPaint mentioned on Google, but I couldn't work out how to OCR with it!


Ubuntu user 7859 registered Linux user 470405
Lenovo T61 Kubuntu Jaunty 64bit Intel Core 2 Duo T7500 / 2 GHz, 4 GB DDR II SDRAM - 667 MHz,
NVIDIA Quadro NVS 140M PCI Express, Wireless Intel 3945ABG
Rettich

Mon Feb 23, 2009 11:30 am
You can scan images with Skanlite and run OCR from the command line (e.g. "ocrad input.xbm > output.txt").
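
A minimal sketch of that scan-then-OCR step wrapped in Python, assuming `ocrad` is installed and on the PATH. The helper names `build_ocrad_command` and `run_ocr` are made up for illustration. Like the shell example, it captures ocrad's standard output and saves it to a text file. As far as I know ocrad's documented input formats are PNM (pbm/pgm/ppm), so the usage note uses a `.pbm` file name; that file name is an assumption too.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_ocrad_command(image_path):
    """Return the argument list for running ocrad on one image (hypothetical helper)."""
    return ["ocrad", str(image_path)]


def run_ocr(image_path, output_path):
    """Run ocrad on image_path and write the recognized text to output_path."""
    if shutil.which("ocrad") is None:
        raise FileNotFoundError("ocrad not found on PATH")
    # Equivalent to: ocrad image_path > output_path
    result = subprocess.run(
        build_ocrad_command(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    Path(output_path).write_text(result.stdout)
    return result.stdout


if __name__ == "__main__":
    # Only runs OCR when called with two arguments: image and output file.
    if len(sys.argv) == 3:
        run_ocr(sys.argv[1], sys.argv[2])
```

Saved as, say, `ocr.py`, it could be called as `python ocr.py scan.pbm output.txt` after saving the scan from Skanlite.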

I don't think there is any KDE4 app that does all the work.


Murphy's Law is recursive. Washing your car to make it rain doesn't work.
troycarpenter

Tue Feb 24, 2009 3:35 pm
Here's something I did the other day that worked quite well if you have relatively small amounts of text. I scanned a document and saved it as PDF. If your document is scanned straight, then Adobe Acrobat (8.1.3 as of this writing) has little trouble doing copy-paste on large segments of text. It had trouble when the text was on a dark background or was crooked.

I tried this on various PDFs from different sources (generated via an editor, scanned, downloaded from the Internet, etc) and they all worked fine.

As a final test, I just tried the same with Okular, and it was also able to parse the text so I could paste the text into Kate.
furanku

Tue Mar 03, 2009 3:49 pm
Do you mean Adobe Acrobat Reader (which is the only product of the Acrobat suite that's available for Linux)? Then you did not do OCR (*optical* character recognition) but a simple copy and paste of text that was already machine-readable. I'm a little surprised that you report it also worked with scanned documents. Acrobat Reader cannot have converted pixel graphics into ASCII text. That's what specialized OCR software is for, and Acrobat Reader has no OCR features.

PDFs can either be just a picture of the page, or consist of the text in a machine-readable format plus instructions for rendering it. Only in the latter case can Acrobat Reader or Okular extract the text.
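
One rough way to tell the two kinds of PDF apart is to try extracting text and see whether anything comes back. The sketch below assumes the `pdftotext` tool from poppler-utils is installed; the function names `extract_pdf_text` and `has_text_layer` are made up for illustration. An empty result only suggests an image-only PDF; it does not prove it.

```python
import subprocess
import sys


def has_text_layer(extracted_text):
    """Return True if the extracted text contains anything besides whitespace.

    pdftotext separates pages with form feeds, so an image-only PDF
    typically yields only form feeds and newlines; strip() removes those.
    """
    return bool(extracted_text.strip())


def extract_pdf_text(pdf_path):
    """Return the text pdftotext extracts from pdf_path (assumes pdftotext is on PATH)."""
    # "-" as the output file tells pdftotext to write to standard output.
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    if len(sys.argv) == 2:
        text = extract_pdf_text(sys.argv[1])
        if has_text_layer(text):
            print("PDF contains machine-readable text")
        else:
            print("No text found; probably image-only, OCR needed")
```

If a scanned PDF reports machine-readable text, the scanning software most likely ran OCR itself and embedded the result alongside the page image.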

Unfortunately, no good and easy-to-use OCR software is available for Linux. I had the best results with Google's Tesseract engine, but it has no GUI so far, and it looks like Google has lost some interest in its development, since they haven't released a new version in a year.
troycarpenter

Thu Mar 05, 2009 4:47 am
I mean exactly what I said...I have been able to take scanned pages from books and other sources, and as long as they are not skewed, both Adobe and Okular could highlight a section of text and paste out ASCII text. One of the two programs (don't remember which) even asked if I wanted to copy the rectangle as text characters or as a graphic. If the text was skewed, then it didn't work.

The other day I was able to scan 4 pages of exercises for one of my kid's classes into a PDF document, then copy the text out with Adobe Acrobat Reader into OpenOffice. The only thing I had to do was reformat the lines, since the copy/paste function seems to work on a "per line" basis and kept the same line breaks as the original.

Maybe some type of OCR was done to the text by the scanning software, and that's what's getting taken out of the PDF document, but my account is accurate...I was able to take two PDF documents, both scans of text (one I scanned and one that was emailed to me by an organization...very clearly a scan from a workbook), and it indeed copied the text into the clipboard.
Alec

Thu Mar 05, 2009 6:03 am
What scanning software did you use?

