Tuesday, February 14, 2006

Optical Character Recognition

The best package that I have used is FineReader, from a company called ABBYY. It can recognise text in a large number of languages, and the latest version can scan to PDF files. I have used the package with some success on languages that do not use a Latin script. You can train the package to recognise new symbols, even symbols that are very close together. The work I carried out was with the old Irish script and the old German script called Fraktur. You can train the package on a sample page and then let it go off by itself to do the OCR.


Anonymous cionaodh said...

Regarding OCRing old-script Irish: what was your success rate? I tried the Mac version of FineReader a couple of years ago and wasn't able to do much better than about 50 per cent accuracy with seanchló typography. Perhaps newer versions are better?

12:09 PM  
Blogger spudshow said...

Hi Cionaodh,
My tests were done some time ago, but I think the success rate was at least 70%. I trained the software on one page of text and then let it try the following page. The book was not very old. The colour contrast between the ink and the paper becomes a significant factor when the book is old. I have a copy of "An Seanchaidhe Muimhneach" from 1932 and will try that when I get a chance.

I don't have much Irish. Goodbye.


2:46 PM  
