Optical Character Recognition
The best package I have used is from a company called ABBYY. Their package, FineReader, can recognise text in a large number of languages, and the latest version can save its output as PDF files. I have used it with some success on languages that do not use the Latin script. You can train the package to recognise new symbols, even symbols that sit very close together. My own work was with the old Irish script and the old German script called Fraktur. You can train the package on a sample page and then leave it to do the OCR by itself.
2 Comments:
Regarding OCRing old-script Irish: what was your success rate? I tried the Mac version of FineReader a couple of years ago and couldn't get much better than about 50 per cent accuracy with seanchló typography. Perhaps newer versions are better?
Hi Coinaodh,
My tests were done some time ago, but I think the success rate was at least 70%. I trained the software on one page of text and then let it try the following page. The book was not very old; with older books, the colour contrast between the ink and the paper becomes a significant factor. I have a copy of "An Seanchaidhe Muimhneach" from 1932 and will try that when I get a chance.
I don't have much Irish. Goodbye.
Brendan.Bolger@gmail.com