
OCR Software for Original Language User Tools


Michael Miles


Years ago, ABBYY had the leading Hebrew OCR software, and at that time it was actually not that good. Does anyone know what the current consumer-grade OCR software might be for original language text capture? I have material that is long out of copyright, and I'd like to see about using OCR software to capture it, so that the output would be searchable text rather than graphics.

 

Thank you,

Michael


Hi Michael,

 

Check out the entire discussion at http://www.accordancebible.com/forums/topic/15316-pdf-to-word-converter-099/?hl=abbyy&do=findComment&comment=74012 .

 

Regards,

 

Michel

Thank you Michel,

 

Your and Daniel Selmer's extensive trek through the available OCR "solutions" has led me to download gimagereader 3.01. FREE is a great price, and if I can train this at all, maybe it would be helpful.

 

Regards,

Michael


Hi Michael,

 

I had some dreams of taking Tesseract and building up a package for Bible nuts, with 10-12 of our favorite dead languages all neatly packaged. I think the market exists for such a thing, but it's very small and on tiny budgets, so perhaps that's a user base rather than a market :) I have been working on a Greek-English text which I OCR'd with gimagereader, to examine what kinds of errors I see from a scan. I have not got all that far, but am making progress. I can definitely get a user tool out of it; I just have to work out all the cleanup issues. Obviously a better initial scan would help a lot. I read the stuff on training and planned to teach it about Hebrew, but have not started that piece of fun yet. If you get it done first, then I won't have to :) Anyhow, good luck and let us know how you get on.
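For anyone wanting to look at scan errors the same way, here is a rough sketch in Python of one way to tally them. It assumes you have the raw OCR output and a hand-corrected copy of the same passage as plain strings; the function name `tally_ocr_errors` and the sample strings are made up for illustration, and nothing here comes from gimagereader or Tesseract themselves.

```python
from collections import Counter
from difflib import SequenceMatcher


def tally_ocr_errors(ocr_text: str, corrected_text: str) -> Counter:
    """Count each OCR mistake, keyed by (what the OCR produced, what it should be)."""
    errors = Counter()
    # autojunk=False keeps difflib from skipping very common characters
    # when it aligns long texts.
    matcher = SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on one side means an insertion or a deletion.
            errors[(ocr_text[i1:i2], corrected_text[j1:j2])] += 1
    return errors


if __name__ == "__main__":
    # Hypothetical sample: two typical confusions, l/I and 0/o.
    ocr = "ln the beginning was the W0rd"
    fixed = "In the beginning was the Word"
    for (wrong, right), count in tally_ocr_errors(ocr, fixed).most_common():
        print(f"{wrong!r} should be {right!r}: {count}")
```

The most frequent pairs in the tally are the ones worth fixing first, whether by search-and-replace in cleanup or by a better scan.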

 

Thx

D


I have not even read the docs yet, so there's that. :) I'm staring at scanning a pile of material to potentially place online, and this material would be far more usable to everyone if it were not just static bitmap images. If training this software proves to be a pain in the backside, I may just post high-quality scans (much better than most Internet Archive material) and see about OCR later on down the road. At least your digging effort, which I really appreciate you sharing here, saved me a bag of gold that I would have readily dropped on an OCR product that was not up to snuff. My thanks for being point man on this mission.

 

If OCRing proves to be not workable, what I may end up doing is to post the high quality scans and then set up a community effort to key them in.  100 people each doing 10 pages beats 1 person doing 1,000 pages.


If OCRing proves to be not workable, what I may end up doing is to post the high quality scans and then set up a community effort to key them in

 

Hi Michael

 

Of course, the better the quality of the scan, the higher the accuracy of the character recognition. It may be just a matter of community proofreading.

 

Regards,

 

Michel

