
OCR of Greek text --> convert to Helena?


preterist1


I have a *book published over 100 years ago which has a lot of Greek material in it. I need to be able to have that Greek material in the computer so I can copy and paste it into the body of a scholarly article I'm writing.

 

I'm using the ReadIris Pro 11 (Corporate Edition) program to do the OCR. I am using it on a Powerbook G4 running OS 10.4 (Tiger). However, it does not recognize the Greek text and put it in the Helena font for me.

 

Is there something I need to do to set up the ReadIris program so that it will recognize Greek and put it into a Greek font for me? Does it need some other font besides Helena to do that (such as Teknia or LaserGreek or Symbol)? Or can it be done directly into Helena? Do I need to "TRAIN" the OCR software for it to recognize the Greek and put it into the right font and format?

 

Since you folk have done a LOT of OCR work with the Greek and Hebrew, I suspect you could give me a few tips on how to do this. Would be much appreciated.

 

Thanks in advance for your help on this.

 

-- Ed Stevens (preterist1@aol.com)

 

*The name of the book: The New Testament in the Apostolic Fathers (produced by the Oxford Society of Historical Theology, 1905, Printed in England by Henry Frowde at Oxford at the Clarendon Press).


Did you select Menu->Settings->Language... and choose "Greek" from the list of languages before running recognition?

 

Yes, I selected "Greek-English" as the language, and it seemed to let me make corrections to some of the characters it couldn't recognize, but it didn't do a very good job recognizing the rest of the text. It didn't ask for learning help on all of them. It also did not seem to offer any help on the accents and breathing marks. I'm wondering if any OCR program is able to handle the accents and breathing marks? Do we simply ignore those for OCR purposes and use some kind of software conversion utility later to look at the unaccented text and apply the proper accents? Or do we have to do that manually?

 

How do the folks here at Oak Tree handle the OCR of Greek texts? How do they get the accents and breathing marks in there?
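Ed's question about simply ignoring the accents and breathing marks for OCR purposes can be made concrete. Below is a minimal Python sketch, assuming the OCR output has been exported as Unicode Greek text (not in a font-specific encoding such as Helena, whose key mapping is not covered in this thread). Unicode NFD normalization splits each accented letter into a base letter plus separate combining marks (accent, breathing, iota subscript), which can then be dropped. Restoring the correct accents afterwards is a much harder problem, since the accent depends on which word is meant, and this sketch does not attempt it. The function name is illustrative only.

```python
import unicodedata


def strip_greek_diacritics(text: str) -> str:
    """Hypothetical helper: remove accents, breathings, iota subscripts, etc."""
    # NFD splits precomposed letters (e.g. U+1F04) into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose so the result uses the usual precomposed code points.
    return unicodedata.normalize("NFC", bare)


if __name__ == "__main__":
    print(strip_greek_diacritics("ἐν ἀρχῇ ἦν ὁ λόγος"))  # prints: εν αρχη ην ο λογος
```

Stripping marks this way is mainly useful for comparing an OCR result against a reference text letter by letter, without accent errors getting in the way.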


We do not OCR texts at all! On the whole we receive our etexts and just have to convert, correct, and mark them up. That's why I did not reply before: I do not know whether it is possible to get accurate OCR in Greek.

 

The texts that do need etexting are outsourced. The company we use is not cheap, but their work is excellent. If you are interested, write to me personally and I will put you in touch. They can convert to Helena, since that is what they use for us.


Guest frgpeter

I tried both the Greek and the Greek-English. No accents, but does have the breathing marks.

 

I suppose it depends a lot on your scanner and its settings as well. The condition of the text (you say it's 100 years old) can also play into this. Perhaps the type on the page is "heavy"?

 

Sorry, I'm not able to suggest anything else.

 

--G. Peter
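To check which marks a given OCR run actually kept (for example, breathing marks but no accents, as reported above), the same NFD decomposition can be used to count each kind of combining mark. This is a sketch assuming the OCR output is Unicode text; the function name and the descriptive labels in the table are chosen for this example only.

```python
import unicodedata
from collections import Counter

# Combining marks that appear when polytonic Greek is decomposed with NFD.
# The labels are descriptive names chosen for this example.
GREEK_MARKS = {
    "\u0301": "acute",
    "\u0300": "grave",
    "\u0342": "circumflex",
    "\u0313": "smooth breathing",
    "\u0314": "rough breathing",
    "\u0345": "iota subscript",
    "\u0308": "diaeresis",
}


def tally_diacritics(text: str) -> Counter:
    """Hypothetical helper: count each kind of Greek diacritic in the text."""
    counts = Counter()
    for ch in unicodedata.normalize("NFD", text):
        label = GREEK_MARKS.get(ch)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    print(tally_diacritics("ἐν ἀρχῇ ἦν ὁ λόγος"))
```

If the counts show breathings but zero acute, grave, and circumflex marks, the accents were dropped during recognition rather than lost later in copying and pasting.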


On an obliquely related note, I recently discovered a wonderful website with links to PDFs of scanned manuscripts and printed Bibles, including Codices Alexandrinus, Sinaiticus and Vaticanus, Tischendorf's NT, Scrivener's NT, Stephanus (1546 & 1550), Scrivener (1881), Erasmus (1516, 1518, & 1522), Elzevir (1624, 1633), Beza (1565, 1588, & 1598), the Complutensian Polyglot, etc., etc.

 

I downloaded The Lot, obviously (6.62 GB!), but I did not record the URL of where I found them (oops!).

 

I don't know, therefore, if this comment is helpful or not.

 

But if you trawl the web, you can find some amazing stuff.

 

~Alistair


That was very helpful, Helen. Thanks! I'll pass on the outsourcing idea. That sounds too expensive for me. I don't need the whole book, only about two chapters. So, I'll just grit my teeth and struggle through text entry on my keyboard! Ugggh!


Archived

This topic is now archived and is closed to further replies.
