
PDF to Word Converter ($0.99)?


Abram K-J


Does anyone know about this app?

 

https://itunes.apple.com/us/app/pdf-to-word/id503013264?mt=12

 

It's $0.99 (on sale) and has mixed reviews, so I'm skeptical. But converting PDFs to Word and then doing HTML imports into Accordance would be really helpful!

 

Has anyone tried this app?


There are several free ones online; just Google "PDF to Word Converter." If you are going to use PDFs a lot, then I highly recommend PDF Nomad. I scan a lot of documents using my Fujitsu ScanSnap, and PDF Nomad will do OCR and split pages, along with a lot of other neat stuff. It exports in several formats, including Rich Text, which opens easily in Word.

 

Many PDFs available online are simply pictures of a page and have not been run through OCR yet. I also own Adobe's Acrobat Professional and like its OCR engine the best, but it is very PRICEY.

 

Just for the record, I have not used the app you mentioned on the App Store. 


Hi Tony,

 

How do PDF Nomad and Adobe's OCR features do with multilingual texts? I've played a little with texts containing Greek and English, and I've had no great success getting a really good OCR result.

 

Thx

D


For multi-language text, ABBYY FineReader is the best there is.

 

http://finereader.abbyy.com/pro_for_mac/

 

Unfortunately, their Mac version lags behind their Windows option. Most feature differences are minor, but the biggest drawback is that the Mac version does not support training or user dictionaries.

Edited by mwdiers

I've played with it and, if I recall, it did better than some. I'm not familiar with its dictionaries feature, so I'll take another look.

I can use it on Windows, so it's handy to know the discrepancy exists.

 

Thx

D


Hi Daniel,

 

OmniPage18 for Windows is often/(usually?) ranked ahead of ABBYY FineReader.

 

Regards,

 

Michel


Hey Michel,

 

  That one I don't think I've heard of. I'll look it up also.

 

Many thanx

D
 


Hello

 

Adobe uses the Readiris OCR engine. Readiris Pro costs about $120 but converts only 50 pages at a time; the bigger edition costs about $500. I have Readiris Pro 12. It is very fast, but it has no Hebrew. Version 14 has Hebrew, but only on PC.

OmniPage 18 also costs about $500, and you need a special scanner as well, because it uses a different system.

ABBYY, the one I chose, handles more languages, but as mwdiers said, I'm not really happy with it either. This morning I sent them an e-mail, but their support only wants to sell.

It's called Pro, but to me it is "simple", with too few features. For the price, 1 star, even if it has the better engine.

Enolsoft (Mac App Store) cannot do Hebrew. I have it too. I wrote to them, and their support is very nice, but... no Hebrew. For the price, 4 stars.

I have been looking for a really good OCR program for a long time, and have tried and bought a lot of them, but at the moment none of them impresses me.

 

Greetings

 

Fabian


Daniel, I have scanned multilingual text but have not really taken the time to verify how well it was handled. You may wish to contact the folks at PDF Nomad and ask them about its capabilities. https://sintraworks.com/index.php/sintraworks/pdfnomad_home

I got PDF Nomad when it was on sale for $19.99. What I love about it is that when you scan a book with facing pages, it can split them really quickly. I have been very pleasantly surprised by its features.


 


 

 

Hi Fabian, 

 

OmniPage18 is $149.99 on their site, http://www.nuance.com/for-business/by-product/omnipage/standard/index.htm, and goes on sale sometimes.

 

You can set up almost any scanner with it: see http://omnipage.helpmax.net/en/getting-started/setting-up-your-scanner/

 

I recommended it to Daniel because he has Windows, and the top two multilingual OCR programs on Windows are OmniPage and ABBYY.

 

From what I've seen, OmniPage18 is getting the better reviews.

 

Regards,

 

Michel


The Mac version of ABBYY FineReader doesn't support Hebrew (or other RTL languages) either, but the Windows version does.


I used to use TextBridge 11 and found it more accurate than OmniPage, but I haven't upgraded either, as I have virtually stopped using it now that scanners can do English text easily. I was looking at IRIS for Hebrew, though: http://www.simpleocr.com/Arabic_OCR/Hebrew_OCR.asp?gclid=CL2XupjRy8MCFQ6WtAodTHgAAw

Edited by ukfraser


 

Hello Michel

I searched the whole Net, and for the newest versions ABBYY was a little ahead.

No OCR is 100% perfect, so editing afterwards has to be nice and easy. Unfortunately, ABBYY has no editor in the Mac version. That's absolutely annoying. There is a function that shows the letters FineReader is unsure about, but you can't then edit them, so it's good for nothing. I even converted a file with a library: FineReader recognized it, but the links go to nirvana. And you can't edit.

On archive.org you can download the big Gesenius, not only the grammar that Accordance has. So far, every OCR program I have tested has broken its teeth on it.

OK, that is really hard stuff, not just 20-point Arial at 1200 dpi.

But thanks, I had not seen the bargain from Nuance. Maybe I'll buy it to test. But I have spent a lot of money on OCR, and so the "cheap" books become very expensive :-) with all the work.

 

Greetings

 

Fabian



Hi Abram

Back to the topic.

Yes, I have it. I bought it as part of a package when it was about $13 instead of $75. It's nice and easy, and the OCR is much better than many others. It can handle footnotes, endnotes, etc. At this price you can't go wrong.

It handles multiple languages, but no Hebrew at the moment, and unlike others it has no page limit per conversion.

I have also tested it for http://www.accordancebible.com/forums/topic/13317-apocal-t-the-christian-apocryphal-apocalypses/?hl=enolsoft#entry62902

 

Greetings

 

Fabian

Edited by Fabian

As far as I know, OmniPage for Mac is $499.99, because you then have to buy the Ultimate edition.

 

Recognizes over 120 languages

Process, edit and store documents from anywhere in the world. OmniPage includes the recognition of languages based on the Latin-, Greek- and Cyrillic alphabets as well as Chinese, Japanese and Korean languages.

 

Seems no Hebrew.


 


 

Hi again,

 

That's true, but I was responding to Daniel ocr-ing "texts containing Greek and English."  

 

Also, you said "I have been looking for a really good OCR program for a long time, and have tried and bought a lot of them, but at the moment none of them impresses me."

 

I had a similar experience trying to ocr Hebrew for my thesis, including printed editions and manuscripts (wishful thinking!). I just gave up and typed what I needed, learning lots about textual transmission and the types of errors that can creep in.

 

For Hebrew, there is ABBYY, and ReadIRIS that ukfraser mentioned above. If you find that they work well now, please tell me.

 

 

Hi Abram,

 

I hope you don't mind the digression; Daniel provided the opening when he mentioned he could use Windows.  

 

Regards,

 

Michel


Hi Michel

 

Readiris: I have the version that came with my last scanner, Readiris Pro 12. It is unable to read Hebrew, but version 14 can.

ABBYY: I have bought the latest version. At the moment it lacks too much for editing. They think editing is not necessary for the Mac version. Ha ha, bad joke. Others are better here, but they can't read Hebrew.

As I said before, yesterday morning I wrote to ABBYY support once again. Hopefully they will add some urgently needed tools.

 

Greetings

 

Fabian


Er ... sorry Abram. Kinda derailed yer thread. Hope it might have been helpful though.

 

Thanx everyone - it'll take me some playing around to work out what seems to work.

The model I would like to see is a core engine with pluggable language modules that you buy individually.

My suspicion is that a model like that with a biblical languages module would sell rather well around here - it would only need to cover about a dozen scripts. Could be cool.

 

Anyhow I'll stop dreaming and do some tests.

 

Thanx again

D



 

Hi Daniel,

 

I haven't found anything like that so far. The closest is "ABBYY FineReader can be trained to recognize all Unicode symbols," i.e., it can be trained to recognize a new language, at http://knowledgebase.abbyy.com/article/534

 

It says you can "Add all the necessary symbols to the alphabet of the new language."

 

 

 

Hi Fabian,

About the big Gesenius from archive.org that defeated every OCR program you tested: maybe you could train FineReader to recognize different Hebrew fonts. It would be a lot easier to add an old-style Hebrew font as a new language.

 

 

Hi Accordance,

 

Perhaps you could post this derailment as a new topic, and preserve Abram's original question and topic.

 

Regards,

 

Michel


Wow--this is great. Thanks, everyone! No need to apologize for taking the topic in the direction of other PDF converters. You've given me some good posts to wade my way through. :)

 

Someone may have already mentioned this, but I'd love something that could handle Hebrew and Greek, too.


I just downloaded a trial copy of ABBYY FineReader 12 Pro for Windows. I printed a few verses of Genesis from the LXX to a PDF file and then tried reading that. It can mostly get it but the diacritics are almost all lost. I have tried the Language Editor to sort that out but have hit a snag. I've sent a note to their support people. Hopefully I can get this simple test going soon. Then I can try stuff from scans. Will report back.

 

Once I've got a grip on some of this I'll probably give OmniPage a go.

 

Thx

D
 


I've done OCR with Adobe Acrobat in the past, and the problem with doing OCR for Biblical Greek is that the rules for diacritical signs in Modern Greek were changed in 1982 so that only the acute and diaeresis are used, and these are therefore the only diacritical signs the OCR engine recognizes. From what I've seen, spiritus asper and lenis will more often than not be read as an acute, but sometimes the entire letter will not be rendered at all. This is often the case for letters with iota subscriptum and circumflex.
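
To make the diacritics point concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It does not model any OCR engine; it only decomposes Greek text and shows which combining marks fall outside the monotonic set (acute and diaeresis), and what a word looks like once those marks are gone. The function names `polytonic_marks` and `strip_polytonic` are made up for this illustration.

```python
import unicodedata

# Combining marks the monotonic (post-1982) orthography still uses:
# U+0301 acute/tonos and U+0308 diaeresis.
MONOTONIC_MARKS = {"\u0301", "\u0308"}


def polytonic_marks(text):
    """List the names of combining marks in `text` that monotonic Greek lacks."""
    decomposed = unicodedata.normalize("NFD", text)
    return [
        unicodedata.name(ch)
        for ch in decomposed
        if unicodedata.combining(ch) and ch not in MONOTONIC_MARKS
    ]


def strip_polytonic(text):
    """Drop every non-monotonic mark and recompose what is left."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        ch
        for ch in decomposed
        if not (unicodedata.combining(ch) and ch not in MONOTONIC_MARKS)
    ]
    return unicodedata.normalize("NFC", "".join(kept))


sample = "Ἐν ἀρχῇ ἐποίησεν"
print(polytonic_marks(sample))   # breathings, circumflex, iota subscript
print(strip_polytonic(sample))   # only the acute accents survive
```

Comparing OCR output against `strip_polytonic` of a known passage is one rough way to see whether an engine is effectively working with a Modern Greek character set.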

 

A similar issue also exists with doing OCR on Biblical Hebrew, as it only works when nothing but the consonants are present, which makes sense as it is the standard in Modern Hebrew. Texts with vowels and/or cantillation will always come out completely distorted.
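
The Hebrew case can be sketched the same way. The snippet below is only an illustration, assuming the text is already in Unicode: it removes vowel points and cantillation marks (the nonspacing marks in the range U+0591 to U+05C7) so that a pointed reference text can be compared with consonant-only OCR output. The helper names `consonantal` and `has_points` are hypothetical.

```python
import unicodedata


def is_hebrew_mark(ch):
    """True for Hebrew vowel points and cantillation marks (nonspacing marks)."""
    return "\u0591" <= ch <= "\u05c7" and unicodedata.category(ch) == "Mn"


def consonantal(text):
    """Return `text` with Hebrew points and accents removed, consonants kept."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not is_hebrew_mark(ch))


def has_points(text):
    """Report whether `text` carries any Hebrew points or accents."""
    return any(is_hebrew_mark(ch) for ch in unicodedata.normalize("NFD", text))


word = "בְּרֵאשִׁית"
print(has_points(word))      # pointed input
print(consonantal(word))     # consonants only
```

Punctuation such as maqaf and sof pasuq sits in the same code range but is not a nonspacing mark, so the category check leaves it in place.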

 

I've also tried ABBYY, but I find that the "learning process" it has isn't very effective for the ancient languages with diacritics. I also tried to "teach" it the Syriac script Estrangela once. It did not go well, to say the least. At least it was an interesting experiment.

 

With kind regards

 

Peter Christensen

Edited by Pchris

Thanx Peter. I had a rather similar experience last night with ABBYY. I got beyond my little hiccup and got the training mode going, but there are several issues. The biggies appear to be:

 

  1. It tends to misidentify the diacritical marks as a separate line of superscripted marks rather than as part of the characters below. I was able to do a very targeted selection of part of a line, and then it did not do this. I don't know exactly how I got it to do that, and in any case that's hardly viable for any reasonable text. With that selection I was able to use training mode.

 

  2. There is a fairly severe restriction on the learning mode, documented in the doc:

     "A pattern can only be used for documents that have the same font, font size, and resolution as the document used to create the pattern."

     I'm pretty sure this means you could be OK if you scan all your own documents in a consistent way. But with random scans from the web you may or may not be OK, and may end up retraining.

 

  3. The training process itself is not particularly quick.

 

  4. Multilingual OCR of this line, which has a mix of English and Greek:

 

Gen. 1:1     Ἐν ἀρχῇ ἐποίησεν ὁ θεὸς τὸν οὐρανὸν καὶ τὴν γῆν.

 

  results in confusion of some letters for English where they should be interpreted as Greek. τ being misinterpreted as t, ο as o and so on. I was able once to get ῇ trained in but then ὴ was read as ῇ.

 

 This might seem minor but quite frankly it will cause a lot of rework in the document.
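
The letter confusion in item 4 is at least easy to flag after the fact. Below is a small sketch, again using only Python's `unicodedata`, that lists words mixing Latin and Greek letters, which is what a Greek word looks like when τ comes out as t or ο as o. The function names are invented for this example, and it will not catch a word in which every letter was misread as Latin.

```python
import unicodedata


def script_of(ch):
    """Classify a character as 'Greek', 'Latin', or None from its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("GREEK"):
        return "Greek"
    if name.startswith("LATIN"):
        return "Latin"
    return None  # digits, punctuation, combining marks, etc.


def mixed_script_words(text):
    """Return the whitespace-separated words that contain both scripts."""
    suspects = []
    for word in text.split():
        scripts = {script_of(ch) for ch in unicodedata.normalize("NFC", word)}
        if {"Greek", "Latin"} <= scripts:
            suspects.append(word)
    return suspects


# Simulated OCR output: Latin "t" and "o" where Greek tau and omicron belong.
ocr_line = "Gen. 1:1 Ἐν ἀρχῇ tὸν oὐρανὸν"
print(mixed_script_words(ocr_line))
```

A list like this could drive a review pass over the suspect words only, instead of proofreading the whole document.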

 

  So I'm not yet massively impressed. I need to find a solution for 1 before the training will be at all effective. Then I need to explain to it how to deal with things like 4 above. And what I was testing with was not a document scan but a direct print from Accordance to PDF, so it has no scanning artifacts or the like to confuse the OCR process. I think I'll retry the experiment with a larger font size and see if that helps with any of the above.

 

  Then it's time to try another tool. By the way, has anyone done any work with Tesseract?

 

Thx

D
 

Edited by Daniel Semler

Hi Peter and Daniel,

 

Along with the blessings of working in ancient languages, there is also the curse - we spend an inordinate amount of time on word processing tasks that are simple and straightforward in English, etc. We are always hoping that some of these tasks will be addressed and simplified, and when one of them is, it ranks among the most important events of our lives. The day that Microsoft fixed the right to left issue with proper word wrapping and Unicode (in Office 2003; XP Office almost had it) ranks almost as high as my wedding day and the births of my children.

 

Based on my experiences, I thought it would be difficult to ocr anything other than a clean consonantal/letter text. If ocr even worked for transliterated texts it would be a step forward. Perhaps ABBYY could be trained to recognize transliteration. But I would only try again based on your findings Daniel. So I look forward to your reports.

 

I'm thankful for your efforts.

 

 

Regards,

 

Michel
