News, How-tos, and assorted Views on Accordance Bible Software.

Friday, February 16, 2007  

Importing HTML Documents into User Tools

For the past week or so, I've been talking about User Tools. So far, we've explored how to create a user tool from scratch and how to merge multiple user tools together. Today, we'll look at one of the coolest features of user tools: the ability to import html documents into a user tool.

Depending on your point of view, this feature of Accordance can be one of the most exciting or one of the most frustrating. The reason for excitement is obvious: with this feature, you can potentially download all kinds of resources from the web and convert them into fully-searchable Accordance modules. The frustration comes when you actually import an html document and find that there are still numerous issues you need to fix. I'll discuss some of the specific issues involved with importing html documents in future posts, but for now, I just want to show you how it's done.

Rather than just talking about importing an html document in the abstract, let's actually import the same html file together. We'll use as our sample a book called A Companion to the Bible by E.P. Barrows, which is available from Project Gutenberg. Please note that I found this text simply by going to Project Gutenberg and doing a search for books with "Bible" in the title. I know nothing about the particulars of this book or its author, so please don't send me angry letters complaining about the theological perspective of something I told you to import into Accordance. :-)

To download this html file, just go to this page. At the bottom of the page you'll see a list of different file formats and links where they can be downloaded. If you control-click on the "main site" link beside either the compressed or uncompressed HTML files, you'll get a contextual menu with a choice to "Download Linked File." This will download a file named "17265-h.htm" to your hard drive.

Okay, now that you have the cryptically-named html file on your hard drive, open Accordance and go to File-->User Files-->Import User Tool...

A dialog box will appear giving you the option to import from plain text, HTML, or (for those who have purchased the option) TLG. Make sure HTML is selected. You also have the option to create a new User tool, or to add the file you're importing to the end of an existing User Tool. Select Create a new User Tool and click OK.

An alert dialog box will then appear reminding you that the HTML import feature should not be used to violate copyright. So if your conscience is clear, click OK to dismiss the alert and continue.

The next dialog box to appear is a standard Open dialog box asking you to locate the HTML file you want to import. For most people, the file you downloaded from the web will be located on your desktop. Use command-D to navigate to the desktop quickly (this tip works with any Mac program) and select the file named "17265-h.htm."

The next dialog box to appear is a standard Save dialog box asking you to name your User Tool and decide where you want to save it. I named mine "Companion to the Bible" and saved it in the User Tools folder inside the Accordance Files folder inside my Documents folder.

That's all there is to it! A progress dialog will appear showing the progress of the import, which can take a few minutes depending on the size of the file. Once it's done, a new User Tool window will open displaying the User Tool.

If you open the browser, you'll see that most of the titles have been formatted as such and even assigned a basic browser hierarchy. Look through the tool and you'll see that most of the obvious Scripture references have automatically been hypertexted. You can now search your tool, browse and read it, or edit it to your liking.
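If you're curious which titles an HTML file contains before you import it, a small script can list its heading tags as an indented outline. The following is only a rough sketch in Python, not anything built into Accordance: the names HeadingOutline, outline_of, and format_outline are made up for this example, and it assumes that headings are marked with h1 through h6 tags. How Accordance itself maps HTML tags to titles and browser levels is not covered here, so treat the output as an inspection aid rather than a preview of the import.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for every <h1>..<h6> element.

    Hypothetical helper for illustration; not part of Accordance.
    """

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.outline = []    # list of (level, text) in document order
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            # Collapse runs of whitespace and line breaks into single spaces.
            text = " ".join("".join(self._parts).split())
            if text:
                self.outline.append((self._level, text))
            self._level = None


def outline_of(html_text):
    """Return the headings found in html_text as (level, text) pairs."""
    parser = HeadingOutline()
    parser.feed(html_text)
    parser.close()
    return parser.outline


def format_outline(outline):
    """Indent each heading two spaces per level below h1."""
    return "\n".join("  " * (level - 1) + text for level, text in outline)


if __name__ == "__main__":
    import sys

    # Assumption: the file is readable as UTF-8; undecodable bytes are
    # replaced rather than raising an error.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(format_outline(outline_of(f.read())))
```

Saved under a name of your choosing and run from Terminal with the path to "17265-h.htm" as its only argument, the script prints one line per heading, indented by level.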

Next week, we'll look at the specifics of what the HTML import did and how it interpreted the HTML tags—something we admittedly have not documented as well as we should. But in the meantime, try importing other HTML files and see what happens. You can also save word processing documents as HTML files and then import them into Accordance. It's a great way to add new materials to your Accordance library.


P.S.: If you're using Accordance 7, please make sure you've downloaded the free update to Accordance 7.1. There was a bug in 7.0 which messed up some HTML imports.

How do I link to a scripture passage in a user tool? I'd love to be able to make some magical change to an HTML file I want to import as a user tool and then after it is imported scripture passages are linked. Any advice? Thanks.

- Stan


In the future, we may support some kind of Scripture tag that you could mark up an html file with before importing, but we're not there yet.

There are other strategies you can use in the meantime, but I'll have to cover those in a future post.

I'd appreciate a future article about those other strategies.

Another couple of questions and perhaps a suggestion if I may... Does this import support UTF-8 encoding? I'm particularly curious about all of the messed up entities in Microsoft Word Documents. Often an HTML doc that has headers sent with it for UTF-8 will display correctly on a Mac, but without those headers properly sent they choke. I haven't been able to import a doc with the aforementioned characters successfully yet.

Any hope of importing ThML files in the future? The above problem would still be an issue, but this could be a very useful tool as it would open up a variety of resources on CCEL. OSIS might be nice as well...

I'm working on a converter right now to convert some of the ThML docs on CCEL to a more readable html format for importing, but it's proving to be a pain.

- Stan

Thanks, David. This is one feature I'd really like to master that I feel pretty weak in, so I'm enjoying these posts.


First, thank you incredibly much for your blog posts. Accordance has become vastly more useful to me because of them.

Second, since you're on the subject, can you address the matter of importing stuff from CCEL? Which format should be imported? There are usually several choices that they offer for download. None are explicitly marked as HTML. Please help! Thanks!

