News, How-tos, and assorted Views on Accordance Bible Software.

Monday, April 14, 2008  

Lies, Damned Lies, and Statistics

"There are three kinds of lies: lies, damned lies, and statistics."

This famous saying, which I'd always heard attributed to Mark Twain, was actually credited by Twain himself to Benjamin Disraeli, though no one has found it in Disraeli's own writings or speeches. Whoever first said it, its meaning is clear: numbers can be very misleading.

Last week, I searched the MT/LXX for all occurrences of the Hebrew word bara using the Merge command, and got 54 hits. Then I searched for all the places where bara is translated with poieo in the Greek Septuagint. The number of hits returned was 30. Next I searched for all the places bara is not translated with poieo. This returned 39 hits.

I then asked you to explain the apparent anomaly in the hit counts for these searches. If there are 54 occurrences of bara and 30 of them are translated with poieo, we would expect the search for occurrences not translated with poieo to return just 24 hits (54 - 30 = 24), rather than 39.

I only got one response explaining the numbers, but that response was so thorough and clearly written that I suspect no one else felt the need to respond. Here is the explanation given in the comments on the last post:

In the first search where you merge the BHS and the LXX using the AND operator, each occurrence of bara and each occurrence of poiew are counted as hits separately--i.e. each pairing of bara/poiew counts as 2 hits (one for each word). So, the 30 hits reduces to 15 actual results.

In the other search--BHS NOT LXX--this phenomenon doesn't occur since negated search terms don't produce any counted hits. Thus, only the occurrences of bara in the result set are counted as hits, and the 39 hits equals 39 results.

Therefore, the problem with the calculation in the previous post is that you're including the number of hits on poiew from the first search. When you exclude those, the numbers work out as expected: 54 - 15 = 39

That's exactly right, and I couldn't have explained it more clearly. I'm glad the commenter wasn't fooled by my intentional misreading of the numbers, and I hope the rest of you found this little exercise illustrative. When looking at statistical information in Accordance, it's important to make sure you understand how things are being counted.
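To make the commenter's arithmetic concrete, here is a toy model in Python. It is not how Accordance works internally, just a sketch under stated assumptions: each of the 54 occurrences of bara is reduced to a flag saying whether the aligned Septuagint word is poieo, and the function names are hypothetical. Only the totals (54 occurrences, 15 of them rendered with poieo) come from the searches above.

```python
# Toy model of the MT/LXX hit arithmetic (hypothetical, not Accordance code).
# Each entry stands for one occurrence of bara; True means the aligned
# LXX word is poieo. Which occurrences are which is invented; only the
# totals (54 and 15) come from the searches described in the post.

BARA_TOTAL = 54
WITH_POIEO = 15

occurrences = [True] * WITH_POIEO + [False] * (BARA_TOTAL - WITH_POIEO)


def and_search_hits(occs):
    """bara AND poieo: both matched words count, so 2 hits per pairing."""
    return sum(2 for has_poieo in occs if has_poieo)


def not_search_hits(occs):
    """bara NOT poieo: the negated term adds no hits, so 1 hit per bara."""
    return sum(1 for has_poieo in occs if not has_poieo)


and_total = and_search_hits(occurrences)
not_total = not_search_hits(occurrences)

print(and_total, not_total)        # 30 39
print(and_total // 2 + not_total)  # 54: halve the AND hits, then add
```

Halving the AND count is exactly the correction the commenter made: 30 hits become 15 actual results, and 15 + 39 accounts for all 54 occurrences.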

In general, any single search element is counted as a hit. For example, if I search for "Moses AND Aaron," each occurrence of Moses and each occurrence of Aaron is counted as a separate hit. If, however, I search for the exact phrase "Moses and Aaron," each occurrence of the whole phrase is counted as one hit. If the phrase occurs five times, Accordance gives a hit count of 5. If it counted each word separately, it would give a hit count of 15 (three words times five occurrences), which would be confusing in most cases.
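A toy Python sketch of the two counting rules just described. The sample verses are invented stand-ins, not Bible text, and the helper names are hypothetical; the only assumption about Accordance is the rule stated above, with the verse treated as the search context.

```python
import re

# Hypothetical sample "verses" (invented for illustration).
VERSES = [
    "moses and aaron went in",
    "moses spoke to aaron",
    "moses went up alone",
]


def tokens(text):
    """Split a verse into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def and_search_hits(verses, *words):
    """Word AND word: in each verse containing every term, each term occurrence is a hit."""
    total = 0
    for verse in verses:
        toks = tokens(verse)
        if all(w in toks for w in words):
            total += sum(toks.count(w) for w in words)
    return total


def phrase_search_hits(verses, phrase):
    """Exact phrase: each occurrence of the whole phrase is a single hit."""
    target = tokens(phrase)
    n = len(target)
    total = 0
    for verse in verses:
        toks = tokens(verse)
        total += sum(1 for i in range(len(toks) - n + 1) if toks[i:i + n] == target)
    return total


print(and_search_hits(VERSES, "moses", "aaron"))     # 4
print(phrase_search_hits(VERSES, "moses and aaron"))  # 1
```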

If I were to search for two phrases joined by a Search command, such as "Moses and Aaron OR Jacob and Esau," Accordance would count each occurrence of the phrase "Moses and Aaron" and each occurrence of the phrase "Jacob and Esau." It would not count each of the words in those phrases.
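The same idea extended to two phrases joined by OR, again as a hypothetical sketch with made-up sample verses: each phrase occurrence is one hit, so the four phrase matches here would have been twelve hits if every word were counted.

```python
import re

# Hypothetical sample "verses" (invented for illustration).
VERSES = [
    "moses and aaron went in",
    "jacob and esau met",
    "moses and aaron spoke to jacob and esau",
]


def tokens(text):
    """Split a verse into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def count_phrase(toks, phrase):
    """Number of times the whole phrase appears in a token list."""
    target = tokens(phrase)
    n = len(target)
    return sum(1 for i in range(len(toks) - n + 1) if toks[i:i + n] == target)


def or_phrase_hits(verses, *phrases):
    """Phrase OR phrase: one hit per phrase occurrence, not one per word."""
    return sum(count_phrase(tokens(v), p) for v in verses for p in phrases)


hits = or_phrase_hits(VERSES, "moses and aaron", "jacob and esau")
print(hits)      # 4
print(hits * 3)  # 12: what counting each of the three words would report
```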

Similarly, if I develop a search using a Construct window, each occurrence of the entire construction is counted as a hit, as opposed to the individual words within that construction.

Finally, as our commenter made clear, negated items are not counted as hits. In the search "Moses NOT Aaron," each occurrence of Moses is counted. Aaron can't be counted because it would not exist in the set of verses returned by such a search.
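A sketch of the NOT rule under the same assumptions (invented verses, hypothetical names, the verse as the search context): verses containing the excluded word are dropped, and only the wanted word is tallied in the verses that remain.

```python
import re

# Hypothetical sample "verses" (invented for illustration).
VERSES = [
    "moses and aaron went in",
    "moses went up and moses came down",
    "moses spoke",
]


def tokens(text):
    """Split a verse into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def not_search_hits(verses, wanted, excluded):
    """wanted NOT excluded: skip verses containing `excluded`; count only `wanted`."""
    total = 0
    for verse in verses:
        toks = tokens(verse)
        if excluded not in toks:
            total += toks.count(wanted)
    return total


print(not_search_hits(VERSES, "moses", "aaron"))  # 3
```

The first verse is excluded entirely, so its Moses never reaches the count, and Aaron can never contribute a hit.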

At first blush, all this talk of what gets counted might seem a little confusing. It might seem more consistent simply to count each individual word found by any search, regardless of whether you were using search commands, searching for phrases, or searching for constructs. But when you search for a three-word phrase, do you really want to divide the number of hits by three to get the actual number of times that phrase occurs? I certainly don't. Accordance therefore tries to count hits as intelligently as possible, so that the number you get is the one most likely to make sense.

Nevertheless, if statistics are one of the three kinds of lies, it doesn't hurt to know how Accordance is arriving at the figures it gives.
