
Word count issue


miketisdell

Hebrew Issues:

 

1) I cannot get standard word counts for the Hebrew text. Gen. 1:1 is counted as 11 words (ב ראשית ברא אלהים את ה שמים ו את ה ארץ) rather than 7. An official unpointed (and untagged) Hebrew version, something many users have requested, would go a long way toward resolving this issue.

 

2) Hebrew letters used to start and end blocks of text are counted as words. For example the ס at the end of Deut. 2:1 is counted as a word. "וַנֵּפֶן וַנִּסַּע הַמִּדְבָּרָה דֶּרֶךְ יַם־סוּף כַּאֲשֶׁר דִּבֶּר יְהוָה אֵלָי וַנָּ֯סָב אֶת־הַר־שֵׂעִיר יָמִים רַבִּים׃ ס"

 

Greek issues:

 

1) Greek book titles are being included in the word count (NA28 text). 
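
To make the difference in Hebrew issue 1 concrete, here is a minimal Python sketch. It is not how Accordance counts internally; it only counts space-separated tokens in two spellings of Gen. 1:1. The segmented string is the one quoted above, the unsegmented consonantal string is supplied here for comparison, and the names `SEGMENTED`, `ORTHOGRAPHIC`, and `count_tokens` are made up for illustration.

```python
# Gen. 1:1 as split into lexemes (the string quoted in the post above).
SEGMENTED = "ב ראשית ברא אלהים את ה שמים ו את ה ארץ"

# Gen. 1:1 as written in an unpointed text, with prefixes attached.
# This string is an assumption added for the comparison.
ORTHOGRAPHIC = "בראשית ברא אלהים את השמים ואת הארץ"


def count_tokens(text: str) -> int:
    """Count whitespace-separated tokens, one simple definition of 'word'."""
    return len(text.split())


if __name__ == "__main__":
    print("lexeme-level count:", count_tokens(SEGMENTED))
    print("orthographic count:", count_tokens(ORTHOGRAPHIC))
```

The same counting function gives 11 for the first string and 7 for the second, so the disagreement comes from how the text is segmented before counting, not from the counting itself.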

Technically, these are all correct and stand as full words. The prepositions and the article are separate lexemes and, as such, are counted separately. The samech in problem #2 is an abbreviation and, as such, is counted in the text.

To expand on Matt's answer:

 

The concept of 'word' is a highly variable one.  One can easily argue that 'word' refers to different lexemes, or one can argue that 'word' refers to the space divisions.  Should an item with a maqqef count as one word or two?  What about the paragraph marker?  Should we include NA28 titles?  Should we include Psalm titles?  Should we include Ketiv/Qere?  It is all extremely subjective.  Accordance is very clearly and consistently defining its sense of a word, and it is accurately reporting those instances.

 

In general, as Mark pointed out in the other thread, you can tailor your Accordance searches to exclude terms. For instance, you can search for *@-[SUFFIX] to eliminate the separate suffixes. You could also make a custom Greek range that ignores the NA28 title verses.
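
As a rough illustration of how much the choice of definition matters, the following Python sketch counts the "words" of Deut. 2:1 (the verse quoted at the top of the thread) under a few of the conventions listed above. It is only a sketch run outside Accordance on a plain Unicode string. The function names, the `split_maqqef` and `keep_markers` options, and the decision to treat a lone ס or פ as a paragraph marker are all assumptions made for the example.

```python
import unicodedata

MAQQEF = "\u05be"       # Hebrew hyphen joining words
SOF_PASUQ = "\u05c3"    # verse-final punctuation
PARAGRAPH_MARKERS = {"ס", "פ"}  # assumed: a lone samekh or pe marks a section

DEUT_2_1 = (
    "וַנֵּפֶן וַנִּסַּע הַמִּדְבָּרָה דֶּרֶךְ יַם־סוּף כַּאֲשֶׁר דִּבֶּר "
    "יְהוָה אֵלָי וַנָּ֯סָב אֶת־הַר־שֵׂעִיר יָמִים רַבִּים׃ ס"
)


def strip_pointing(text: str) -> str:
    """Drop vowel points and accents (Unicode nonspacing marks)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def count_words(text: str, split_maqqef: bool = False,
                keep_markers: bool = True) -> int:
    """Count space-separated tokens under the chosen conventions."""
    plain = strip_pointing(text).replace(SOF_PASUQ, " ")
    if split_maqqef:
        plain = plain.replace(MAQQEF, " ")
    tokens = plain.split()
    if not keep_markers:
        tokens = [t for t in tokens if t not in PARAGRAPH_MARKERS]
    return len(tokens)


if __name__ == "__main__":
    for split in (False, True):
        for keep in (True, False):
            n = count_words(DEUT_2_1, split_maqqef=split, keep_markers=keep)
            print(f"split_maqqef={split}, keep_markers={keep}: {n}")
```

On this one verse the count ranges from 13 (maqqef-joined units, marker excluded) to 17 (maqqef split, marker included), and that is before prefixes and suffixes are separated as lexemes.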

I understand how Accordance is counting, but this is not the way anyone else counts words in the Hebrew text. 

 

The Masoretes indicate that Genesis through Deuteronomy has 79,856 words, but Accordance tells us that it has 425,187.

 

Similarly, in the Rabbinic literature, word counts are handled the same way the Masoretes handle them. I cannot think of a single reference that cites word counts in the way that Accordance provides them. Can you? Nor do I know of any source that counts the paragraph markers as "words". Do you know of such a source?

 

While one may argue that it is "technically" correct, that is pretty difficult to swallow when no one else counts words in this way. What use are these counts if they cannot be compared with other references?

 

Additionally, this is a departure from how all the other products I have used count words.

Edited by miketisdell

When speaking of word counts in the Hebrew and Greek texts, the matter does not appear to be nearly as subjective as you have indicated: when others reference word counts, they do not include titles or paragraph markers, or break apart lexemes. Do you know of any sources outside of Accordance that provide counts in this way?

 

In general, Accordance searches provide the same kind of results that other products do. For example, if I search for every occurrence of בית, I get the same count regardless of how it appears in the text. It is only when searching for "words" that the counts depart significantly from other products and from references like the ones I cited in the other reply.

I think part of the problem is that, when they are working through and tagging texts, it is difficult to specify rules such as "put these items in a list" or "exclude this". That is possible with if/then logic and loops in most programming languages, but given the goal of providing texts that are simple and powerful, the current approach is pretty good. I am not sure they could do what you are looking for short of a wholesale revision of how the text is uploaded and compiled, although OakTree would know better than I do. I see the issue you have. Still, titles and paragraph markers aside, I would say the clitic prefixes are words in and of themselves. They attach to the following word, but each is by definition its own lexeme. Yes, they are part of a larger sense unit, but even in translation they are rendered separately and have their own separate form. It is hard for me personally not to count them as individual words. That is a matter of personal preference, as Joel has indicated.

Edited by MattChristian