
Search For Individual letters In Hebrew Text


fmcfee


Hi,

 

I know I have seen someone demonstrate this, but I can't remember how to do it. I want to find out how many individual letters are in the Hebrew Torah.

I have tried several ways of doing this but don't get the results I want.

 

Any Suggestion?

 

Thanks

Frank


This might be terribly wrong, but I used the asterisk wildcard (*) and found some 112,170 words.

 

[Attachment: Screen Shot 2015-01-17 at 4.02.25 PM.png]

 

But I also noticed that this search missed some letters.

 

[Attachment: Screen Shot 2015-01-17 at 4.02.09 PM.png]


Joel,

 

Thanks for your quick reply. I got a similar number too, but I heard another person (a rabbi) claim there are about 304,805 letters in the Torah, so I suspect the number we found may be words and not letters.

 

Frank


The 112,000 figure is words. You can double check it in the Analysis tab.

I tried Darin Franklin's RegexForAccordance tool but it's not getting it right: the searches I try do not terminate and count into the millions.

300K letters would be an average of about 3 characters per word. I'm not sure that is accurate, but it could be.

 

Thx

D


OK, I think I have a method. I have done a trivial test of it on Gen 1:1, which contains 11 words.

What I did was create a search tab for each word length: the first for words of 1 character, the second for words of 2, and so on. You know you do not need more tabs when you run a search for N characters and Accordance helpfully informs you that there are none.

The search in each tab is the same, [RANGE Gen-Deut] <AND> ?, with the number of ? increasing in each successive tab.

You then open an analysis tab for each search tab and add up the number of hit words times the word size for that tab.

In my case, Gen 1:1 has 11 words ranging from 1 to 5 characters in length, for a total of 28 characters.
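
The same bookkeeping can be sketched outside Accordance. This is only an illustration in Python, not anything the tool does internally. The word list is an assumption: it splits the prefixes off as separate words, which is one way to arrive at 11 words for Gen 1:1.

```python
from collections import Counter

# Gen 1:1 without points, split into 11 search "words".
# Assumption: the prefixes (ב, ה, ו) are counted as separate words.
words = ["ב", "ראשית", "ברא", "אלהים", "את", "ה", "שמים", "ו", "את", "ה", "ארץ"]

# One bucket per word length, like one search tab per number of "?".
hits_by_length = Counter(len(word) for word in words)

# Hit words times word size, summed over all lengths.
total_words = sum(hits_by_length.values())
total_chars = sum(length * hits for length, hits in hits_by_length.items())

for length in sorted(hits_by_length):
    hits = hits_by_length[length]
    print(length, hits, length * hits)
print("words:", total_words, "chars:", total_chars)
```

With this split the totals come out at 11 words and 28 characters, matching the figures above.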

 

I cannot read Hebrew. I've been learning for about a week, via Buth, so no reading yet. Please double check my method and my result.

If it looks good, change the range, add tabs as necessary, and let me know the result!

 

Thx

D


If you use RegexForAccordance, set the Filters option to remove cantillation and points. Then search for \S to find every character except whitespace. You could also remove All Spaces and then search for . (a single period).

 

I get 323,750 for Gen-Deut.

 

If you want to count alphabet characters only, not punctuation, search for \w instead. I get 305,861.
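
For anyone who wants to try the idea on plain Unicode text outside RegexForAccordance, here is a minimal Python sketch. It is not how the tool works internally. The helper names are made up for illustration, the mark filter is approximated by dropping Unicode combining marks, and letters are matched with an explicit alef-to-tav range because Python's \w is not identical to the ICU one.

```python
import re
import unicodedata

def strip_marks(text):
    # Approximates the "remove cantillation and points" filter:
    # drop every combining mark (Unicode category Mn).
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

def count_non_whitespace(text):
    # Rough equivalent of searching for \S on the filtered text.
    return len(re.findall(r"\S", strip_marks(text)))

def count_letters(text):
    # Only the 27 letter forms alef..tav (U+05D0 to U+05EA), so maqef,
    # paseq, and sof pasuq are not counted.
    return len(re.findall(r"[\u05D0-\u05EA]", text))

# Gen 1:1 without points, plus the closing sof pasuq (U+05C3).
verse = "בראשית ברא אלהים את השמים ואת הארץ" + "\u05C3"

print(count_non_whitespace(verse))
print(count_letters(verse))
```

On this verse the two counts differ by one, the sof pasuq, which is the same kind of gap as between the \S and \w totals above.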


Are there any notes available that explain what these symbols (\S or \w) are for?


Thanks Darin. A simple search on . leads to counts in the millions and never finishes. I don't know whether that indicates a bug in your tool.

I tried [א-ח] but that never finishes either.

 

Have you got a more up to date version ?

 

Tx

D


1.0.2 is the current version. Make sure your range is set to Gen-Deut, and the Filters are set to remove cantillation and points.


\S means non-whitespace character

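\w means "word character": a letter, digit, or underscore. With points and cantillation filtered out of the Hebrew text, that is effectively just an alphabet letter.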

A dot . by itself is any single character.

 

All the special regex symbols are listed here:

http://userguide.icu-project.org/strings/regexp#TOC-Regular-Expression-Metacharacters


I'm on 1.0.0 - let me try again.

 

Thx

D


Actually, it seems I'm on 1.0.2, but in any case this one works fine and I get your results, Darin. Thanks.

D


I tried out my suggestion above too and got this:

 

 

Chars per word    Hit words    Ext. chars
6                       367          2202
5                      4365         21825
4                     11163         44652
3                     41090        123270
2                     22091         44182
1                     32256         32256
Total                111332        268387

 

That's a fair way off, interestingly. It might be interesting to know why.

Also, that is about 2.4 chars per word, rising to maybe 2.9 per word with Darin's largest regex estimate (323,750).

I wouldn't have expected such short words. It's probably way larger in Greek.

Yep: in Greek, 737,265 characters (by regex) with 124,531 words, yielding about 5.9 chars per word.

 

I think I'm done now.

 

Thx

D


Searching the web, I see the number 304,805 cited frequently as the number of letters in the Torah (Wikipedia, for example: Sefer Torah).

Here is a web site that gives a breakdown by letter and by book. The count includes only the alphabet and not punctuation.
http://www.aishdas.org/toratemet/en_pamphlet9.html

The count that I get from HMT-W4 using RegexForAccordance does not match these totals. Even if I remove bracketed text, I still get more letters than 304,805.

Removing bracketed text: 305,539 letters (734 difference)
Including bracketed text: 305,861 letters (1,056 difference)

 

If I subtract the cases where ס and פ (samekh and pei) stand alone to mark minor and major breaks, then my counts match the web site above for those two letters.

samekh minor break: 394

pei major break: 293

 

That brings the letter count down to 304,852 (47 difference) when removing bracketed text.
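
For anyone following the subtraction, it can be written out as a short Python check. The variable names are only labels chosen for this sketch; the figures are the ones quoted in this post.

```python
# Figures quoted in this post.
traditional = 304_805        # commonly cited letter count
with_brackets = 305_861      # \w count including bracketed text
without_brackets = 305_539   # \w count with bracketed text removed
samekh_breaks = 394          # stand-alone samekh (minor break)
pei_breaks = 293             # stand-alone pei (major break)

# Remove the break markers from the letter count.
adjusted = without_brackets - samekh_breaks - pei_breaks

print(with_brackets - traditional)       # 1056
print(without_brackets - traditional)    # 734
print(adjusted, adjusted - traditional)  # 304852 47
```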

 

What is the reason for the remaining discrepancy?


Wow, great responses. Thanks so much for everybody's help.

I downloaded RegexForAccordance (thanks so much, Darin). I had some issues because it did not come from the App Store, but I found out how to open the app by overriding the security setting.

 

Thanks again!

 

 

Frank


Here are the results I came up with for HMT-W4. Most of the letter counts do not match those given for the 304,805-letter Torah, but they are all pretty close.

[Attachment: Torah Letter Count HMT-W4.png]

Thanks for the interesting question!

Darin,

 

Thanks so much for your help and the screenshot. How did you do the statistics? Was this a separate program?

One thought I had about the differences: did this search exclude the maqqef (a hyphen used in noun constructs)?

Frank


I created the table in Numbers: Torah Letter Count.numbers.zip

 

The letter counts are from the statistics table in RegexForAccordance. Right click the column headers on the right hand side and remove the Length and Refs columns. Then you have Hits and Count only. After searching for \w, copy the table and paste into Numbers.
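
A per-letter table like that can also be approximated on any plain Unicode text. The Python sketch below is only an illustration, not the tool's own statistics code. The function name is hypothetical, and final forms are tallied as separate characters, which may not match how other published tables group them.

```python
from collections import Counter

def letter_table(text):
    # Tally each Hebrew letter form (alef..tav, U+05D0 to U+05EA).
    # Points, punctuation, and spaces are ignored.
    return Counter(ch for ch in text if "\u05D0" <= ch <= "\u05EA")

# Gen 1:1 without points, used here as a tiny sample.
verse = "בראשית ברא אלהים את השמים ואת הארץ"

table = letter_table(verse)
for letter, count in sorted(table.items()):
    print(letter, count)
print("total", sum(table.values()))
```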

 

For the samekh and pei break indicators, I searched for \bס\b and \bפ\b. That finds the letters when they stand alone and are not part of a word (\b is a word boundary). See Gen 3:15 and Gen 1:5 for examples.
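
To see what the word-boundary search does, here is a small Python sketch. It assumes unpointed text, since combining marks would change where Python places \b. The sample string is made up for illustration and is not a real verse, and the helper name is hypothetical.

```python
import re

def count_standalone(letter, text):
    # \b on both sides: the letter must not touch another word character.
    pattern = r"\b" + re.escape(letter) + r"\b"
    return len(re.findall(pattern, text))

# Made-up sample: a stand-alone samekh and pei next to words that
# merely contain those letters.
sample = "ויאמר ס ספר פ פסח"

print(count_standalone("ס", sample))
print(count_standalone("פ", sample))
```

Only the isolated letters are counted. The samekh and pei inside the longer words are skipped, which is what keeps the break markers separate from ordinary spelling.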

 

The counts given on the aishdas.org web page do not include maqef (־), paseq (׀), or sof pasuq (׃) so I did not count them either. If you want, you can search for \W to find them.

Darin,

 

Thanks so much for your reply. This type of research is new to me. Thanks again for all your help; I appreciate your research and comments.

Frank


  • 6 years later...

The discrepancy between online counts and the results from a search in the HMT might be the result of minute spelling differences between the Leningrad Codex, on which HMT-W4 is based, and later rabbinic Bibles. Or the online counts could be based on hand counting, which would inevitably result in mistakes. Also, does this count include qere-ketiv readings or not?
