Search For Individual letters In Hebrew Text

January 17, 2015

Hi,

I know I have seen someone demonstrate this, but I can't remember how to do it. I want to find out how many individual letters are in the Hebrew Torah.

I have tried several ways of doing this but don't get the results I want.

Any suggestions?

Thanks

Frank

January 17, 2015

This might be terribly wrong! But I used the wildcard (asterisk) and found some 112,170 words.

But I also noticed that this search missed some letters.

January 17, 2015

Joel,

Thanks for your quick reply. I got a similar number too, but I heard another person (a rabbi) claim there were about 304,805 letters in the Torah, so I wonder whether our number is words and not letters.

Frank

January 17, 2015

The 112,000 figure is words. You can double-check it in the Analysis tab.

I tried Darin Franklin's Regex For Accordance tool but it's not getting it right: the searches I try do not terminate and count into the millions.

300K letters works out to an average of roughly 3 characters per word. I'm not sure that is accurate, but it could be.

Thx

D

January 17, 2015

Ok I think I have a method. I have done a trivial test on it on Gen 1:1 which contains 11 words.

What I did was create a search tab for each word length: the first for words of 1 char, the second for those of 2, and so on. You know you do not need more when you hit a search for N chars and Accordance helpfully informs you that there are none.

The search in each tab is the same, [RANGE Gen-Deut] <AND> ?, with the number of ? characters increasing in each successive tab.

You then open an analysis tab for each search tab and add up the number of hit words times the word size for that tab.

In my case, for Gen 1:1, we have 11 words of length 1 to 5 characters, for a total of 28 characters.
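For anyone who wants to check the tally outside Accordance, here is a minimal Python sketch of the same idea. The word list is an assumed segmentation of Gen 1:1 (consonants only) into 11 units with the prefixes split off as separate words; it is typed by hand, not copied from Accordance output, and the variable names are hypothetical.

```python
from collections import Counter

# Assumed segmentation of Gen 1:1 into 11 units (prefixes counted as
# separate words), consonants only. Hand-typed, not Accordance output.
words = ["ב", "ראשית", "ברא", "אלהים", "את", "ה", "שמים", "ו", "את", "ה", "ארץ"]

# One "tab" per word length: length -> number of hit words.
hits_by_length = Counter(len(w) for w in words)

# Total characters = sum over tabs of (word length x hit words).
total_chars = sum(length * hits for length, hits in hits_by_length.items())

for length in sorted(hits_by_length):
    hits = hits_by_length[length]
    print(length, hits, length * hits)
print("words:", len(words), "chars:", total_chars)
```

With this segmentation the sketch reports 11 words and 28 characters, matching the figures above.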

I cannot read Hebrew - I've been learning for about a week and via Buth so no reading yet. So double check my method and my result.

If it looks good change the range and add tabs as necessary and let me know the result !

Thx

D

January 17, 2015

If you use RegexForAccordance, set the Filters option to remove cantillation and points. Then search for \S to find every character except whitespace. You could also remove All Spaces and then search for . (a single period).

I get 323,750 for Gen-Deut.

If you want to count alphabet characters only, not punctuation, search for \w instead. I get 305,861.
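The same two searches can be tried on a single verse in Python. This is only a sketch: Python's `re` module stands in for the tool's regex engine, so details of the character classes may differ, and `strip_marks` is a hypothetical helper that imitates the Filters option by dropping every combining mark. The verse is Gen 1:1 typed with vowel points and without cantillation.

```python
import re
import unicodedata

# Gen 1:1 with vowel points (no cantillation), typed here as sample input.
verse = "בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ׃"

def strip_marks(text: str) -> str:
    """Drop combining marks (points and cantillation), like the Filters option."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )

plain = strip_marks(verse)

# \S: every non-whitespace character (letters plus punctuation such as sof pasuq).
non_space = re.findall(r"\S", plain)
# \w: word characters only, which here means the Hebrew letters.
letters = re.findall(r"\w", plain)

print(len(non_space), len(letters))
```

For this verse the two counts differ by one, because the closing sof pasuq is matched by \S but not by \w.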

January 17, 2015

Are there any notes available that explain what these symbols (\S or \w) are for?

January 17, 2015

Thanks, Darin. A simple search on . leads to counts in the millions and never finishes. I do not know if that indicates a bug in your tool.

I tried [א-ח] but that never finishes either.

Have you got a more up-to-date version?

Tx

D

January 17, 2015

1.0.2 is the current version. Make sure your range is set to Gen-Deut, and the Filters are set to remove cantillation and points.

Edited January 17, 2015 by Darin Franklin

January 17, 2015

\S means non-whitespace character

\w means "word character", which here is just an alphabet letter (in general it also matches digits and the underscore).

A dot . by itself is any single character.

All the special regex symbols are listed here:

http://userguide.icu-project.org/strings/regexp#TOC-Regular-Expression-Metacharacters

January 17, 2015

I'm on 1.0.0 - let me try again.

Thx

D

January 17, 2015

Actually - seems it's 1.0.2 but in any case this one works fine and I get your results Darin. Thanx

D

January 17, 2015

I tried out my suggestion above too and got this:

Chars per word   Hit words   Ext. Chars
6                      367         2202
5                     4365        21825
4                    11163        44652
3                    41090       123270
2                    22091        44182
1                    32256        32256
Total               111332       268387

That's a fair way off, interestingly. Might be interesting to know why.

Also, that is about 2.4 chars per word, rising to about 2.9 per word with Darin's largest regex count (323750).

I wouldn't have expected such short words. It's probably way larger in Greek.

Yep: 737265 characters (by regex) with 124531 words, yielding about 5.9 chars per word.
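The totals and averages quoted in this post can be rechecked with a few lines of Python. The numbers are the ones given above; only the variable names are made up.

```python
# (chars per word, hit words) rows from the table in this post.
rows = [(6, 367), (5, 4365), (4, 11163), (3, 41090), (2, 22091), (1, 32256)]

total_words = sum(hits for _, hits in rows)
total_chars = sum(length * hits for length, hits in rows)
print(total_words, total_chars)  # 111332 268387

# Average characters per word for the tab method and for the \S regex count.
print(round(total_chars / total_words, 2))  # 2.41
print(round(323750 / total_words, 2))       # 2.91

# Greek figures quoted above: characters / words.
print(round(737265 / 124531, 2))            # 5.92
```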

I think I'm done now.

Thx

D

January 18, 2015

Searching the web, I see the number 304,805 cited frequently as the number of letters in the Torah (Wikipedia, for example: Sefer Torah).

Here is a web site that gives a breakdown by letter and by book. The count includes only the alphabet and not punctuation.
http://www.aishdas.org/toratemet/en_pamphlet9.html

The count that I get from HMT-W4 using RegexForAccordance does not match these totals. Even if I remove bracketed text, I still get more letters than 304,805.

Removing bracketed text: 305,539 letters (734 difference)
Including bracketed text: 305,861 letters (1,056 difference)

If I subtract the cases where ‭ס and פ (samekh and pei) stand alone to mark minor and major breaks, then my counts match the web site above for those two letters.

samekh minor break: 394

pei major break: 293

That brings the letter count down to 304,852 (47 difference) when removing bracketed text.
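Spelled out as arithmetic, using only the counts quoted in this post (the variable names are hypothetical):

```python
TRADITIONAL = 304_805        # commonly cited Torah letter count
with_brackets = 305_861      # \w count including bracketed text
without_brackets = 305_539   # \w count with bracketed text removed
samekh_breaks = 394          # stand-alone samekh (minor break markers)
pe_breaks = 293              # stand-alone pei (major break markers)

# Drop the break markers from the bracket-free count.
adjusted = without_brackets - samekh_breaks - pe_breaks

print(with_brackets - TRADITIONAL)       # 1056
print(without_brackets - TRADITIONAL)    # 734
print(adjusted, adjusted - TRADITIONAL)  # 304852 47
```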

What is the reason for the remaining discrepancy?

Edited January 18, 2015 by Darin Franklin

January 18, 2015

Wow, great responses. Thanks so much for everybody's help.

I downloaded Regex for Accordance. Thanks so much, Darin. I had some issues because it did not come from the App Store,

but I found out how to open the app by overriding the security setting.

Thanks again!

Frank

Edited January 18, 2015 by fmcfee

January 18, 2015

Here are the results I came up with for HMT-W4. Most of the letter counts do not match those given for the 304,805-letter Torah, but they are all pretty close.

Thanks for the interesting question!

January 19, 2015

Darin,

Thanks so much for your help and the screenshot. How did you do the statistics? Was this a separate program?

One thought I had about the differences: did this search exclude the maqqef (a hyphen used in noun constructs)?

Frank

January 19, 2015

I created the table in Numbers: Torah Letter Count.numbers.zip

The letter counts are from the statistics table in RegexForAccordance. Right click the column headers on the right hand side and remove the Length and Refs columns. Then you have Hits and Count only. After searching for \w, copy the table and paste into Numbers.

For the samekh and pei break indicators, I searched for \bס\b and \bפ\b. That finds the letters when they stand alone and are not part of a word (\b is word boundary). See Gen 3:15 and Gen 1:5 for examples.
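To see how the word-boundary search behaves, here is a small Python sketch. The sample string is made up for illustration (it is not a verse), it is unpointed on purpose because combining marks would interfere with \b in Python's `re`, and Python's engine may not behave identically to the one in RegexForAccordance.

```python
import re

# Made-up unpointed sample: samekh and pei occur inside words and also
# stand alone after a sof pasuq, the way the break markers do.
sample = "סוס ספר׃ ס פרי׃ פ"

# \b on both sides matches the letter only when it is a whole word.
standalone_samekh = re.findall(r"\bס\b", sample)
standalone_pe = re.findall(r"\bפ\b", sample)

# Without \b, every occurrence is counted, including those inside words.
all_samekh = re.findall("ס", sample)

print(len(standalone_samekh), len(standalone_pe), len(all_samekh))
```

Only the free-standing letters are matched by the \b patterns, which is why their counts can be subtracted from the per-letter totals.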

The counts given on the aishdas.org web page do not include maqef (־), paseq (׀), or sof pasuq (׃) so I did not count them either. If you want, you can search for \W to find them.
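As a rough illustration of what a \W search picks up, the Python sketch below tallies the non-letter, non-space characters in a made-up unpointed sample. It uses `[^\w\s]` rather than bare `\W` so that spaces are not counted; the sample text and the names are assumptions, and the tool's own engine may classify characters differently.

```python
import re
from collections import Counter

MAQAF = "\u05be"      # ־
PASEQ = "\u05c0"      # ׀
SOF_PASUQ = "\u05c3"  # ׃

# Made-up unpointed sample with two maqafs, one paseq and one sof pasuq.
sample = f"על{MAQAF}פני {PASEQ} כל{MAQAF}הארץ{SOF_PASUQ}"

# [^\w\s] = anything that is neither a word character nor whitespace.
punctuation = Counter(re.findall(r"[^\w\s]", sample))
letters = re.findall(r"\w", sample)

names = {MAQAF: "maqaf", PASEQ: "paseq", SOF_PASUQ: "sof pasuq"}
for ch, count in punctuation.items():
    print(names.get(ch, repr(ch)), count)
print("letters:", len(letters))
```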

January 19, 2015

Darin,

Thanks so much for your reply. This type of research is new to me. Thanks again for all your help; I appreciate your research and comments.

Frank

January 20, 2021

The discrepancy between online counts and the results from a search in the HMT might be the result of minute spelling differences between the Leningrad Codex, on which the HMT-W4 is based, and later rabbinic Bibles. Or the online counts could be based on hand counting, which would inevitably result in mistakes. Also, does this count include qere-ketivs or not?

Posters, in thread order: fmcfee, joelmadasu, fmcfee, Λύχνις Δαν, Λύχνις Δαν, Darin Franklin, joelmadasu, Λύχνις Δαν, Darin Franklin, Darin Franklin, Λύχνις Δαν, Λύχνις Δαν, Λύχνις Δαν, Darin Franklin, fmcfee, Darin Franklin, fmcfee, Darin Franklin, fmcfee, Iconoclaste