fmcfee Posted January 17, 2015
Hi, I know I have seen someone demonstrate this but can't remember how to do it. I want to find out how many individual letters are in the Hebrew Torah. I have tried several ways of doing this but don't get the results I want. Any suggestions? Thanks, Frank
joelmadasu Posted January 17, 2015
This might be terribly wrong! But I used the wildcard (asterisk) and found some 112,170 words. I also noticed that this search missed some letters.
fmcfee Posted January 17, 2015
Joel, thanks for your quick reply. I got a similar number too, but I heard another person (a rabbi) claim there were almost 304,805 letters in the Torah, so I suspect the number we got is words and not letters. Frank
Λύχνις Δαν Posted January 17, 2015
The 112,000 is words. You can double-check it in the Analysis tab. I tried Darin Franklin's RegexForAccordance tool but it's not getting it right: the searches I try do not terminate and count into the millions. 300K letters would be an average of about 3 characters per word. Not sure if that is accurate, but it could be, I guess. Thx D
Λύχνις Δαν Posted January 17, 2015
OK, I think I have a method. I have done a trivial test of it on Gen 1:1, which contains 11 words. What I did was create a search tab for each word length: the first for words of 1 character, the second for words of 2, and so on. You know you do not need more tabs when you run a search for N characters and Accordance helpfully informs you that there are none. The search in each tab is the same, [RANGE Gen-Deut] <AND> ?, with the number of ? increasing in each successive tab. You then open an Analysis tab for each search tab and add up the number of hit words times the word size for that tab. In my case, for Gen 1:1, we have 11 words of length 1 to 5 characters, for a total of 28 characters. I cannot read Hebrew (I've been learning for about a week, via Buth, so no reading yet), so double-check my method and my result. If it looks good, change the range, add tabs as necessary, and let me know the result! Thx D
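The tab-per-word-length method above can be sketched in Python. This is only an illustration, not what Accordance does internally. The list of 11 units is an assumption: it splits the consonantal text of Gen 1:1 so that prefixes count as separate words, which is one segmentation that reproduces the 11 words and 28 characters reported in the post.

```python
from collections import Counter

# Assumption: Gen 1:1 (consonants only) split into 11 units, with the
# prefixes treated as separate words. This segmentation is illustrative.
segments = ["ב", "ראשית", "ברא", "אלהים", "את", "ה", "שמים", "ו", "את", "ה", "ארץ"]

# One "tab" per word length: how many hit words have that length.
hits_by_length = Counter(len(word) for word in segments)

# Add up (hit words x word length) across all tabs.
total_words = sum(hits_by_length.values())
total_chars = sum(length * hits for length, hits in hits_by_length.items())

for length in sorted(hits_by_length):
    hits = hits_by_length[length]
    print(f"{length} chars: {hits} words, {length * hits} characters")
print(f"Total: {total_words} words, {total_chars} characters")
```

The same sum only matches a true letter count if every word in the range is caught by one of the length searches, so a missing tab or an unmatched word shows up as a shortfall.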
Darin Franklin Posted January 17, 2015
If you use RegexForAccordance, set the Filters option to remove cantillation and points. Then search for \S to find every character except whitespace. You could also remove All Spaces and then search for . (a single period). I get 323,750 for Gen-Deut. If you want to count alphabet characters only, not punctuation, search for \w instead. I get 305,861.
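For readers without RegexForAccordance, the same idea can be approximated in Python. This is a rough sketch under stated assumptions: `strip_marks` is a hypothetical helper standing in for the tool's "remove cantillation and points" filter, the sample is Gen 1:1 with vowel points typed by hand, and Python's `re` module is not the ICU engine the tool uses, so edge cases may differ.

```python
import re
import unicodedata

def strip_marks(text):
    """Drop combining marks (Unicode category Mn), which covers Hebrew
    vowel points and cantillation. Stand-in for the tool's filter."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

# Assumption: Gen 1:1 with vowel points and a final sof pasuq.
sample = "בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ׃"

plain = strip_marks(sample)

# \S counts every non-whitespace character (letters plus punctuation).
non_space = len(re.findall(r"\S", plain))
# \w counts word characters only, i.e. the letters.
letters = len(re.findall(r"\w", plain))

print(non_space, letters)
```

On this sample the two counts differ by one, because the sof pasuq at the end of the verse is matched by \S but not by \w.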
joelmadasu Posted January 17, 2015
Are there any notes available that explain what these symbols (\S or \w) are for?
Λύχνις Δαν Posted January 17, 2015
Thanks Darin. A simple search on . leads to counts in the millions and never finishes. I do not know if that indicates a bug in your tool. I tried [א-ח] but that never finishes either. Have you got a more up-to-date version? Tx D
Darin Franklin Posted January 17, 2015
1.0.2 is the current version. Make sure your range is set to Gen-Deut, and the Filters are set to remove cantillation and points.
Darin Franklin Posted January 17, 2015
\S means a non-whitespace character.
\w means a "word character", which is just an alphabet letter.
A dot . by itself is any single character.
All the special regex symbols are listed here: http://userguide.icu-project.org/strings/regexp#TOC-Regular-Expression-Metacharacters
Λύχνις Δαν Posted January 17, 2015
I'm on 1.0.0 - let me try again. Thx D
Λύχνις Δαν Posted January 17, 2015
Actually, it seems it's 1.0.2, but in any case this one works fine and I get your results, Darin. Thanks D
Λύχνις Δαν Posted January 17, 2015
I tried out my suggestion above too and got this:

Chars per word   Hit words   Ext. chars
6                      367         2202
5                     4365        21825
4                    11163        44652
3                    41090       123270
2                    22091        44182
1                    32256        32256
Total               111332       268387

That's a fair way off, interestingly. Might be interesting to know why. Also, that is about 2.4 chars per word, up to maybe 2.9 per word with Darin's largest regex estimate (323,750). I wouldn't have expected such short words - it's probably way larger in Greek. Yep: 737,265 characters (by regex) with 124,531 words, yielding about 5.9 chars per word. I think I'm done now. Thx D
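The totals and averages in the table above can be rechecked with a few lines of Python. The numbers are copied from the posts in this thread; the variable names are only illustrative.

```python
# (chars per word, hit words) rows as reported in the table above.
rows = [(6, 367), (5, 4365), (4, 11163), (3, 41090), (2, 22091), (1, 32256)]

total_words = sum(hits for _, hits in rows)
total_chars = sum(length * hits for length, hits in rows)

# Average word length from the tab method, and from the 323,750
# non-whitespace count reported for the regex search.
avg_tab_method = total_chars / total_words
avg_regex_count = 323_750 / total_words

# Greek figures quoted in the same post: 737,265 characters, 124,531 words.
avg_greek = 737_265 / 124_531

print(total_words, total_chars)
print(round(avg_tab_method, 2), round(avg_regex_count, 2), round(avg_greek, 2))
```

The table is internally consistent, so the shortfall against the regex counts comes from the searches themselves rather than from the arithmetic.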
Darin Franklin Posted January 18, 2015
Searching the web, I see the number 304,805 cited frequently as the number of letters in the Torah (Wikipedia, for example: Sefer Torah).
Here is a web site that gives a breakdown by letter and by book. The count includes only the alphabet and not punctuation.
http://www.aishdas.org/toratemet/en_pamphlet9.html
The count that I get from HMT-W4 using RegexForAccordance does not match these totals. Even if I remove bracketed text, I still get more letters than 304,805.
Removing bracketed text: 305,539 letters (734 difference)
Including bracketed text: 305,861 letters (1,056 difference)
If I subtract the cases where ס and פ (samekh and pei) stand alone to mark minor and major breaks, then my counts match the web site above for those two letters.
samekh minor break: 394
pei major break: 293
That brings the letter count down to 304,852 (47 difference) when removing bracketed text. What is the reason for the remaining discrepancy?
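The arithmetic behind those differences, written out as a small Python check. All figures are the ones given in the post above; the variable names are only illustrative.

```python
TRADITIONAL_COUNT = 304_805      # figure cited for the Torah scroll

with_brackets = 305_861          # \w count including bracketed text
without_brackets = 305_539       # \w count with bracketed text removed
samekh_breaks = 394              # standalone samekh (minor break)
pe_breaks = 293                  # standalone pei (major break)

diff_with_brackets = with_brackets - TRADITIONAL_COUNT
diff_without_brackets = without_brackets - TRADITIONAL_COUNT

# Remove the standalone break markers from the bracket-free count.
adjusted = without_brackets - samekh_breaks - pe_breaks
remaining = adjusted - TRADITIONAL_COUNT

print(diff_with_brackets, diff_without_brackets, adjusted, remaining)
```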
fmcfee Posted January 18, 2015
Wow, great responses. Thanks so much for everybody's help. I downloaded RegexForAccordance. THANKS so much, Darin. I had some issues because it did not come from the App Store, but I found how to open the app by overriding the security setting. Thanks again! Frank
Darin Franklin Posted January 18, 2015
Here are the results I came up with for HMT-W4 (attachment: Torah Letter Count HMT-W4.png). Most of the letter counts do not match those given for the 304,805-letter Torah, but they are all pretty close. Thanks for the interesting question!
fmcfee Posted January 19, 2015
Darin, thanks so much for your help and the screenshot. How did you do the statistics? Was this a separate program? One thought I had about the differences: did this search exclude the maqqef (a hyphen used in noun construct)? Frank
Darin Franklin Posted January 19, 2015
I created the table in Numbers: Torah Letter Count.numbers.zip
The letter counts are from the statistics table in RegexForAccordance. Right-click the column headers on the right-hand side and remove the Length and Refs columns. Then you have Hits and Count only. After searching for \w, copy the table and paste it into Numbers.
For the samekh and pei break indicators, I searched for \bס\b and \bפ\b. That finds the letters when they stand alone and are not part of a word (\b is a word boundary). See Gen 3:15 and Gen 1:5 for examples.
The counts given on the aishdas.org web page do not include maqef (־), paseq (׀), or sof pasuq (׃), so I did not count them either. If you want, you can search for \W to find them.
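A small Python sketch of the \b and punctuation searches described above. It uses Python's `re` module rather than the ICU engine behind RegexForAccordance, and the sample string is an assumption: a few unpointed fragments (with a maqef, two sof pasuq marks, and one standalone פ and ס) stitched together for illustration, not running text from HMT-W4.

```python
import re

# Assumption: illustrative fragments joined into one string.
sample = "את־השמים יום אחד׃ פ תשופנו עקב׃ ס"

# All letters, including the standalone break markers.
all_letters = len(re.findall(r"\w", sample))

# \b is a word boundary, so these match only a letter standing alone.
standalone_pe = len(re.findall(r"\bפ\b", sample))
standalone_samekh = len(re.findall(r"\bס\b", sample))

# Letters that belong to words, with the break markers subtracted.
word_letters = all_letters - standalone_pe - standalone_samekh

# Non-letter, non-space characters: maqef and sof pasuq here.
punctuation = re.findall(r"[^\w\s]", sample)

print(all_letters, standalone_pe, standalone_samekh, word_letters, len(punctuation))
```

Note that the פ inside תשופנו is not counted as a break marker, because it has letters on both sides and so no word boundary around it. The pattern `[^\w\s]` is used instead of a bare \W so that spaces are not counted along with the punctuation.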
fmcfee Posted January 19, 2015
Darin, thanks so much for your reply. This type of research is new to me. Thanks again for all your help; I appreciate your research and comments. Frank
Iconoclaste Posted January 20, 2021
The discrepancy between online counts and the results from a search in the HMT might be the result of minute spelling differences between the Leningrad Codex, on which HMT-W4 is based, and later rabbinic Bibles. Or the online counts could be based on hand counting, which would inevitably result in mistakes. Also, does this count include qere-ketivs or not?