Word occurrences Applescript, GNT and a spreadsheet



#1 Daniel Semler

Posted 03 January 2014 - 03:48 AM

Hi ya,

 

  I was wondering the other day about the distribution of words in the GNT. Are there very few words that occur just once? What is the maximum occurrence count of a word? What does the number of words look like when plotted against the number of occurrences? I can easily see that ὁ, in its various forms, is the single most frequent word in the GNT at 19865 occurrences: a simple verse search of [count 17000-20000] reveals that. Likewise [count 1-1] (or [count 1]) finds all the words occurring only once in any form. But what does the curve look like in between? What I wanted was to run a bunch of queries and plot the results, because I don't know another way to do this in Accordance. (Incidentally, if anyone does know, I'd love to hear about it.) So my solution was to script it with AppleScript. I figured those interested in the possibilities here might be interested in the code involved. I've attached it, along with a workspace to run it against, though creating the workspace yourself is not hard.
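To make the "bunch of queries" idea concrete, here is a minimal sketch in Python (not the attached AppleScript) that builds a list of [count lo-hi] search strings covering every occurrence count from 1 up to a maximum. Only the [count ...] syntax comes from the post. The function name `count_queries`, the cap of 20000, and the rule of widening the bucket by a factor of ten at each power of ten are assumptions made for illustration, and the real script may bucket differently.

```python
def count_queries(max_count=20000):
    """Build contiguous [count lo-hi] queries with buckets that widen
    by a factor of ten at each power of ten (an assumed scheme)."""
    queries = []
    lo = 1
    while lo <= max_count:
        # Width 1 for 1..9, 10 for 10..99, 100 for 100..999, and so on.
        width = 10 ** (len(str(lo)) - 1)
        hi = min(lo + width - 1, max_count)
        # A single-value bucket uses the short form, e.g. [count 1].
        text = f"[count {lo}]" if lo == hi else f"[count {lo}-{hi}]"
        queries.append((lo, hi, text))
        lo = hi + 1
    return queries


if __name__ == "__main__":
    for lo, hi, text in count_queries():
        print(text)
```

Each string would then be run as a search, with the number of words found recorded against its bucket.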

 

  For those who just want to see the chart, here it is: GNT28TOccurrences.jpg (attached). It was produced by taking the CSV file written by the script and graphing it in LibreOffice. Note that where a bucket was a range rather than a single value, I plotted its x value as the end of the bucket (the highest occurrence count) rather than the start.
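The plotting convention described above can be sketched in a few lines of Python. The CSV layout here (columns `lo`, `hi`, `words`) is an assumption, since the post does not show the file the script writes. The two sample rows use only figures from the post: 1940 words occur once, and ὁ is taken to be the only word in the 17000-20000 bucket.

```python
import csv
import io


def bucket_points(csv_text):
    """Return (x, y) points, with x taken as the upper end of each bucket."""
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Use the highest occurrence count in the bucket as the x value.
        points.append((int(row["hi"]), int(row["words"])))
    return points


# Assumed column layout; both rows are inferred from figures in the post.
sample = "lo,hi,words\n1,1,1940\n17000,20000,1\n"

if __name__ == "__main__":
    for x, y in bucket_points(sample):
        print(x, y)
```

The resulting pairs can be pasted into a spreadsheet or handed to any plotting tool, with the x axis set to a logarithmic scale.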

 

  I had wondered if I would get a bell shape, or multiple peaks, or what. I did not, which still surprises me: there are very few cases where one bucket has more words than the preceding, comparably sized bucket. With bucket sizes of just 1 you do see fluctuation, but the general trend is that the less frequently a word occurs in the text, the more such words there are. (Note the bumps in the curve are caused by the bucketing boundaries on the logarithmic x axis.) Still, the spot checks I did agree with the results.

 

  This leads to the rather interesting find that of the 5426 distinct words in the GNT, 1940 occur only once. Thus the last 36% or so of your vocab (if you learn by frequency, as a lot of first-year grammars seem to teach) is gonna hurt to acquire, and you are not going to get representative usage from the GNT alone. Conversely, of course, there are over 130000 word occurrences in the text, so a couple of thousand rare ones hopefully won't cause too much trouble :)
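As a quick check of the arithmetic, the figures quoted above can be worked through in Python. The variable names are illustrative only, and 130000 is used as a lower bound on the total word count, so the second percentage is an upper bound.

```python
distinct_words = 5426       # distinct words in the GNT, per the post
once_only = 1940            # words occurring exactly once, per the post
total_occurrences = 130000  # "over 130000", so treat this as a lower bound

# Share of the vocabulary that occurs only once.
vocab_share = once_only / distinct_words

# Share of the running text made up of those words (an upper bound,
# because the real total is larger than 130000).
text_share_upper = once_only / total_occurrences

if __name__ == "__main__":
    print(f"once-only share of vocabulary: {vocab_share:.1%}")
    print(f"once-only share of text: under {text_share_upper:.1%}")
```

So the once-only words are over a third of the vocabulary but well under 2% of the words you actually meet while reading.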

 

  Now this is a somewhat trivial example, though I found it fun, and the code solved a number of problems that I expect I'll hit again in subsequent experiments.

 

  Anyhow, feel free to shoot holes in my analysis, or my code, or both. The attached code is pretty simple; I don't know how to do anything complicated in AS yet. I documented what I could, and it's really only prototype code. It's helping me learn about GUI scripting: when it helps and what it can do for me. Perhaps others will find the techniques useful.

 

Thx

D



Accordance Configurations :

 

Mac : 2009 27" iMac                 Windows : HP 4540s laptop

      Intel Core Duo                          Intel i5 Ivy Bridge

      12GB RAM                                8GB RAM

      Accordance 10.4.2.1                     Accordance 10.4.2.2 and Aleph 10.4.3b1

      OSX 10.9 (Mavericks)                    Win 7 Professional x64 SP1


#2 James Tucker

Posted 03 January 2014 - 08:31 AM

Daniel, 

 

You might be interested in Smile by Satimage. It works with AppleScript to automate data visualization: http://www.satimage....e/en/index.html



#3 Daniel Semler

Posted 03 January 2014 - 09:20 AM

Hey James, I've previously read about Smile but did not realise it was still in production. I'll check it out. Looks very interesting.

 

thx

D

