
First Letters Only for Memorization


rdaren


No, but you could have a script that took your text selection (you can select a whole chapter), and then it would paste the result in a new TextEdit window as you want it.

What language are you intending?


Here's a simple script that would do what you want:

-- Reads the clipboard, replaces every letter that does not begin a word with c,
-- and puts the result in a new TextEdit document
property c : "_"
set s to the clipboard
set r to do shell script "echo " & quoted form of s & " | perl -pe 's/\\B[a-zA-Z]/" & c & "/g'"

tell application "TextEdit"
	activate
	set d to make new document at end of documents
	tell d to set its text to r
end tell

So, if you copy:

6 A voice says, “Cry out!”
		And I said, “What shall I cry?”
	All people are grass,
		their constancy is like the flower of the field. 
7 	The grass withers, the flower fades,
		when the breath of the LORD blows upon it;
		surely the people are grass.

and then run the script, the result will place this in a new TextEdit document:

6 A v____ s___, “C__ o__!”
		A__ I s___, “W___ s____ I c__?”
	A__ p_____ a__ g____,
		t____ c________ i_ l___ t__ f_____ o_ t__ f____. 
7 	T__ g____ w______, t__ f_____ f____,
		w___ t__ b_____ o_ t__ L___ b____ u___ i_;
		s_____ t__ p_____ a__ g____.
Edited by Joe Weaks

Hi,

Joe's example is amazing. You can return the first word of a chapter, which, while it's not quite what you want, is still a challenge.

 

I then turned all my text white so I couldn't see it and I got a reasonable prompt.

 

Here is a screen cap. Not quite what you asked for, but it might be helpful (I have restricted this example to Genesis using <AND> <Range Gen>).

 

[Attached screenshot: post-29509-0-61129000-1386881183_thumb.jpg]


Wow, thanks for such thoughtful answers. Would these work for Hebrew and Greek?

 

The secret sauce of Joe's AppleScript is that it shells out to perl to do the heavy lifting - this line right here:

 

perl -pe 's/\\B[a-zA-Z]/" & c & "/g'"

 

The [a-zA-Z] matches any letter from a-z (lowercase) or A-Z (uppercase) that is not at the start of a word (\B), and each match is replaced with an underscore (held in the variable c). If we change this to:

 

perl -pe 's/\\B\w/" & c & "/gu'"

 

We should then match all Unicode (/u) word characters (\w). I'm not at home so I can't test this, but it might work.


It might work in Perl if you can hit the Unicode string codes, but AppleScript won't take styled text. I've never been able to get a Greek cut and paste from Acc into an AS string variable and retain the encoding. I usually just got garbage. It's a major shortcoming of AS strings as far as I can tell. I'd love it if they would fix it. Well, that or someone would prove me wrong and show me how; that would be good too. I'm also not anywhere I can try this out, though.

 

thx

D


Guys, it's nice to have conversation partners in these matters, with others that know a pence.

 

Daniel,

AS handles Unicode strings quite well these days. Unicode is not 'styled' text. AS does handle Unicode strings.

I can cut and paste and modify and pass on Unicode strings using AS with no problem. The limitations in this area on osx are the unicode support in the pcre.

 

I usually use sed, but the version of sed that comes with osx does not recognize the word-boundary indicators \b and \B.

 

Right now, as was pointed out, the script is designed for roman letters only... I used [a-zA-Z] instead of \w so it would leave two-digit verse references alone.

The problem with trying to change the regex statement to \p{Letters}, etc. is that the PCRE that comes with osx does not support them. And also, using [ ] to iterate mangles the text from stdout to AS.

And, \w does NOT match Unicode letter characters. That's a problem in the shell too. Not in AS. I don't immediately know of a solution, even in the shell.

 

If rdaren needs Greek, I could do a routine that doesn't use regex, instead loops each word. Not very robust for long selections though. But, it will work as AS handles a unicode 'character' quite well (combined accent or not).
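
The loop-instead-of-regex idea can be sketched outside AppleScript as well. What follows is a minimal Python sketch, not Joe's routine: the function name `mask_non_initial_letters` and the decision to drop accents that sit on masked letters are assumptions made for illustration. It uses Python's standard `unicodedata` module to decide what counts as a letter, so verse numbers are left alone just as with [a-zA-Z].

```python
import unicodedata

def mask_non_initial_letters(text: str, filler: str = "_") -> str:
    """Replace every letter that does not begin a word with `filler`."""
    out = []
    in_word = False   # True while inside a run of letters or digits
    masking = False   # True if the most recent letter was replaced
    # NFC folds letter + accent into one code point where Unicode has one.
    for ch in unicodedata.normalize("NFC", text):
        category = unicodedata.category(ch)
        if category.startswith("L"):            # any kind of letter
            masking = in_word
            out.append(filler if masking else ch)
            in_word = True
        elif category.startswith("M") and in_word:
            # Combining mark: keep it on a visible letter, drop it otherwise.
            if not masking:
                out.append(ch)
        else:
            in_word = category.startswith("N")  # digits count as word chars
            masking = False
            out.append(ch)
    return "".join(out)

print(mask_non_initial_letters("6 A voice says"))  # 6 A v____ s___
```

On the Greek sample, Ἀρχὴ τοῦ comes out as Ἀ___ τ__. Hebrew vowel points are combining marks, so under this sketch they would be dropped together with the masked consonants; whether that is the desired behavior is an open choice.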

 

rdaren,

There are many ways you could implement this script... one would be to create an Automator workflow, and use the 'run this applescript' step and paste in the script.

Edited by Joe Weaks

Hey Joe, I'm surprised to hear this, and yep, it's true Unicode is not styled text per se, but I have had a piece of code around for a while which pulls Greek vocab from the Acc analysis pane, and if I pull it into AS for editing (as you do above with set s to the clipboard) I get gibberish - it looks like bad translit. So I wonder then what I am doing wrong.

 

Oh perhaps I need to set my export options to unicode !! Never thought to try that. Hmmm.....

 

Thx

D

Edited by Daniel Semler

… perhaps I need to set my export options to unicode !! Never thought to try that …

Love it.

Happens to the best of us. Share your script for the greater good.

I wrote a UI script for quickly toggling an Accordance preference.


I have wanted to share it for ages but I have this longstanding issue, which you sir, may have just given me the answer to. My w/a is pretty hideous.

I use it for creating Flashcards which has helped me immeasurably with Greek so with this piece of info I may finally be able to get it onto the exchange in reasonable order. I'll try to find time this w/e.

Glad I stuck my oar in.

 

And many thanx

D


To illustrate the bug in the pcre in the default osx installation, if you change my perl command to:

perl -pe 's/\\Bα/" & c & "/g'

 

notice it's only substituting lower case alpha, and then copy this text to clipboard:

Mark 1:1 Ἀρχὴ τοῦ εὐαγγελίου Ἰησοῦ Χριστοῦ [υἱοῦ θεοῦ].

 

and then run the script and it works as expected, outputting:

Mark 1:1 Ἀρχὴ τοῦ εὐ_γγελίου Ἰησοῦ Χριστοῦ [υἱοῦ θεοῦ].

but if you simply put square brackets around the alpha, as in [α], then the stdout munges the text with the do shell script command:

Mark 1:1 ·ºàœÅœá·Ω¥ œÑ_ø·ø¶ _µ·Ωê___≥_≥_µ_ª·Ω∑_øœÖ ·º∏_∑œÉ_ø·ø¶ _ßœÅ_πœÉœÑ_ø·ø¶ [œÖ·º__ø·ø¶ _∏_µ_ø·ø¶].
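
One plausible reading of this output, offered as an assumption rather than a confirmed diagnosis: perl is handling the text as raw UTF-8 bytes, so the class [α] means "the byte 0xCE or the byte 0xB1" rather than "the letter alpha", and the substitution cuts multi-byte characters in half. The result is no longer valid UTF-8, and the garbage looks like those bytes displayed as Mac Roman. The Python sketch below only simulates that byte-level behavior (it does not call perl or AppleScript), using the word τοῦ from the example.

```python
import re
import unicodedata

# NFC keeps each accented Greek letter as a single precomposed code point.
word = unicodedata.normalize("NFC", "τοῦ")
raw = word.encode("utf-8")  # b'\xcf\x84\xce\xbf\xe1\xbf\xa6'

# Without brackets the pattern is the two-byte sequence for alpha, so only
# a whole alpha can match. This word contains none and stays unchanged.
whole_char = re.sub(rb"\B\xce\xb1", b"_", raw)

# With brackets the class matches the single byte 0xCE or 0xB1. The 0xCE
# that begins omicron is replaced, which splits that character in two.
split_char = re.sub(rb"\B[\xce\xb1]", b"_", raw)

# Assumption: the damaged bytes were then interpreted as Mac Roman text.
garbled = split_char.decode("mac_roman")
# garbled is now 'œÑ_ø·ø¶', the same string that replaces τοῦ above.
```

If that reading is right, the fix would lie in getting perl to decode its input as UTF-8 before matching, rather than in anything AppleScript does with the string.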

Edited by Joe Weaks

Hey Joe,

 

OK, so I was going to try setting the output to Unicode, but it was already set that way. I changed the font it used, and that was interesting but made no real difference to the problem. Basically I think it is a style question, not a Unicode one, and I think the issue is that analysis results contain a mix of font information. Anyhow, if I copy to the clipboard and set a var to its value, I lose that info. If I instead simply tell TextEdit to tell System Events to cmd-V paste, then it's in the right font and pasted nicely. Alas, this doesn't help much, as I need to edit the output a bit, which it seems I cannot do in AS without losing the font information. I can by pasting it into TextWrangler, which has nicer search-and-replace capabilities than TE. I'll play a little with your example above and see.

 

Thx

D


Daniel,

You are correct... if you read a string into an AS variable, it will lose style info (font type, size, etc.).

You can't manipulate the text in TextWrangler either, cause pasting it there will also lose the style info.

 

What is your final destination for collecting the styled text?

You can paste it into the previous version of Pages or into TextEdit and do some text manipulation, keeping style info.

MS Word is the best though. Its text is highly manipulable programmatically, and it maintains the style info. You can paste it in there and have at it using AS or VBA.


I'll briefly explain the flow so you get what I'm trying to do.

 

1. take an Acc search query like [count 1-50] <AND> [range eph] from a user

2. run that query and launch the basic analysis of it which gives the vocab list

3. edit that vocab list, reformatting it to tab-separated so that it's acceptable for upload to cram.com for flashcard use. This is where the c & p comes in, along with a bunch of regexes which work fine in TW. I don't need all the style preserved, but I need a font that can correctly render Greek, and TW is scriptable enough to be able to set that up properly.

4. Once all the editing is done the user can save the file and then push it to cram as a cut and paste into their import page.

 

Then you have a nice shiny new flashcard set to cram with.

 

So I can get what I need with TW, but it's not elegant and it's racy - you can have issues where TW hasn't booted yet and we try to paste and it doesn't work properly. I've added various checks but it's still not solid, so I have been very reluctant to release it to others. Also, not everyone has TW, as it's not standard. TE can do it if its default font setup is OK, but my tests this evening are proving unreliable; that's probably me being unfamiliar with scripting TE. But then I'm not sure it can do the subs I want as easily as TW either.

 

I don't use Word on the Mac, though I use a clone - LibreOffice. But that said, I'd like to have a single script execution - one-button flashcards - well, not quite, but ... And I'd like to not have to use non-standard software that someone might not have. I know I'm being picky, but hey, if it's worth coding :)

 

If you know a way to reliably tell TE what font to use I might be able to recast it that way but .... actually another thought, perhaps an OS callout to perl might be the way to go. I'm only removing/changing stuff in the ASCII range really, parens, tabs, + etc. It might work.
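
As a rough illustration of why an ASCII-only cleanup should leave the Greek alone, here is a small sketch. It is written in Python rather than as the perl callout mentioned above, and the function name `to_tab_separated`, the cleanup rules, and the sample line (including the count 330) are all invented for illustration; this is not real Accordance analysis output.

```python
import re

def to_tab_separated(line: str) -> str:
    """Strip ASCII parens and plus signs, then turn column gaps into tabs."""
    # Only ASCII punctuation is named here, so Greek letters pass through.
    line = re.sub(r"[()+]", "", line.strip())
    # A run of two or more spaces/tabs, or a lone tab, becomes one tab.
    # Single spaces inside a gloss ("word, speech") are kept.
    return re.sub(r"[ \t]{2,}|\t", "\t", line)

# Hypothetical input line: lemma, count in parentheses, gloss.
sample = "λόγος   (330)  word, speech"
cleaned = to_tab_separated(sample)
# cleaned has three tab-separated fields: λόγος, 330, "word, speech"
```

The same two substitutions could be expressed as perl or TextWrangler regexes; the point is only that patterns restricted to ASCII characters never touch the Greek code points.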

 

Any other thoughts ?

 

Thx

D


Daniel,

 

Firstly, you don't have to specify font style information to render Unicode Greek. Whatever the program, TW and TE included, it will choose a display font to render Unicode properly. Your approach to Unicode text and font style info is not quite solid. There is no font style info associated with the text once you paste it into TW, for instance. The fact that TW can set a display font has nothing to do with any style info encoded into the text. If your goal is a tab-delimited text file for use in a flashcard app, then you do not have any style info retention needs.

 

Secondly, if you google for "flashcord file converter", you will see that I have already made public a one-click solution for creating a tab-delimited flashcard database file based upon an Accordance search window. Dozens of users (that I know of) use the resource with success.

I'd suggest having a look, seeing how I did it, and pulling any ideas from there.


Thanx Joe,

 

I didn't realise you could save Analysis tabs to files - that's useful.

 

The main point in your response is interesting and I need to clarify something. As I understand it, Unicode is merely an encoding - number for character in its most primitive form, but more or less just that. I do not understand its intricacies yet but I'll get there if I need to. The style information in the selection, though, is different. In the analysis tab for a Greek text vocab search there are two languages represented. If I have my export settings set to export in Unicode and in the font Lucida Grande, then when I cmd-C and cmd-V into TE I get the text pasted in with that font. This is true even if the default font for the new document is different. Doesn't that indicate that style information is in fact present? In TW it appears a bit differently, and there it looks like it's not retaining the font info but using whatever it has in place. The difference is presumably because of how they each handle paste from the clipboard.

 

But you are right that I don't really care what the font is, so long as Greek is rendered properly where it exists and likewise the English. I guess that so long as I use a Unicode font that can display the relevant glyphs, I'll be able to read it. But that still leaves me with some confusion over the actual fault in the case of using an AS string to store the text and then paste it into anything. At that point I appear to get gibberish. I still don't actually know why that is - or I may not - I actually don't know at this point whether I know or not :) One thing that would help me is to know what exactly is considered style information. I thought it was font, font styling like bold, italic etc., size and so on. Is that correct?

 

I guess to really work out what I'm seeing I may have to dig into Unicode and such. Perhaps the reason I'm seeing gibberish in some pastes is simply that the font the thing is rendered in is not Unicode or is but does not have glyphs in the code range being pasted. I'll investigate more. If that's the case I may finally make progress.

 

I had a look at your file converter and now I recall that I had seen it before. It assumes as its starting point a file created by the user before running the converter on that file. I wanted to take it from the point of query entry if I could.

 

Many thanx for taking the time to help me out here.

 

Thx

D


Daniel,

Yes, when you copy text from Accordance, the style info is in fact present, including font name and size. You see that when you paste into TE.

But when you paste into TextWrangler or TextMate (or TextEdit if the format has been set to 'Make Plain Text'), you do not see the Lucida Grande font or size info, because those are pure text editors.

The difference is because TE is a mini-word processor that can save files as a word processing document that encodes style components. TW on the other hand is a pure text editor. Pasting text into TW will lose all style info, just like it does when you read a string into an AS variable... all style info is lost.

All I was saying about Unicode Greek, is that the Unicode code-points/character encoding is preserved in both styled text and pure text. You don't need style info to maintain Unicode characters.
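
A tiny sketch of that point, in Python rather than AppleScript and with variable names chosen only for illustration: a plain string carries nothing but code points, and those survive a round trip through UTF-8 bytes with no font or style information involved.

```python
# The word θεοῦ written as escapes so the exact code points are explicit.
greek = "\u03b8\u03b5\u03bf\u1fe6"

# A plain string is just a sequence of code points; no font is attached.
code_points = [f"U+{ord(ch):04X}" for ch in greek]
# code_points == ['U+03B8', 'U+03B5', 'U+03BF', 'U+1FE6']

# Saving as plain UTF-8 text and reading it back restores the same string.
utf8_bytes = greek.encode("utf-8")
restored = utf8_bytes.decode("utf-8")
```

Which font ends up drawing those code points is decided by the program displaying them, which fits the observation that TE and TW show the same pasted text differently.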

I do not know why you're seeing gibberish. As I said, if you shared your code, I could see where the problem is.

 

As for the Flashcord File Converter, it could be modified to work on an active Accordance window with a couple lines of code, no problem. The file droplet approach made it more fool-proof for users.


Hi Joe,

 

I'll PM you a version of the broken code - ie. the one that goes via an AS var.

 

Thx

D


Ah, I am now reminded of the error you are getting.

 

When Accordance 'added' the ability and option to copy Greek/Hebrew as Unicode (instead of the legacy Helena/Yehudit font encodings), they did not add that ability to Amplify windows. It still always uses the legacy fonts. I had forgotten that.


And many thanx - I never would have found that on my own.

D

