
Odd search results in a User Tool


Lorinda H. M. Hoover


I have a User Tool of John Wesley's Sermons on Several Occasions that I presume I got from the Exchange, although I can't find it there now.

 

I searched for "Preven*" (without the quotes). In addition to legitimate hits, I got a batch of cases where other words such as "the", "all", "thy", etc. show up as hits. But not all such words are marked as hits.

 

Is this just an oddity of a (presumably old) User Tool, or some obscure bug in Accordance itself?

 

This is on my Windows Vista machine. I won't have access to my Mac until tonight. I'm getting an I/O Error #-43 on my iPad for that tool, so I can't check there.

 

Lorinda


I tried this on one of my user tools and got some false hits as well.

 

In some cases the word 'prevent' was not highlighted but the word next to it was. In other cases a wrong word was highlighted in a paragraph that did not contain preven* at all.

 

I also tried an exact search on 'prevent', and that also produced the odd false hit.

Edited by Steve King

I had missed that the false hits are within several words of an actual, but not highlighted, hit. Interesting. I can't reproduce this in the other User Tools I've tried.

 

I forgot to specify that I am searching the contents.

 

Also, searches using AND don't seem to work in this tool (e.g. "preven* <AND> grace").


I can reproduce it in some tools I have. Usually the larger ones. There is one where it highlights whole paragraphs and almost every hit is wrong. But that is not the norm.

 

I wonder if there are special characters within the tool which are not being displayed that are causing problems.
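
One way to test the hidden-character theory is to export the tool's text and scan it for characters that normally do not render. The following is a minimal sketch, not anything Accordance provides: the function name `find_invisible_chars` and the sample string are made up for illustration, and it assumes the exported text has already been decoded to Unicode.

```python
import unicodedata


def find_invisible_chars(text):
    """Return (offset, codepoint, name) for characters that normally do not render."""
    hits = []
    for offset, ch in enumerate(text):
        category = unicodedata.category(ch)
        # Cc = control characters, Cf = format characters
        # (soft hyphen, zero-width space, byte-order mark, ...).
        # Ordinary newlines and tabs are expected, so skip them.
        if category in ("Cc", "Cf") and ch not in "\n\r\t":
            name = unicodedata.name(ch, "<unnamed control>")
            hits.append((offset, f"U+{ord(ch):04X}", name))
    return hits


# Hypothetical sample: a soft hyphen and a zero-width space hidden inside a word.
sample = "By pre\u00advent\u200bing grace"
for hit in find_invisible_chars(sample):
    print(hit)
```

If a scan like this reports hits clustered around the paragraphs with wrong highlighting, that would support the theory. An empty result would not rule out other causes, such as a stale search index.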


I have managed to fix it. To do so, I edited each section where the wrong word was highlighted: I would click near the wrong word, hit command-U (working on my Mac) to open the edit window, make a simple change (it doesn't matter what), and save. Immediately that hit would be correctly highlighted. Then I went on to the next wrong one.

 

In the process, I discovered that the file is quite "dirty": lots of HTML code left in (like <i>xxxx</i>, although some italics came through fine), title markings in the wrong places, odd code around most Scripture links (probably the link syntax used by whatever site the text was taken from), etc. It would really be better to start over in a text editor and regex it repeatedly. Might be an interesting project if I can find the time.
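
As a rough illustration of the "regex it repeatedly" approach, here is a minimal sketch that first counts which leftover tags appear in a plain-text export and then strips them. The pattern and the function names are illustrative assumptions, not a tested recipe for this particular file, and the odd code around the Scripture links would need its own pattern once its syntax is known.

```python
import re
from collections import Counter

# Matches simple opening or closing tags such as <i>, </i>, or <BR>.
# Caveat: it also matches anything else in angle brackets that starts
# with a letter, so check the counts before stripping.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^<>]*>")


def count_leftover_tags(text):
    """Count leftover tags by (lowercased) name, to see what needs cleaning."""
    return Counter(m.group(1).lower() for m in TAG_RE.finditer(text))


def strip_leftover_tags(text):
    """Remove the tags themselves but keep the text between them."""
    return TAG_RE.sub("", text)


# Hypothetical sample text.
sample = "He calls it <i>preventing</i> grace.<BR>See the <i>Sermons</i>."
print(count_leftover_tags(sample))
print(strip_leftover_tags(sample))
```

Running the count first shows which tags are present, which helps decide whether a tag such as <i> should be removed outright or converted to real italics before import.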


Hi Lorinda,

 

I have a prototype tool I'm working on to clean up HTML by removing or substituting tags, based on a configuration file you give it along with the HTML file. It's designed to make cleanup of source files easy so that you don't end up with bad imports. It saves you from having to do all the regexing yourself, or at least that's the theory. If you're interested, I could try the file out in the tool and see whether it really performs the function it's designed for.
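
For readers curious what configuration-driven tag cleanup can look like, here is a minimal sketch using Python's standard `html.parser` module. It is not the prototype described above, whose design is not shown in this thread. The `RULES` mapping, the `TagCleaner` class, and `clean_html` are hypothetical names, and the rule format (keep, drop, or rename) is an assumption made for illustration.

```python
import html
from html.parser import HTMLParser

# Hypothetical rules: "keep" retains the tag, "drop" removes the tag and
# everything inside it, any other value renames the tag. Tags not listed
# are unwrapped: the tag is removed but its text is kept.
RULES = {"i": "keep", "b": "keep", "em": "i", "strong": "b",
         "script": "drop", "style": "drop"}


class TagCleaner(HTMLParser):
    def __init__(self, rules):
        super().__init__(convert_charrefs=True)
        self.rules = rules
        self.out = []
        self.drop_depth = 0  # > 0 while inside a dropped element

    def _emit(self, tag, closing):
        action = self.rules.get(tag)
        if action is None:
            return  # unlisted tag: unwrap
        name = tag if action == "keep" else action
        # Attributes are deliberately discarded in this sketch.
        self.out.append(f"</{name}>" if closing else f"<{name}>")

    def handle_starttag(self, tag, attrs):
        if self.rules.get(tag) == "drop":
            self.drop_depth += 1
        elif self.drop_depth == 0:
            self._emit(tag, closing=False)

    def handle_endtag(self, tag):
        if self.rules.get(tag) == "drop":
            self.drop_depth = max(0, self.drop_depth - 1)
        elif self.drop_depth == 0:
            self._emit(tag, closing=True)

    def handle_data(self, data):
        if self.drop_depth == 0:
            # Re-escape text so characters like & stay valid in the output.
            self.out.append(html.escape(data, quote=False))


def clean_html(source, rules=RULES):
    cleaner = TagCleaner(rules)
    cleaner.feed(source)
    cleaner.close()
    return "".join(cleaner.out)


print(clean_html('<p class="x">By <em>preventing</em> grace'
                 '<script>var a=1;</script> &amp; more</p>'))
```

Keeping the rules in a plain mapping means they could just as easily be loaded from a JSON or INI file, so the same code can serve differently messy sources without edits.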

 

Thx

D


Daniel, that's very interesting. I don't have the original HTML file (assuming that's how this tool got its start). CCEL provides a plain text version for download, but (to my disappointment) not an RTF, HTML, or XML version that would preserve bold and italic formatting, not to mention headings, etc.

 

I can export the current tool as rtf, apparently, but I don't know if that would work with your tool.

 

Lorinda


Hi Daniel

 

That sounds interesting. I think I have some HTML files of my own that you could test with. I am on holiday in France at the moment, so I do not have access to them, but I could send you an example when I get back.

 

Steve


@Lorinda: CCEL... OK. I assumed you'd created the tool. If not, then I don't know. RTF certainly won't help, as my tool isn't designed to work with it. You can only get HTML from them by saving the website pages, but strictly speaking they aren't mad keen on that. If it's a text file, I'd have to look at the case in point and see if there is dodgy stuff in the source. Their original-language material in the txt files, for example, is not readable, at least not in the case I was looking at with Abram.

 

@Steve: That would be interesting. I'm on vacation too, so I wouldn't be able to check it out for a week or so anyhow. Whenever you are ready. Thx.

 

Thx

D


I know it's a real kludge, but you can read the RTF with, say, Word, then save that as messy HTML. Then use BBEdit/TW to clean up the HTML, and you should be close to a usable HTML file for import.


Daniel,

 

I'm honestly not certain of the original provenance of the User Tool. I've had it for several years. Maybe I got it from the Exchange; maybe I downloaded an HTML file and imported it myself. It looks as though CCEL used to provide HTML versions of many of its files but no longer does. (This is based on some Google searches and the now-broken links those searches led me to.)

 

Ken,

 

I've considered doing exactly what you suggested, although using NeoOffice rather than Word for the HTML conversion. I don't have time to do so at the moment, but I think it would be better than downloading a plain text file from CCEL and trying to reformat everything. And it would give me a chance to learn more about regex.

