
Possible Search Bug?


Randy Steffens Jr


I am getting a confusing result when using this search argument: * <WITHIN 1 Words> [FIELD BEGIN]

 

I am trying to use this search to count all the verses in the Bible. When I run it in the KJV, I see there are 31,218 verses found (the correct number of verses in the KJV). However there are 62,436 hits listed. That's exactly double the number of verses found. It seems to me that the hits should be the same as the number of verses. Could this be a bug? Or am I not understanding the search correctly?

 

Thanks!

Randy Steffens


Is it by any chance counting the last word of the preceding verse as within 1 of the beginning of the next verse? 62436 is twice 31218.

 

Thx

D


Thanks! Those preceding words are not highlighted as hits in my search results... I'm not sure what to think.


OK, I've run your test now and I see your point. The Analysis tab reports only the 31218 and does not mention the doubled figure. I tried within 2 and expected it to produce 124872, but it produced 93421, which isn't 93654 either. So I'm not sure what it's doing, but the discrepancy seems to be only in the hit total, not in the Analysis page.

 

OK worked that puzzle out. There are 31218 hits within 1 of the beginning and an additional 30985 at a distance of 2 words from the beginning. That plus the 31218 (number of verses) is 93421. OK so it appears to be adding the number of verses to the total hit count - no idea why that would be.
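
The arithmetic above can be checked with a short Python sketch. The figures are the ones reported in this thread; the variable names are only illustrative, and the idea that the total is the sum of these parts is the hypothesis being tested here, not documented Accordance behaviour.

```python
# Figures reported in this thread for the KJV.
verses = 31218            # verses found
hits_distance_1 = 31218   # words at a distance of 1 from the verse start
hits_distance_2 = 30985   # additional words at a distance of 2

# Assumption: the search adds one extra hit per verse on top of the word hits.
extra_hits = verses

within_1_total = hits_distance_1 + extra_hits
within_2_total = hits_distance_1 + hits_distance_2 + extra_hits

print(within_1_total)  # 62436
print(within_2_total)  # 93421
```

Both totals match the hit counts that the two searches actually reported.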

 

BTW, * and selecting verses will get you the count also.

 

Thx
D

Edited by Daniel Semler

Yeah, it's strange. I think we've uncovered some sort of bug!

 

Randy


When you include items with an <AND> command, it counts each half as a hit. So, the double you are getting is for the * and for the [FIELD Begin].

 

But, if you are trying to count the number of verses, why not just run any search (or just display all verses) and look at the "Verse x of x" text?


Thanks Joel,

 

Yeah, I realized before posting here that the easiest way would be to look at the "verse x of x" text area, but I still wanted to understand why this search behaved as it did.

 

It seems to me that when <AND> is used with the [FIELD ?] command in this way, Accordance should be programmed to handle the hit results differently than it usually does. Otherwise, the result is counterintuitive. In this case, the hit count does not appear to agree with either the hit list or the analysis pane, and the result is very confusing.

 

Maybe something to consider for your next update?

 

Thanks!

Randy


Hey Joel, thanx for the explanation

 

I ran a quick dummy query that produces very few results to examine this idea. I ran God <within 1 words> love as a flex search. It returned the info attached below: 9 hits satisfying God being present within 1 word of love. Now bear in mind that I come from a SQL database background. I would consider this 9 hits that satisfied the complete query. The user is, after all, returned only results that satisfy the complete query. If I were to search for God or for love alone I would find 4714 and 454 flex hits respectively. If I search for God <AND> love with scope set to verse I get 219 flex hits, but in fact that is really a count of each part of the query: God yielding 115 word occurrences where love is also present in the same verse, and love yielding 104 word occurrences where God is present in the same verse. The actual number of verses containing both is 89. Of course there is the complication that in some verses one of the words occurs more than once.
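
To make the difference between the two counts concrete, here is a minimal Python sketch. It is a toy model only: the four "verses" are made up, the matching is exact and case-insensitive (not a flex search), and `and_counts` is a hypothetical helper, not anything from Accordance.

```python
import re

def and_counts(verses, a, b):
    """Return (matching_verses, hit_words) for a verse-scoped 'a AND b' search."""
    matching = 0
    hit_words = 0
    for text in verses:
        words = re.findall(r"[a-z]+", text.lower())
        n_a, n_b = words.count(a), words.count(b)
        if n_a and n_b:                # the verse satisfies the whole query
            matching += 1
            hit_words += n_a + n_b     # every occurrence of either word is a hit
    return matching, hit_words

# Made-up mini corpus, for illustration only.
toy = [
    "God is love",
    "the love of God and the peace of God",
    "love one another",
    "God spoke",
]

print(and_counts(toy, "god", "love"))  # (2, 5)
```

Two verses satisfy the query, but five words are hits, which is the same kind of gap as 89 verses against 219 hits.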

 

I've noticed this issue before, in that <AND> functions as a logical AND would in SQL but it is not reported in that way entirely. In cases like these it's easy to see why in the analysis.

 

In the case of [FIELD B/E] it is not easy, because the contribution of [FIELD B/E] is not presented in the analysis.

 

I realise the Accordance query language is not SQL and that its hit count behaviour is well ingrained in the minds of most Acc users. But I tend to agree with Randy that reporting the total hits as the sum of all hit pieces is not intuitive - perhaps it's just us :) I would regard that information as drill-down detail. But if this is the chosen way to report it, then I would advocate augmenting the analysis details so that one can see all the contributing elements, i.e. the included FIELD counts. It's all the more strange because one cannot run [FIELD B] as a query on its own and get the results that the search is adding to the totals.

 

Thx

D


Hi Daniel,

 

If you go down to the end of the analysis you do get the [FIELD B/E] mentioned with the number of hits against it. I noticed it last night but by then Joel had already replied.

 

I must admit that intuitively I do not expect there to be hits against the Field command but it is consistent when you compare it with trying to find something like "love <WITHIN 10 WORDS> patient".


I can't find where in the Analysis pane the specific hits for [FIELD ?] are located.

 

Blessings,

Randy



Hi Joel, I just noticed that you mentioned the <AND> command in your post above. Actually, the search I had questions about was: * <WITHIN 1 Words> [FIELD BEGIN] which didn't include the <AND> command.

 

Does the <WITHIN ? Words> command function the same as the <AND> command in this respect? Either way, it seems that when either of these commands is linked to the [FIELD ?] command, the hit number should be reported differently, so that it agrees with the red-highlighted words in the hit list. Otherwise the hit number is confusing and doesn't make sense.

 

Blessings!

Randy


Good points Daniel. I would agree with this. Perhaps there needs to be a re-think about how hits numbers of more complex searches are reported in Accordance.

 

Blessings,

Randy

Edited by Randy Steffens Jr

I believe you all are confusing two different issues here a little bit.

 

Firstly, there are very specific reasons why the hits are counted as they are. We decided a long time ago (and would still agree) that the # of hits should represent the number of words found, not the number of 'instances' found. Consider some of the differences:

 

1) The total number of 'units' found, i.e. the number of verses is already expressed in the "Verse x of x" text. So, it would be redundant to have the number of hits represent how many different units were found.

2) The number of hit 'matches' is a bit ambiguous. Consider a verse that contains God twice and love once, is this counted as once or twice? Or what about a verse that contains God twice and you twice? In a sense, you could argue this verse has anywhere from 1 to 4 matches, depending on how you wanted to count it. This is ambiguity, which isn't good for statistical research.

3) The number of hit words. This is what we are doing, and provides a good, exact number. The number can be broken down very clearly and accurately, but provides us different information from the number of verses.
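
The three ways of counting in the list above can be compared in a short Python sketch. This is only an illustration: `count_ways` and its labels are hypothetical, and the two "match" conventions shown are just two of the possible readings, not anything Accordance implements.

```python
def count_ways(n_a, n_b):
    """Candidate counts for ONE verse holding n_a copies of word A and n_b of word B."""
    found = n_a > 0 and n_b > 0
    return {
        "verses": 1 if found else 0,                # option 1: units found
        "disjoint_pairs": min(n_a, n_b),            # option 2, one reading of "matches"
        "all_pairings": n_a * n_b,                  # option 2, another reading
        "hit_words": (n_a + n_b) if found else 0,   # option 3: found words
    }

# A verse with God twice and love once.
print(count_ways(2, 1))
# {'verses': 1, 'disjoint_pairs': 1, 'all_pairings': 2, 'hit_words': 3}

# A verse with God twice and you twice.
print(count_ways(2, 2))
# {'verses': 1, 'disjoint_pairs': 2, 'all_pairings': 4, 'hit_words': 4}
```

The "match" count depends on which convention is picked, while the verse count and the hit-word count each have a single answer.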

 

 

Now, I believe it is a separate issue why the [FIELD] command is counted as a hit word, since as you all have pointed out, it doesn't show up in the search text. This is, frankly, more of a legacy decision, since [FIELD] represents a fictitious word at the beginning or end of the verse. So, it naturally was counted :) We've had this in place for at least 12 years, and nobody seems to have been bothered by it being counted as a hit until now. We'll look into excluding [FIELD] from the results, as we do agree it is a bit strange to mark it as a hit when it doesn't exist, and it isn't even counted in the Analysis.
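
The "fictitious word" idea can be modelled in a few lines of Python. This is a toy sketch under stated assumptions, not Accordance's actual code: the verse start is treated as an invisible word at position 0, and `within_of_field_begin` and its `count_field` switch are hypothetical names used only to show the effect of including or excluding that word.

```python
def within_of_field_begin(verse_words, distance, count_field=True):
    """Hits in one verse for '* <WITHIN distance Words> [FIELD BEGIN]' (toy model).

    The verse start is modelled as a fictitious word at position 0, so the
    real words sit at positions 1, 2, 3, ...
    """
    word_hits = min(distance, len(verse_words))           # real words close enough
    field_hits = 1 if (count_field and word_hits) else 0  # the fictitious word
    return word_hits + field_hits

# Made-up two-verse corpus, for illustration only.
toy_verses = [["In", "the", "beginning"], ["Jesus", "wept"]]

counted = sum(within_of_field_begin(v, 1) for v in toy_verses)
excluded = sum(within_of_field_begin(v, 1, count_field=False) for v in toy_verses)
print(counted, excluded)  # 4 2
```

With the fictitious word counted, the total is twice the number of verses, as in the original report; with it excluded, the total equals the number of verses.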

 

One final note: don't forget about your friendly simple construct! You can do much more precise searching there (if necessary), including the Place command, which lets you specify the location of a word by its number in the verse rather than by its proximity to a boundary. Note that a search like this doesn't add the 'spurious' hits: [screenshot: Screen shot 2013-07-08 at 7.24.34 AM.png]


 

Hi Randy,

if you scroll to the end of the analysis pane you should see this...

 

post-29509-0-65139000-1373286542_thumb.jpg


Hi Ken,

 

I just tried this with the two following constructions and in neither case could I see [FIELD BEGIN] in the Analysis tab.

 

* <WITHIN 1 words> <AND> [field b]

* <WITHIN 1 words> [field b]

 

Is there something I am not setting in the display customization to see this?

Just to clarify, I'm using 10.1.7 on OS X, and I'm selecting Analysis under the Word Count Totals subheading.

 

Hi Joel, I considered your point 1, that the counts would be redundant, and point 2, as I mentioned in my note. I did wonder whether "hits" ought to be renamed to "hit words" or some such thing, but decided not to mention it in my post. Changing any of this after, as you say, 12 years in the field is apt to cause more trouble than it solves. That is why I went after the idea of getting FIELD B/E into the result breakdown. Now Ken shows that's possible, which I did not know and cannot yet get to work correctly. Once that's sorted out I can reconcile the counts with the analysis and see what's going on. At that point I'd be done, because the results would be intelligible in themselves rather than posing a puzzle.

 

Thx

D


Hi Daniel,

 

I'm not having a problem with this. Try searching in the ESV for "love <WITHIN 1 Words> [FIELD b]"; you should get the following:

 

post-30494-0-60089100-1373293418_thumb.png

 

 

The following are the settings I have using cmd-t from the analysis pane

 

 

post-30494-0-74787200-1373293756_thumb.png

 

Note that you do lose the "Field Begin" title if you change the cmd-t settings after running the search - is this a bug?

Edited by Steve King

Ah ha !! Thank you. The analysis tab entry for [FIELD B] appears when I do an exact search. It does not appear for Flex.

It would be nice for this to be consistent for both search types unless there is some reason that cannot be so.

 

Thx

D


The line exists, but it just says "(n total words)" rather than "[FIELD BEGIN] (n total words)". This seems to be a bug to me, because the same thing happens if you change the analysis pane preferences: it drops the title. See the screen shot for the flex search.

 

post-30494-0-15650900-1373294585_thumb.png

Edited by Steve King

Yeah, I noticed that line but didn't know what it's from. Your theory makes sense, though. It certainly seems like a bug.

 

Thx

D


Now this I do confirm as a bug, and we'll get that fixed asap!


Many thanx Joel.

D


This is good! But I would still like to see [FIELD] data not counted as hits. At least not where hits are normally displayed. It still doesn't show up in search text and it seems strange to include it as a hit.

 

Thanks Joel!

 

Blessings!

Randy


I see that the [FIELD] data are now reported in the Analysis window! Thanks!!! I like this! You guys are amazing.

 

But I still think it strange to see [FIELD] data reported as part of the standard hit number, since nothing is highlighted in the search results that corresponds to this. Such a hit result doesn't seem intuitive. It would be better if this information were available in the Analysis window upon demand, but not as part of the main hit number.

 

Randy


Hi Randy,

 

I just tried this with a flex search on 10.2 and it didn't work for me. My analysis for my "love <WITHIN 1 words> <AND> [field b]" search used in earlier posts is like this :

 

Total number of verses = 5
(total number of verses displayed = 5)

love [Flex] (5 total words)

Love = 4
G0025 agapao ἀγαπάω = 1
G0026 agape ἀγάπη = 1
H0157 ’ahab, ’aheb אָהַב, אָהֵב = 2
Lover = 1
H0157 ’ahab, ’aheb אָהַב, אָהֵב = 1


[Flex] (5 total words)

Did you do an exact search by any chance?

 

Thx

D


No, I did a flex search. The results, it appears, depend on whether you include the <AND> in the search argument. Without the <AND>, like this:

 

 

love <WITHIN 1 words> [FIELD BEGIN]

 

You should get this in the Analysis window:

---------------------------------------------------

 

Total number of verses = 5

(total number of verses displayed = 5)

 

love [Flex] (5 total words)

 

Love = 4

G0025 agapao ἀγαπάω = 1

G0026 agape ἀγάπη = 1

H0157 ’ahab, ’aheb אָהַב, אָהֵב = 2

Lover = 1

H0157 ’ahab, ’aheb אָהַב, אָהֵב = 1

 

 

[FIELD BEGIN] [Flex] (5 total words)

----------------------------------------

 

With the <AND> in there, I get the results you specified. Looks like a bug.

 

Randy

 

 

Edited by Randy Steffens Jr