syntactical databases

November 24, 2009

http://michaelsheiser.com/TheNakedBible/2009/11/the-status-quo-has-lost-its-status/

This is a serious critique of the charge of subjectivity with regard to syntactical databases. I'm not quite sure what to think of it. I was wondering if you guys might want to weigh in.

There is obviously a level of subjectivity in syntax. Neither during the "shootout" nor in any of our demonstrations did any of our staff appeal to the subjectivity claim as a reason why we do not (currently) offer syntax databases of the GNT/MT. While Heiser believes that this is the cutting edge of Biblical language research, I can say that only one person asked about anything syntax related during the many demos I gave at SBL (and this one person was something of an exception, since she is particularly keen on the subject). This is not to say that it is unimportant, but that it represents a largely uncharted area in computer-assisted Biblical research (at least in relation to retail software). Furthermore, if we felt that it was subjective and unimportant, we would not be developing these databases!

I could say more about Heiser's impressions on the shootout, and my own reflections, but that would stray beyond the topic of this thread. Stay tuned to the Accordance blog for a post on it from David.

Danny, sorry you couldn't make it out. I seriously doubt we will post anything online regarding this feature

November 24, 2009

I have read the post by Michael Heiser, too, and I have mixed feelings.

There are some points that he makes that I agree with. And there is something else that I am not impressed with.

I agree with his point that subjective choices are implied not only in syntactical databases, but also in morphological databases. And I also agree that an important limitation of morphological databases is that they are not aware of clause boundaries.

What I don't agree with is his representation of users of other software packages:

Despite the critical importance of syntax, since I

November 24, 2009

I wish to add something to the discussion about morphological vs. syntactical databases.

The blogger quoted above maintains that

The charge is that syntactical tagging is “subjective” since it gets into interpretive decisions.

He then proceeds to explain why:

First, identifying and labeling syntactical constructions are often not subjective exercises. There is nothing subjective about identifying (and tagging) a wayyiqtol followed by an expressed subject with a following accusative marker with noun. There are dozens of other such features marked in a syntax database that are not subjective. The construction is what it is, and is often crystal clear. So, in one respect, a syntax database does what a morphological database does when it identifies things. Morph databases identify words; syntax databases identify clusters of words.

In my previous post, I didn't deal with this first reason. I tend to agree with it, but I need to add that the same decisions about crystal clear constructions are often already included in morphological databases.
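To show what identifying such a cluster amounts to in practice, here is a minimal sketch in Python. It is only an illustration, not how any of the products discussed here works: the label names (wayyiqtol, subject, acc_marker, noun) and the function find_pattern are invented for this example, and the words are transliterated from the opening of Genesis 1:7. Note that labelling a noun as "subject" is already a syntactical decision.

```python
# Hypothetical labels for the construction described in the quotation:
# a wayyiqtol, an expressed subject, the accusative marker, and a noun.
PATTERN = ["wayyiqtol", "subject", "acc_marker", "noun"]


def find_pattern(labels, pattern=PATTERN):
    """Return every start index where the label sequence matches the pattern."""
    n = len(pattern)
    return [i for i in range(len(labels) - n + 1) if labels[i:i + n] == pattern]


# Invented, hand-labelled clause (transliterated).
clause = [
    ("wayya'as", "wayyiqtol"),
    ("elohim", "subject"),
    ("et", "acc_marker"),
    ("haraqia", "noun"),
]

labels = [label for _, label in clause]
hits = find_pattern(labels)  # the cluster starts at index 0
```

Once the labels are there, finding the construction is a mechanical lookup. The interpretive work, where there is any, lies in assigning the labels.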

Bear with me, all you Hebrew scholars: I will provide some examples from Greek texts. Greek morphology has many endings that are homographs but require different tags. An -A ending may be Accusative Singular Masculine (ASM), Accusative Plural Neuter (APN), or Nominative Plural Neuter (NPN). When morphological databases choose among the three, they make a decision that is based not on the word in itself, but on careful observation of a cluster of words. The clause syntax needs to be taken into account.

So, in this sense, there already is much syntax within morphological databases. Even if in some cases decisions are subjective, in most cases the reasons are crystal clear. When the syntactical reasons are crystal clear, a failure to distinguish between ASM, APN and NPN is a mistake. It is not a subjective choice, just a plain mistake.

I have tagged Greek and Latin texts for Accordance, and I know that I can make such mistakes. This is why I carefully check my work, to remove as many mistakes as possible. Some still remain, and I am ready to remove them as soon as some user finds them. So I am in no position to be intolerant of such mistakes. Whenever they occur, it is because the person who tagged the text did not think enough about the syntactical context in which a word occurs.

Now, what I find hard to swallow is the notion that Logos is far ahead of the competition because of its awareness of syntax. It is, I grant, inasmuch as a syntactical module is not available in Accordance right now.

It is not, if we look back at tagged texts that have already been published as modules for both Accordance and Logos.

Consider the works of Philo in Greek as published by Logos. They offer a sample on the web.

http://www.logos.com/images/Screenshots/philogreek_fullscreen.png

Conveniently enough, it represents the start of the first work of Philo, On the Creation of the World. Until they take it down, I can offer a few comments. Let's look at -A endings.

τὰ νομισθέντα

APN ASM

ASM is clearly wrong. It was tagged without taking syntax into account.

δίκαια

DSF

Not so. It's APN, for both syntactical and morphological reasons.
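The reasoning behind the τὰ νομισθέντα correction can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the way any tagging tool actually works: the mini-lexicon CANDIDATES is built by hand for this example, and narrow_by_agreement is a hypothetical helper. It lists the tags each form could carry in isolation and keeps only those that the article and the participle can share.

```python
# Hand-built candidate tags (case, number, gender) for each form in isolation.
# This mini-lexicon is an assumption made for illustration only.
CANDIDATES = {
    "τὰ": {"APN", "NPN"},
    "νομισθέντα": {"ASM", "APN", "NPN"},
}


def narrow_by_agreement(article, head, lexicon=CANDIDATES):
    """Keep only the tags that the article and its head word can share."""
    return lexicon[article] & lexicon[head]


remaining = narrow_by_agreement("τὰ", "νομισθέντα")
# ASM is ruled out by agreement with the article; APN and NPN remain.
```

Agreement alone removes ASM. Choosing between APN and NPN still requires looking at the role of the phrase in its clause, which is exactly the point about syntax.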

If required, I could move to the second paragraph and examine a few -ON endings, like ἄσκεπτον καὶ ἀταλαίπωρον, but I don't want to take the fun away from those who are expert in Greek, and I don't want to bother those who are more interested in Hebrew.

Why is Logos missing so badly on syntax? It might be easy to blame others: the database was tagged by the Philo Concordance Project. Now, the goal of that project was to assign to each word the tag that is statistically most frequent for that word form. If in most cases -A is APN, they automatically set all instances to APN. Even so, the database is very valuable. It has gone as far as one can go if tagging is automated, that is to say, if tagging decisions ignore the syntax.
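To make the idea of frequency-based tagging concrete, here is a minimal sketch in Python. It is not the Philo Concordance Project's actual procedure or code, which I have not seen; the function names and the tiny hand-tagged sample are invented for illustration. Each word form simply receives whichever tag it carries most often in the sample, with no regard to context.

```python
from collections import Counter, defaultdict


def train_most_frequent(tagged_tokens):
    """Map each word form to the tag it carries most often in the sample."""
    counts = defaultdict(Counter)
    for form, tag in tagged_tokens:
        counts[form][tag] += 1
    return {form: c.most_common(1)[0][0] for form, c in counts.items()}


def tag_by_frequency(forms, model, default="?"):
    """Tag every form with its most frequent tag, ignoring context entirely."""
    return [(form, model.get(form, default)) for form in forms]


# Invented hand-tagged sample: τὰ is accusative twice and nominative once.
sample = [("τὰ", "APN"), ("τὰ", "APN"), ("τὰ", "NPN"), ("δίκαια", "APN")]

model = train_most_frequent(sample)
tagged = tag_by_frequency(["τὰ", "δίκαια", "τὰ"], model)
# Every τὰ comes out as APN, so any nominative instance is silently mis-tagged.
```

The output is right most of the time, which is why such a database is valuable, and wrong in precisely the cases where only the clause can decide.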

Now, the Philo Concordance Project people want the users to be aware of this, and they say so on their webpage:

In this concordance the words were organised mechanically, on the basis of the Greek alphabetic order of the textforms ("tokens"). Two copies were printed in 1974, typed with Greek letters.

Later, some words were completely lemmatised and tagged in context and all words were automatically organised based upon this initial lemmatisation and tagging by Roald Skarsten.

When distributing this database, it is important for the user to be aware of what it includes and what it doesn't.

Later, the database was distributed by Accordance, BibleWorks and Logos. Actually, they name Logos first and add a picture. Then they mention Accordance (Mac) and BibleWorks. And they add:

These publishers also intend to complete the morphological analysis of the texts.

Doing that is hard work. I have personally helped to refine the database for Accordance. Is that work completely finished? No. Philo has more than 438,000 words, and, once TA has received the overall tag of APN, it is hard to catch all NPN and correct the tag. But we have made many thousands of corrections.

Logos also informs its users:

Note: The morphological analysis contained in this edition is under revision. Forms that are ambiguous; particularly conjunctions, particles, pronouns and adverbs, are in the process of being revised. A rebuilt resource with the updated form of this information will be made available at no cost via download upon completion of the extended analysis.

Apart from conjunctions, particles, pronouns and adverbs, they should have added articles, nouns and verbs. I guess that interjections are all right.

So, they are honest. As I am not a Logos user, I don't know how many free upgrades they have already distributed. Preparing an upgrade takes a lot of time. It is very hard work, as I said, and work that is neither rewarding nor easy to market. I find it as important to fix existing texts as to move on to new projects. I am sure that scholars appreciate both careful revisions and new ideas.

I am aware of the efforts of Logos with syntax and I appreciate them. However, I find that there is no need to try to give the impression that only Logos is aware of the complexities of syntax, for that picture is not accurate.

[Edited: after checking, I corrected the number of words in Philo. I had first entered the number from memory.]

November 24, 2009

Thank you, Marco, for your excellent explanations.

Here is my take, on a more general and popular level, to a couple of the issues Michael Heiser touched upon in his post:

http://www.bsreview.org/blog/2009/11/an-obsolete-competition.html

Best,

November 24, 2009

The Accordance handout for the Shootout session is now posted for download here.

Rick Bennett

Marco V. Fabbri

Marco V. Fabbri

Ruben Gomez

Helen Brown
