I wish to add something to the discussion about morphological vs. syntactical databases.
The blogger quoted above maintains that
The charge is that syntactical tagging is “subjective” since it gets into interpretive decisions.
He then proceeds to explain why:
First, identifying and labeling syntactical constructions are often not subjective exercises. There is nothing subjective about identifying (and tagging) a wayyiqtol followed by an expressed subject with a following accusative marker with noun. There are dozens of other such features marked in a syntax database that are not subjective. The construction is what it is, and is often crystal clear. So, in one respect, a syntax database does what a morphological database does when it identifies things. Morph databases identify words; syntax databases identify clusters of words.
In my previous post, I didn't deal with this first reason. I tend to agree with it, but I need to add that the same decisions about crystal clear constructions are often already included in morphological databases.
Bear with me, all you Hebrew scholars: I will provide some examples out of Greek texts. Greek morphology has many endings that are homographs but require different tags. An -A ending may be Accusative Singular Masculine (ASM), Accusative Plural Neuter (APN), or Nominative Plural Neuter (NPN). When morphological databases choose among the three, they make a decision that is not based on the word in itself, but rather on careful observation of a cluster of words. The clause syntax needs to be taken into account.
So, in this sense, there already is much syntax within morphological databases. Even if in some cases decisions are subjective, in most cases the reasons are crystal clear. When the syntactical reasons are crystal clear, a failure to distinguish between ASM, APN and NPN is a mistake. It is not a subjective choice, just a plain mistake.
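To make the point concrete, here is a minimal Python sketch of what a lookup based on the word form alone can and cannot do. The table and the function names (CANDIDATES_BY_ENDING, candidate_tags, needs_context) are hypothetical and deliberately crude: the table lists only the three readings mentioned above and ignores everything else a real analyzer would know about a lemma.

```python
import unicodedata

# Hypothetical toy table: only the three readings of -A discussed above.
CANDIDATES_BY_ENDING = {"α": {"ASM", "APN", "NPN"}}


def strip_accents(form):
    """Remove accents and breathings so that endings can be compared."""
    decomposed = unicodedata.normalize("NFD", form)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def candidate_tags(form):
    """Return every tag the bare form allows, without looking at context."""
    bare = strip_accents(form).lower()
    for ending, tags in CANDIDATES_BY_ENDING.items():
        if bare.endswith(ending):
            return set(tags)
    return set()


def needs_context(form):
    """True when the form alone cannot settle the tag."""
    return len(candidate_tags(form)) > 1
```

With this table, needs_context("τά") is true: the form yields several candidates, and only the surrounding clause can decide among them.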
I have tagged Greek and Latin texts for Accordance, and I know that I can make such mistakes. This is why I carefully check my work, to remove as many mistakes as possible. Some still remain, and I am ready to remove them as soon as some user finds them. So I am in no position to be intolerant of such mistakes. Whenever they occur, it is because the person who tagged the text did not think enough about the syntactical context in which a word occurs.
Now, what I find hard to swallow is the notion that Logos is far ahead of the competition because of its awareness of syntax. It is, I grant, inasmuch as a syntactical module is not available in Accordance right now.
It is not if we look back at tagged texts that have already been published as modules of both Accordance and Logos.
Consider the works of Philo in Greek as published by Logos. They offer a sample on the web.
Conveniently enough, it represents the start of the first work of Philo, On the Creation of the World. Until they take it down, I can offer a few comments. Let's look at -A endings.
ASM is clearly wrong. It was tagged without taking syntax into account.
Not so. It's APN, for both syntactical and morphological reasons.
If required, I could move to the second paragraph and examine a few -ON endings, like ἄσκεπτον καὶ ἀταλαίπωρον, but I don't want to take the fun away from those who are expert in Greek, and I don't want to bother those who are more interested in Hebrew.
Why is Logos missing so badly on syntax? It might be easy to blame others: the database has been tagged by the Philo Concordance Project. Now, the goal of that project was to assign to each word the tag which is statistically most frequent for that word form. If in most cases -A is APN, they automatically set all instances to APN. Even so, the database is very valuable. It has gone as far as one can go if tagging is automated, that is to say, if tagging decisions ignore the syntax.
Now, the Philo Concordance Project people want the users to be aware of this, and they say so on their webpage:
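The kind of context-free tagging I am describing can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the project's actual code: the function names (train_most_frequent, tag_by_frequency) and the three-token sample are hypothetical.

```python
from collections import Counter, defaultdict


def train_most_frequent(tagged_tokens):
    """Map each word form to the tag it received most often."""
    counts = defaultdict(Counter)
    for form, tag in tagged_tokens:
        counts[form][tag] += 1
    return {form: c.most_common(1)[0][0] for form, c in counts.items()}


def tag_by_frequency(forms, model, default="?"):
    """Give every occurrence of a form the same tag, ignoring its clause."""
    return [(form, model.get(form, default)) for form in forms]


# Hypothetical toy sample: TA tagged twice as APN and once as NPN.
sample = [("τά", "APN"), ("τά", "APN"), ("τά", "NPN")]
model = train_most_frequent(sample)

# Every TA now comes out as APN, including any that is really NPN.
tagged = tag_by_frequency(["τά", "τά"], model)
```

The nominative reading never survives such a pass, which is why each NPN has to be found and corrected afterwards by someone who reads the clause.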
In this concordance the words were organised mechanically, on the basis of the Greek alphabetic order of the textforms ("tokens"). Two copies were printed in 1974, typed with Greek letters.
Later, some words were completely lemmatised and tagged in context and all words were automatically organised based upon this initial lemmatisation and tagging by Roald Skarsten.
When distributing this database, it is important for the user to be aware of what it includes and what it doesn't.
Later, the database was distributed by Accordance, BibleWorks and Logos. Actually, they name Logos first and add a picture. Then they mention Accordance (Mac) and BibleWorks. And they add:
These publishers also intend to complete the morphological analysis of the texts.
Doing that is hard work. I have personally helped to refine the database for Accordance. Is that work completely finished? No. Philo has more than 438,000 words, and, once TA has received the overall tag of APN, it is hard to catch all NPN and correct the tag. But we have made many thousands of corrections.
Logos informs its users:
Note: The morphological analysis contained in this edition is under revision. Forms that are ambiguous; particularly conjunctions, particles, pronouns and adverbs, are in the process of being revised. A rebuilt resource with the updated form of this information will be made available at no cost via download upon completion of the extended analysis.
Apart from conjunctions, particles, pronouns and adverbs, they should have added articles, nouns and verbs. I guess that interjections are all right.
So, they are honest. As I am not a Logos user, I don't know how many free upgrades they have already distributed. Preparing an upgrade takes a lot of time. It is very hard work, as I said, and work that is neither rewarding nor easy to market. I find it as important to fix existing texts as to move on to new projects. I am sure that scholars appreciate both careful revisions and new ideas.
I am aware of the efforts of Logos with syntax and I appreciate them. However, I find that there is no need to try to give the impression that only Logos is aware of the complexities of syntax, for that picture is not accurate.
[Edited: after checking, I corrected the number of words in Philo. I had first entered the number from memory.]
Edited by Marco, 25 November 2009 - 10:42 AM.
minor edits by permission