Leech's Maxims of Annotation

1. It should be possible to remove the annotation from an annotated corpus in order to revert to the raw corpus. At times this is a simple process: for example, stripping the underscore and the tag that follows it from each word, so that "Claire_NP1 collects_VVZ shoes_NN2" becomes "Claire collects shoes". However, the prosodic annotation of the London-Lund corpus is interspersed within words (for example, "g/oing" indicates a rising pitch on the first syllable of the word "going"), meaning that the original words cannot be so easily reconstructed.

2. It should be possible to extract the annotations by themselves from the text. This is the flip side of maxim 1. Taking maxims 1 and 2 together, the annotated corpus should allow the maximum flexibility for manipulation by the user.

3. The annotation scheme should be based on guidelines which are available to the end user. Most corpora have a manual which contains full details of the annotation scheme and the guidelines issued to the annotators. This enables the user to understand fully what each instance of annotation represents without resorting to guesswork, and to understand, in cases of ambiguity, why a particular annotation decision was made at that point. You might want to look briefly at an example of the guidelines for part-of-speech annotation of the BNC, although this page has restricted access.

4. It should be made clear how and by whom the annotation was carried out. A corpus may be annotated manually, either by a single person or by a number of different people; alternatively the annotation may be carried out automatically by a computer program whose output may or may not be corrected by human beings.

5. The end user should be made aware that the corpus annotation is not infallible, but simply a potentially useful tool. Any act of corpus annotation is, by definition, also an act of interpretation, either of the structure of the text or of its content.

6. Annotation schemes should be based as far as possible on widely agreed and theory-neutral principles. For example, parsed corpora often adopt a basic context-free phrase structure grammar rather than implementing a narrower, more specific grammatical theory such as Chomsky's Principles and Parameters framework.

7. No annotation scheme has the a priori right to be considered as a standard. Standards emerge through practical consensus.
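
Maxims 1 and 2 can be illustrated with a short sketch in Python. It assumes the simple word_TAG format of the example in maxim 1, with tokens separated by spaces and each tag following the last underscore in its token. The function name split_annotated is hypothetical and does not belong to any corpus toolkit. The sketch would not work for annotation that is interspersed within words, as in the London-Lund example.

```python
def split_annotated(text, sep="_"):
    """Split word_TAG tokens into parallel lists of words and tags.

    Assumes whitespace-separated tokens with the tag after the LAST
    separator. Untagged tokens get a tag of None.
    """
    words, tags = [], []
    for token in text.split():
        # rpartition splits on the last separator, so a word that itself
        # contains an underscore keeps it.
        word, found, tag = token.rpartition(sep)
        if not found:
            # No separator: rpartition returns ("", "", token).
            word, tag = token, None
        words.append(word)
        tags.append(tag)
    return words, tags


annotated = "Claire_NP1 collects_VVZ shoes_NN2"
words, tags = split_annotated(annotated)

# Maxim 1: revert to the raw text.
raw_text = " ".join(words)

# Maxim 2: extract the annotations by themselves.
tag_sequence = " ".join(tag for tag in tags if tag is not None)

print(raw_text)
print(tag_sequence)
```

Keeping the words and tags in parallel lists means either one can be used alone, or the two can be zipped back together to rebuild the annotated text.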

