Part-of-speech Annotation.

This is the most basic type of linguistic corpus annotation: the aim is to assign to each lexical unit in the text a code indicating its part of speech. Part-of-speech annotation is useful because it increases the specificity of data retrieval from corpora, and also forms an essential foundation for further forms of analysis (such as syntactic parsing and semantic field annotation). It also allows us to distinguish between homographs that belong to different word classes, such as "book" used as a noun and "book" used as a verb.
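As a rough illustration of the idea, the following Python sketch stores a hand-annotated sentence as (word, tag) pairs, prints it in a word_TAG layout, and then uses the tags to retrieve only one reading of a homograph. The tag labels (PRON, AUX, VERB, DET, NOUN, CONJ), the underscore separator, and the function names are assumptions made for this example; they are not the codes of any particular corpus tagset.

```python
# A toy, hand-annotated sentence. The tag labels are illustrative
# placeholders, not the codes of a real corpus tagset.
tagged_sentence = [
    ("She", "PRON"),
    ("will", "AUX"),
    ("book", "VERB"),
    ("a", "DET"),
    ("room", "NOUN"),
    ("and", "CONJ"),
    ("read", "VERB"),
    ("a", "DET"),
    ("book", "NOUN"),
]


def format_tagged(tokens, sep="_"):
    """Join (word, tag) pairs into a single 'word_TAG word_TAG ...' string."""
    return " ".join(f"{word}{sep}{tag}" for word, tag in tokens)


def find_word(tokens, word, tag=None):
    """Return the positions of `word`, optionally restricted to one tag."""
    return [
        i
        for i, (w, t) in enumerate(tokens)
        if w.lower() == word.lower() and (tag is None or t == tag)
    ]


print(format_tagged(tagged_sentence))
print(find_word(tagged_sentence, "book"))          # every occurrence of "book"
print(find_word(tagged_sentence, "book", "NOUN"))  # only the noun reading
```

Searching for "book" alone returns both occurrences, whereas adding the tag narrows the search to the noun. This is the sense in which annotation makes retrieval more specific.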


Part-of-speech annotation was one of the first types of annotation to be performed on corpora and is the most common today. One reason for this is that it is a task that can be carried out to a high degree of accuracy by a computer. Greene and Rubin (1971) achieved a 77% rate of correctly tagged words with their early part-of-speech tagging program (TAGGIT). In the early 1980s the UCREL team at Lancaster University reported a success rate of 95% using their program CLAWS.
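Accuracy figures of this kind are usually understood as the proportion of words whose automatically assigned tag matches a manually checked reference tag. A minimal Python sketch of that calculation follows. The function name, the tag labels, and the five-word example are hypothetical, and the sketch does not claim to reproduce how the TAGGIT or CLAWS figures were actually obtained.

```python
def tagging_accuracy(predicted, gold):
    """Proportion of positions where the predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy for an empty sequence")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical tags for a five-word sentence; labels are placeholders.
gold = ["PRON", "AUX", "VERB", "DET", "NOUN"]
predicted = ["PRON", "AUX", "NOUN", "DET", "NOUN"]  # one error, at position 2

print(f"{tagging_accuracy(predicted, gold):.0%}")
```

Here four of the five tags agree with the reference, so the tagger's accuracy on this sentence is 80%.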

Read about idiomatic tags and the tagging of contracted forms in Corpus Linguistics, chapter 2, pages 40-42.

