Corpora in the Teaching of Languages and Linguistics

Resources and practices in the teaching of languages and linguistics tend to reflect the division between the empirical and rationalist approaches. Many textbooks contain only invented examples, and their descriptions are based upon intuition or second-hand accounts. Other books, however, are explicitly empirical and use examples and descriptions drawn from corpora or other sources of real-life language data.

Corpus examples are important in language learning because they expose students to the kinds of sentences they will encounter when using the language in real-life situations. Students taught with traditional syntax textbooks that contain sentences such as "Steve puts his money in the bank" are often unable to analyse more complex sentences such as "The government has welcomed a report by an Australian royal commission on the effects of Britain's atomic bomb testing programme in the Australian desert in the fifties and early sixties" (from the Spoken English Corpus).

Apart from being a source of empirical teaching data, corpora can be used to look critically at existing language teaching materials. Kennedy (1987a, 1987b) has looked at ways of expressing quantification and frequency in ESL (English as a second language) textbooks. Holmes (1988) has examined ways of expressing doubt and certainty in ESL textbooks, while Mindt (1992) has looked at future time expressions in German textbooks of English. These studies share a similar methodology: they analyse the relevant constructions or vocabulary both in the sample textbooks and in standard English corpora, and then compare the findings from the two sets. Most studies found considerable differences between what textbooks teach and how native speakers actually use the language, as evidenced in the corpora. Some textbooks gloss over important aspects of usage, or foreground less frequent stylistic choices at the expense of more common ones. The general conclusion from these studies is that non-empirically based teaching materials can be misleading, and that corpus studies should inform the production of materials so that the more common choices of usage are given more attention than the less common ones.
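The two-set comparison these studies share can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reconstruction of Kennedy's, Holmes's or Mindt's actual procedures: it handles single-word expressions only, uses a crude regular-expression tokenizer, and the function names (tokenize, rate_per_thousand, compare) and sample texts are hypothetical. Counts are normalised per 1,000 words so that a short textbook sample can be set against a much larger corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (a crude regex tokenizer, assumed for illustration)."""
    return re.findall(r"[a-z']+", text.lower())


def rate_per_thousand(tokens, targets):
    """Return the frequency of each (lower-case) target word per 1,000 tokens."""
    if not tokens:
        return {w: 0.0 for w in targets}
    counts = Counter(t for t in tokens if t in targets)
    return {w: 1000 * counts[w] / len(tokens) for w in targets}


def compare(textbook_text, corpus_text, targets):
    """Map each target word to a (textbook rate, corpus rate) pair."""
    textbook = rate_per_thousand(tokenize(textbook_text), targets)
    corpus = rate_per_thousand(tokenize(corpus_text), targets)
    return {w: (textbook[w], corpus[w]) for w in targets}


if __name__ == "__main__":
    # Hypothetical toy samples standing in for a textbook and a reference corpus.
    textbook_sample = "Perhaps it may rain. It may not."
    corpus_sample = "It might rain, I think. It might not."
    targets = ("may", "might", "perhaps")  # e.g. markers of doubt, as in Holmes's topic
    for word, (tb, co) in compare(textbook_sample, corpus_sample, targets).items():
        print(f"{word:8} textbook {tb:6.1f}  corpus {co:6.1f}")
```

Items whose textbook rate is far above or below the corpus rate are the candidates for the kind of over- or under-emphasis these studies report. Multi-word expressions such as "going to" would need n-gram counting, which this sketch leaves out.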

Read about language teaching for "special purposes" in Corpus Linguistics, Chapter 4, pages 104-105.

Corpora have also been used in the teaching of linguistics. Kirk (1994) requires his students to base their projects on corpus data, which they must analyse in the light of a model such as Brown and Levinson's politeness theory or Grice's co-operative principle. In taking this approach, Kirk uses corpora not only to teach students about variation in English but also to introduce them to the main features of a corpus-based approach to linguistic analysis.

A further application of corpora in this field is their role in computer-assisted language learning. Recent work at Lancaster University has looked at the role of corpus-based computer software in teaching undergraduates the rudiments of grammatical analysis (McEnery and Wilson 1993). This software, Cytor, reads in an annotated corpus (either part-of-speech tagged or parsed) one sentence at a time, hides the annotation and asks the student to annotate the sentence him- or herself. Students can call up help in the form of a list of tag mnemonics, a frequency lexicon or concordances of examples. McEnery, Baker and Wilson (1995) carried out an experiment over the course of a term to determine how effective Cytor was at teaching part-of-speech analysis, comparing two groups of students: one taught with Cytor and another taught by traditional lecturer-based methods. In general, the computer-taught students performed better than the human-taught students throughout the term.
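The drill cycle described above (hide the annotation, collect the student's tags, mark them, and offer a frequency lexicon or concordances as help) can be sketched as follows. This is a hypothetical illustration, not Cytor's actual code: the two-sentence corpus, the function names (hide_annotation, mark, frequency_lexicon, concordance) and the tag set, loosely modelled on CLAWS-style part-of-speech mnemonics, are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical part-of-speech tagged corpus: each sentence is a list of (word, tag) pairs.
# Tags are illustrative, loosely modelled on CLAWS-style mnemonics.
CORPUS = [
    [("Steve", "NP1"), ("puts", "VVZ"), ("his", "APPGE"), ("money", "NN1"),
     ("in", "II"), ("the", "AT"), ("bank", "NN1")],
    [("The", "AT"), ("government", "NN1"), ("has", "VHZ"), ("welcomed", "VVN"),
     ("a", "AT1"), ("report", "NN1")],
]


def hide_annotation(sentence):
    """Return only the words, so the student sees the sentence without its tags."""
    return [word for word, _ in sentence]


def mark(sentence, student_tags):
    """Compare the student's tags with the corpus tags.

    Returns (number correct, list of (word, given tag, correct tag) mistakes).
    """
    if len(student_tags) != len(sentence):
        raise ValueError("expected exactly one tag per word")
    errors = [(word, given, gold)
              for (word, gold), given in zip(sentence, student_tags)
              if given != gold]
    return len(sentence) - len(errors), errors


def frequency_lexicon(corpus):
    """Help option: count how often each (lower-cased) word occurs with each tag."""
    lexicon = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            lexicon[word.lower()][tag] += 1
    return lexicon


def concordance(corpus, target, width=3):
    """Help option: simple keyword-in-context lines for a word, case-insensitively."""
    lines = []
    for sentence in corpus:
        words = hide_annotation(sentence)
        for i, w in enumerate(words):
            if w.lower() == target.lower():
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                lines.append(f"{left} [{w}] {right}".strip())
    return lines


if __name__ == "__main__":
    sentence = CORPUS[0]
    print("Tag this sentence:", " ".join(hide_annotation(sentence)))
    answer = ["NP1", "VVZ", "APPGE", "NN1", "II", "AT", "NN2"]  # a student's partly wrong tags
    correct, errors = mark(sentence, answer)
    print(f"{correct}/{len(sentence)} correct; mistakes: {errors}")
    print("Lexicon entry for 'bank':", dict(frequency_lexicon(CORPUS)["bank"]))
    print("\n".join(concordance(CORPUS, "the")))
```

A fuller drill would loop over every sentence in the corpus and, for a parsed rather than tagged corpus, hide and mark bracketings instead of word-level tags.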