The Lancaster Corpus of Mandarin Chinese
The Lancaster Corpus of Mandarin Chinese (LCMC) was created to meet the research community's growing need for a publicly available balanced corpus of Mandarin Chinese. LCMC came into being as part of a research project undertaken by the Linguistics Department, Lancaster University, and funded by the UK Economic and Social Research Council (ESRC; Grant Ref. RES-000-220135). Because the corpus is designed as a Chinese match for the Freiburg-LOB Corpus of British English (FLOB), it will provide a valuable resource for contrastive studies of English and Chinese as well as a sound basis for monolingual investigations of Chinese.
We are obliged to the ESRC for funding our project; without this support the corpus would not have been built. We would also like to thank the presses, libraries and websites listed in the bibliographic document of this corpus for providing the required texts. Our special thanks go to Miss Xin Huang, who was involved in the tedious process of proofreading the scanned electronic texts.
2. Sampling frame and text collection
3. Encoding and markup conventions