Richard Xiao

BA (Soochow University), MA (Hohai University & Nanjing Normal University), PhD (Lancaster University)

The Lancaster Corpus of Mandarin Chinese

(LCMC)

by

Tony McEnery

Richard Xiao

Lancaster University


Preface

The Lancaster Corpus of Mandarin Chinese (LCMC) was created to meet the research community's increasing need for a publicly available, balanced corpus of Mandarin Chinese. LCMC came into being as part of a research project undertaken by the Linguistics Department at Lancaster University and funded by the UK Economic and Social Research Council (Grant Ref. RES-000-220135). Because the corpus is designed as a Chinese match for the Freiburg-LOB Corpus of British English (FLOB), it will provide a valuable resource for contrastive studies of English and Chinese as well as a sound basis for monolingual investigations of Chinese.

We are grateful to the ESRC for funding our project; without this funding, the corpus would not have been built. We would also like to thank the presses, libraries and websites listed in the bibliographic document of this corpus for providing the required texts. Our special thanks go to Miss Xin Huang, who was involved in the tedious process of proofreading the scanned electronic texts.


June 2003


Contents

1. Basic information about the corpus

    1. Aims

    2. Sampling frame and text collection

    3. Encoding and markup conventions

2. List of codes

3. List of text categories

4. The LCMC tagset

5. Getting started: using Xara to explore the corpus

6. Copyright information (character, Pinyin)

7. License

8. Availability