UCREL Corpus Research Seminar: Prerequisites to a corpus-based analysis of 900 million words of Early Modern English text

Date: 1 November 2012
Time: 2:00-3:00 pm
Venue: George Fox LT3

Alistair Baron (Lancaster University)

Early English Books Online (EEBO) contains digital facsimiles of virtually every English work printed between 1473 and 1700, some 125,000 publications. The Text Creation Partnership (TCP) is transcribing these documents to provide fully searchable XML-encoded texts. So far, almost 40,000 texts have been transcribed, containing over 900 million words. This offers an unrivalled resource for corpus linguistic studies of the Early Modern English period, in terms of both scale and coverage.

To prepare EEBO-TCP for use in CQPweb (a powerful web-based corpus query tool), metadata has been extracted so that particular sets of texts can be analysed and sub-corpora defined by metadata-based filtering can be compared. This talk will discuss the metadata extracted and examine the make-up of the texts transcribed to date.

One particular issue with the computational analysis of historical texts is the large amount of spelling variation generally present. This has a detrimental effect on the accuracy of various corpus linguistic techniques (e.g. annotation and keyword analysis). The second part of this talk will introduce the Variant Detector (VARD2) tool, which can be used to insert modern standard equivalents alongside spelling variants, either manually or automatically. This normalisation has been shown to dramatically increase the accuracy of various methods used in corpus linguistics. The use of VARD2 to automatically normalise all 40,000 texts in the current version of EEBO-TCP will also be discussed.

Event website: http://ucrel.lancs.ac.uk/crs
Who can attend: Anyone

Further information
Associated projects: CREME (Corpus Research in Early Modern English)
Organising departments and research centres: Computing and Communications, History, Linguistics and English Language, University Centre for Computer Corpus Research on Language (UCREL)

Department of History, Bowland College, Lancaster University, LA1 4YT, UK
Tel: +44 (0) 1524 593155 Fax: +44 (0) 1524 846102 E-mail: history@lancaster.ac.uk