Department of History Lancaster University

UCREL Corpus Research Seminar: Prerequisites to a corpus-based analysis of 900 million words of Early Modern English text

Date: 1 November 2012 Time: 2.00-3.00 pm

Venue: George Fox LT3


Alistair Baron (Lancaster University)

Early English Books Online (EEBO) contains digital facsimiles of virtually every English work printed between 1473 and 1700, some 125,000 publications. The Text Creation Partnership (TCP) is transcribing these documents to provide fully searchable, XML-encoded texts. So far, almost 40,000 texts have been transcribed, containing over 900 million words. In both scale and coverage, this is an unrivalled resource for corpus-linguistic studies of the Early Modern English period. To prepare EEBO-TCP for use in CQPweb (a powerful web-based corpus query tool), metadata has been extracted. This allows analysis of particular sets of texts and comparison of sub-corpora defined by metadata-based filtering. This talk will discuss the extracted metadata and examine the make-up of the texts transcribed to date.

One particular issue in the computational analysis of historical texts is the large amount of spelling variation they generally contain. This variation reduces the accuracy of various corpus-linguistic techniques (e.g. annotation and keyword analysis). The second part of this talk will introduce the Variant Detector (VARD2) tool. VARD2 can insert modern standard equivalents alongside spelling variants, either manually or automatically, and this has been shown to dramatically increase the accuracy of various methods used in corpus linguistics. The talk will also discuss the use of VARD2 to automatically normalise all of the almost 40,000 texts in the current version of EEBO-TCP.

Event website: http://ucrel.lancs.ac.uk/crs


Who can attend: Anyone


Further information

Associated projects: CREME (Corpus Research in Early Modern English)

Organising departments and research centres: Computing and Communications, History, Linguistics and English Language, University Centre for Computer Corpus Research on Language (UCREL)


Faculty of Arts and Social Sciences

Department of History, Bowland College, Lancaster University, LA1 4YT, UK | Tel: +44 (0) 1524 593155 Fax: +44 (0) 1524 846102 E-mail: history@lancaster.ac.uk
