Andrew Hardie

Andrew Hardie: Publications

Title links either go to an online version of the text, or to further publication details if the text is not openly available online.

Books

Culpeper, J., Hardie, A., & Demmen, J. (2023) The Arden Encyclopedia of Shakespeare’s Language: Dictionary A-M. Bloomsbury.

Culpeper, J., Hardie, A., & Demmen, J. (2023) The Arden Encyclopedia of Shakespeare’s Language: Dictionary N-Z. Bloomsbury.

McEnery, T, Hardie, A and Younis, N (eds) (2019) Arabic Corpus Linguistics. Edinburgh University Press.

Semino, E, Demjén, Z, Hardie, A, Payne, S and Rayson, P. (2018) Metaphor, Cancer, and the End of Life: A corpus-based study. Routledge.

McEnery, T and Hardie, A (2012) Corpus Linguistics: Method, Theory and Practice. Cambridge University Press.

Baker, P, Hardie, A and McEnery, T (2006) A Glossary of Corpus Linguistics. Edinburgh University Press.

Journal articles

Hughes, J and Hardie, A (forthcoming) The psychological reality of a linguistic-statistical construct: Observation of collocational and non-collocational language processing using event-related brain potentials.

Hughes, J and Hardie, A (forthcoming) The effect of collocational strength on the speed of processing of adjective-noun bigrams in native speakers and learners of English: Evidence from a self-paced reading experiment.

Hardie, A (forthcoming) A dual sort-and-filter strategy for statistical analysis of collocation, keywords, and lockwords.

Hardie, A. and Daraselia, S. (forthcoming) A theory for words in Georgian: traditional constructs versus corpus annotation.

Jehangir, H. and Hardie, A. (forthcoming) Design and construction of an openly available Urdu web corpus.

Gillings, M and Hardie, A (2023) The interpretation of topic models for scholarly analysis: an evaluation and critique of current practice. Digital Scholarship in the Humanities 38(2): 530–543. https://doi.org/10.1093/llc/fqac075

Collins, L. and Hardie, A. (2022) Making use of transcription data from qualitative research within a corpus-linguistic paradigm: Issues, experiences, and recommendations. Corpora 17(1): 123-135. https://doi.org/10.3366/cor.2022.0237

Hardie, A. and Ibrahim, W. (2021) Exploring and classifying the Arabic copula and auxiliary kāna via enhanced part-of-speech tagging. Corpora 16(3): 305-335. https://doi.org/10.3366/cor.2021.0225

Culpeper, J., Hardie, A., Demmen, J., Hughes, J. and Timperley, M. (2021) Supporting the corpus-based study of Shakespeare’s language: Enhancing a corpus of the First Folio. ICAME Journal 45(1): 37-86. https://doi.org/10.2478/icame-2021-0002

Collins, L., Semino, E., Demjén, Z., Hardie, A., Mosely, P., Woods, A. and Alderson-Day, B. (2020) A linguistic approach to the psychosis continuum: (dis)similarities and (dis)continuities in how clinical and non-clinical voice-hearers talk about their voices. Cognitive Neuropsychiatry 25(6): 447-465. https://doi.org/10.1080/13546805.2020.1842727

Hu, X., Xiao, Z. and Hardie, A. (2020) 翻译英语变体的语料库文体统计学分析 [A corpus-based multi-feature stylo-statistical analysis of translational English]. 外语教学与研究 [Foreign Language Teaching and Research] 52(2): 273-282.

Hardie, A and Dorst, I van (2020) A survey of grammatical variability in Early Modern English drama. Language and Literature 29(3): 275-301. https://doi.org/10.1177/0963947020949440

Blything, L, Hardie, A and Cain, K (2020) Question asking during reading comprehension instruction: a corpus study of how question type influences the linguistic complexity of primary school students’ responses. Reading Research Quarterly 55(3): 443-472. https://doi.org/10.1002/rrq.279

Hu, X, Xiao, R and Hardie, A (2019) How do English translations differ from native English writings? A multi-feature statistical model for linguistic variation analysis. Corpus Linguistics and Linguistic Theory 15(2): 347-382. https://doi.org/10.1515/cllt-2014-0047

Love, R., Brezina, V., McEnery, T., Hawtin, A., Hardie, A. and Dembry, C. (2019) Functional variation in the Spoken BNC2014 and the potential for register analysis. Register Studies 1(2): 296-317. https://doi.org/10.1075/rs.18013.lov

Love, R, Dembry, C, Hardie, A, Brezina, V and McEnery, T (2017) The Spoken BNC2014: designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics 22(3): 319-344. https://doi.org/10.1075/ijcl.22.3.02lov

Semino, E, Demjén, Z, Demmen, J, Koller, V, Payne, S, Hardie, A, Rayson, P (2017) The online use of ‘Violence’ and ‘Journey’ metaphors by cancer patients, as compared with health professionals: a mixed methods study. BMJ Supportive & Palliative Care 2017(7): 60-66. https://doi.org/10.1136/bmjspcare-2014-000785

Gregory, I, Atkinson, A, Hardie, A, Joulain-Jay, A, Kershaw, D, Porter, C, Rayson, P and Rupp, CJ (2016) From digital resources to historical scholarship with the British Library 19th Century Newspaper Collection. Journal of Siberian Federal University: Humanities and Social Sciences 9(4): 994-1006. https://doi.org/10.17516/1997-1370-2016-9-4-994-1006

Demmen, J, Semino, E, Demjén, Z, Koller, V, Hardie, A, Rayson, P and Payne, S (2015) A computer-assisted study of the use of Violence metaphors for cancer and end of life by patients, family carers and health professionals. International Journal of Corpus Linguistics 20(2): 205-231. https://doi.org/10.1075/ijcl.20.2.03dem

Murrieta-Flores, P, Baron, A, Gregory, I, Hardie, A and Rayson, P (2015) Automatically analysing large texts in a GIS environment: The Registrar General’s reports and cholera in the nineteenth century. Transactions in GIS 19(2): 296-320. https://doi.org/10.1111/tgis.12106

Hardie, A (2014) Modest XML for Corpora: Not a standard, but a suggestion. ICAME Journal 38: 73-103. https://doi.org/10.2478/icame-2014-0004

Hardie, A (2012) CQPweb – combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics 17 (3): 380-409. https://doi.org/10.1075/ijcl.17.3.04har [alternative link]

Hardie, A, Lohani, R and Yadava, YP (2011) Extending corpus annotation of Nepali: advances in tokenisation and lemmatisation. Himalayan Linguistics 10 (1): 151-165. https://doi.org/10.5070/H910123572

Gregory, I and Hardie, A (2011) Visual GISting: Bringing together corpus linguistics and Geographical Information Systems. Literary and Linguistic Computing 26 (3): 297-314. https://doi.org/10.1093/llc/fqr022

Hardie, A and McEnery, T (2010) On two traditions in corpus linguistics, and what they have in common. International Journal of Corpus Linguistics 15 (3): 384-394. https://doi.org/10.1075/ijcl.15.3.09har

Hardie, A and Mudraya, O (2009) Collocational patterning in cross-linguistic perspective: adpositions in English, Nepali, and Russian. Arena Romanistica 4: 138-149. [accessible from this website]

Dunning, A, Gregory, I and Hardie, A (2009) Freeing up digital content: new research means new licenses. Serials 22 (2): 166-173. https://doi.org/10.1629/22166

Prentice, S and Hardie, A (2009) Empowerment and disempowerment in the Glencairn uprising: a corpus-based critical analysis of Early Modern English news discourse. Journal of Historical Pragmatics 10(1): 23-55. https://doi.org/10.1075/jhp.10.1.03pre

Yadava, Y.P., Hardie, A., Lohani R.R., Regmi B.N., Gurung, S., Gurung, A., McEnery, T., Allwood, J., and Hall, P. (2008). Construction and annotation of a corpus of contemporary Nepali. Corpora 3(2): 213-225. https://doi.org/10.3366/E1749503208000166

Koller, V, Hardie, A, Rayson, P and Semino, E (2008) Using a semantic annotation tool for the analysis of metaphor in discourse. Metaphorik.de 15. http://www.metaphorik.de/15/

Hardie, A (2008) A collocation-based approach to Nepali postpositions. Corpus Linguistics and Linguistic Theory 4(1): 19-62. https://doi.org/10.1515/CLLT.2008.002

Hardie, A (2007) Part-of-speech ratios in English corpora. International Journal of Corpus Linguistics 12(1): 55-81. https://doi.org/10.1075/ijcl.12.1.05har

Hardie, A (2007) From legacy encodings to Unicode: the graphical and logical principles in the scripts of South Asia. Language Resources and Evaluation 41(1): 1-25. https://doi.org/10.1007/s10579-006-9003-7

Baker, P, Hardie, A, McEnery, T, Xiao, R, Bontcheva, K, Cunningham, H, Gaizauskas, R, Hamza, O, Maynard, D, Tablan, V, Ursu, C, Jayaram, BD and Leisher, M (2004) Corpus linguistics and South Asian languages: corpus creation and tool development. Literary and Linguistic Computing 19(4): 509-524. https://doi.org/10.1093/llc/19.4.509

Hardie, A and McEnery, T (2003) The were-subjunctive in British rural dialects: marrying corpus and questionnaire data. Computers and the Humanities 37(2): 205-228. https://doi.org/10.1023/A:1022657227889

Chapters in edited volumes

Supanfai, P. and Hardie, A. (2023) Corpus linguistics and the languages of Asia. In: Shei, C. and Li, S. (eds) The Routledge Handbook of Asian Linguistics, pp. 531-547. Routledge. https://doi.org/10.4324/9781003090205-37

McEnery, T. and Hardie, A. (2023) Neo-Firthian corpus linguistics to 2000. In: Waugh, L.R., Monville-Burston, M. and Joseph, J.E. (eds) The Cambridge History of Linguistics, pp. 515-517. Cambridge University Press. https://doi.org/10.1017/9780511842788.027

McEnery, T. and Hardie, A. (2022) Corpus methods. In: Culpeper, J., Malory, B., Nance, C., Van Olmen, D., Atanasova, D., Kirkham, S., & Casaponsa, A. (eds) Introducing Linguistics, pp. 383-399. Routledge. https://doi.org/10.4324/9781003045571-25

Semino, E, Hardie, A and Zakrzewska, J (2020) Applying corpus linguistics to a diagnostic tool for pain. In: Demjén, Z (ed.) Applying linguistics in illness and healthcare contexts, pp. 99-128. Bloomsbury. https://doi.org/10.5040/9781350057685.0011

Hughes, J. and Hardie, A. (2020). Corpus linguistics and event-related potentials. In: Egbert, J. and Baker, P. (eds) Using corpus methods to triangulate linguistic analysis, pp. 185-218. Routledge. https://doi.org/10.4324/9781315112466-8

McEnery, T, Hardie, A and Younis, N (2019) Introducing Arabic corpus linguistics. In: McEnery, T, Hardie, A and Younis, N (eds) Arabic Corpus Linguistics, pp. 1-16. Edinburgh University Press.

Ibrahim, WMA and Hardie, A (2019) Accessible corpus annotation for Arabic. In: McEnery, T, Hardie, A and Younis, N (eds) Arabic Corpus Linguistics, pp. 56-75. Edinburgh University Press.

Mohamed, G and Hardie, A (2019) Approaching text typology through cluster analysis in Arabic. In: McEnery, T, Hardie, A and Younis, N (eds) Arabic Corpus Linguistics, pp. 201-228. Edinburgh University Press.

Gregory, I, Donaldson, C, Hardie, A and Rayson, P (in press) Modelling space and time in historical texts. In: Flanders, J and Jannidis, F (eds) The Shape of Data in Digital Humanities: Modeling Texts and Text-based Resources, pp. 133-149. Routledge.

Hardie, A. (2018) Using the Spoken BNC2014 in CQPweb. In: Brezina, V., Love, R. and Aijmer, K. (eds) Corpus Approaches to Contemporary British Speech: Sociolinguistic Studies of the Spoken BNC2014, pp. 27-30. Routledge. https://doi.org/10.4324/9781315268323-4

Hardie, A and Brandt, S (2018) First language acquisition. In: Culpeper, J., Kerswill, P., Wodak, R., McEnery, T. and Katamba, F. (eds.) English Language: Description, Variation and Context, 2nd edition, pp. 541-559. Palgrave Macmillan [reprinted by Bloomsbury]. (Revision of Hardie 2009, “First Language Acquisition”.)

Baker, H, McEnery, T and Hardie, A (2017). A corpus-based investigation into English representations of Turks and Ottomans in the early modern period. In: Pace-Sigge, M and Patterson, KJ (eds) Lexical Priming: Applications and advances, pp. 41-66. John Benjamins. https://doi.org/10.1075/scl.79.02bak

Hardie, A (2016) Infrastructure for analysis of the CEPhiT corpus: implementation and applications of corpus annotation and indexing. In: Moskowich, I., Camiña Rioboo, G., Lareo, I. and Crespo, B. (eds) ‘The Conditioned and the Unconditioned’: Late Modern English Texts on Philosophy, pp. 61-76. John Benjamins. https://doi.org/10.1075/z.198.04har

Hardie, A (2016) Corpus linguistics. In: Allan, K (ed.) The Routledge Handbook of Linguistics, pp. 502-515. Routledge.

Gregory, I, Cooper, D, Hardie, A and Rayson, P (2015). Spatializing and analyzing digital texts: Corpora, GIS and places. In: Bodenhamer, D, Corrigan, J and Harris, TM (eds) Deep Maps and Spatial Narratives. Indiana University Press.

Gregory, I, Baron, A, Cooper, D, Hardie, A, Murrieta-Flores, P and Rayson, P (2014) Crossing boundaries: Using GIS in literary studies, history and beyond. In: Hueber, J and Mendes da Silva, A (eds) Keys for architectural history research in the digital era. Institut national d’histoire de l’art Actes de colloques. http://inha.revues.org/4931

Hardie, A (2014) XML encoding for spoken learner (and other) corpora: a modest approach. In: Ishikawa, S (ed.) Learner corpus studies in Asia and the world. Vol. 2. Papers from LCSAW2014, pp. 49-62. Kobe, Japan: School of Languages and Communication, Kobe University.

McEnery, T. and Hardie, A. (2013) The history of corpus linguistics. In: Allan, K (ed.) The Oxford Handbook of the History of Linguistics, pp. 727-746. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199585847.013.0034

Hardie, A, McEnery, T, and Piao, SS (2010) A corpus-based approach to text reuse in the newsbooks of the Commonwealth. In: Dooley, B (ed.) The Dissemination of News and the Emergence of Contemporaneity in Early Modern Europe, pp. 251-286. Ashgate.

Hardie, A, Lohani, RR, Regmi, BR, and Yadava, YP (2009) A morphosyntactic categorisation scheme for the automated analysis of Nepali. In: Singh, R. (ed.) Annual Review of South Asian Languages and Linguistics 2009, pp. 171-196. Mouton de Gruyter.

Hardie, A and McEnery, T (2009) Corpus linguistics and historical contexts: text reuse and the expression of bias in early modern English journalism. In: Bowen, R, Mobärg, M and Ohlander, S (eds) (2009) Corpora and discourse – and stuff: papers in honour of Karin Aijmer, pp. 59-92. Gothenburg Studies in English 96. Göteborg: Acta Universitatis Gothoburgensis.

Hardie, A (2009) First language acquisition. In: Culpeper, J., Katamba, F., Kerswill, P., Wodak, R. and McEnery, T. (eds.) English Language: Description, Variation and Context, pp. 609-624. Houndmills: Palgrave.

Hardie, A (2009) Corpus linguistics and the languages of South Asia: some current research directions. In: Baker, P (ed.) Contemporary Corpus Linguistics, pp. 262-288. Continuum. [see also]

Hardie, A, Baker, P, McEnery, T and Jayaram, BD (2006) Corpus-building for South Asian languages. In: Saxena, A and Borin, L (eds.) Lesser-known languages in South Asia: Status and Policies, Case Studies and Applications of Information Technology, pp. 211-242. Mouton de Gruyter.

Hardie, A and McEnery, T (2006) Statistics. In: Brown, K (ed.) Encyclopaedia of Language and Linguistics, 2nd edition, vol. 12: 138-146. Oxford: Elsevier.

Hardie, A (2005) Automated part-of-speech analysis of Urdu: conceptual and technical issues. In: Yadava, Y, Bhattarai, G, Lohani, RR, Prasain, B and Parajuli, K (eds.) Contemporary issues in Nepalese linguistics, pp. 48-72. Kathmandu: Linguistic Society of Nepal.

Hardie, A, Levin, E and Pęzik, P (2005) Analiza morfologiczno-składniowa korpusów (“Part-of-speech tagging”). In: Lewandowska-Tomaszczyk, B (ed.) Podstawy językoznawstwa korpusowego (“Foundations of Corpus Linguistics”). Łódź, Poland: Wydawnictwo Uniwersytetu Łódzkiego.

McEnery, T, Baker, JP and Hardie, A (2000a) Swearing and abuse in modern British English. In: Lewandowska-Tomaszczyk, B and Melia, PJ (eds.) PALC ’99: Practical Applications in Language Corpora, pp. 37-48. Peter Lang.

McEnery, T, Baker, JP and Hardie, A (2000b) Assessing claims about language use with corpus data – swearing and abuse. In: Kirk, J (ed.) Corpora Galore. Amsterdam: Rodopi. Reprinted in: Sampson, G and McCarthy, D (eds.) (2004) Corpus linguistics: readings in a widening discipline, pp. 45-55. London and New York: Continuum International.

Papers in peer-reviewed conference proceedings

Evert, S and Hardie, A (2015) Ziggurat: A new data model and indexing format for large annotated text corpora. In: Bański, Piotr; Biber, Hanno; Breiteneder, Evelyn; Kupietz, Marc; Lüngen, Harald; Witt, Andreas (eds.) Proceedings of the 3rd Workshop on Challenges in the Management of Large Corpora (CMLC-3). Mannheim: Institut für Deutsche Sprache, pp. 21-27.

Rupp, CJ, Rayson, P, Gregory, I, Hardie, A, Joulain-Jay, A and Hartmann, D (2014) Dealing with heterogeneous big data when geoparsing historical corpora. In: Proceedings of the 2014 IEEE International Conference on Big Data, pp. 80-83.

Rupp, CJ, Rayson, P, Baron, A, Donaldson, C, Gregory, I, Hardie, A and Murrieta-Flores, P (2013) Customising geoparsing and georeferencing for historical texts. In: Proceedings of the 2013 IEEE International Conference on Big Data, pp. 59-62. [alternative link]

Gregory, I, Baron, A, Murrieta-Flores, P, Hardie, A, Rayson, P and Rupp, CJ (2013) Geographical Text Analysis: GIS approaches to analysing large volumes of texts. In: Proceedings of GISRUK 2013.

Michaud, A, Guillaume, S, Hardie, A and Toda, M (2012) Combining documentation and research: Ongoing work on an endangered language. In: Xiong, D et al. (eds.), Proceedings of IALP 2012 (2012 International Conference on Asian Language Processing), pp. 169-172. Hanoi, Vietnam: MICA Institute, Hanoi University of Science and Technology. [alternative link]

Evert, S and Hardie, A (2011) Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium. In: Proceedings of the Corpus Linguistics 2011 conference. University of Birmingham, UK.

Hardie, A (2007) Collocational properties of adpositions in Nepali and English. In: Proceedings of the Corpus Linguistics 2007 conference.

Hardie, A, Koller, V, Rayson, P and Semino, E (2007) Exploiting a semantic annotation tool for metaphor analysis. In: Proceedings of the Corpus Linguistics 2007 conference.

Semino, E, Koller, V, Hardie, A and Rayson, P (2005) A computer-assisted approach to the analysis of metaphor variation across genres. In: Barnden, J, Lee, M, Littlemore, J, Moon, R, Philip, G and Wallington, A (eds.) Corpus-based approaches to figurative language: a Corpus Linguistics 2005 colloquium, pp. 145-154. Birmingham: University of Birmingham Cognitive Science Research Papers.

Xiao, Z, McEnery, T, Baker, P and Hardie, A (2004) Developing Asian language corpora: standards and practice. In: Proceedings of the 4th Workshop on Asian Language Resources, Sanya, China.

Hardie, A (2003) Developing a tagset for automated part-of-speech tagging in Urdu. In: Archer, D, Rayson, P, Wilson, A, and McEnery, T (eds.) (2003) Proceedings of the Corpus Linguistics 2003 conference. UCREL Technical Papers Volume 16. Department of Linguistics, Lancaster University.

Archer, D, McEnery, T, Rayson, P and Hardie, A (2003) Developing an automated semantic analysis system for Early Modern English. In: Archer, D, Rayson, P, Wilson, A, and McEnery, T (eds.) (2003) Proceedings of the Corpus Linguistics 2003 conference. UCREL Technical Papers Volume 16. Department of Linguistics, Lancaster University.

Baker, P, Hardie, A, McEnery, T and Jayaram, BD (2003) Constructing corpora of South Asian languages. In: Archer, D, Rayson, P, Wilson, A, and McEnery, T (eds.) (2003) Proceedings of the Corpus Linguistics 2003 conference. UCREL Technical Papers Volume 16. Department of Linguistics, Lancaster University.

Baker, P, Hardie, A, McEnery, T and Jayaram, BD (2003) Corpus data for South Asian language processing. In: Proceedings of the EACL Workshop on South Asian Languages, Budapest.

Baker, P, Hardie, A, McEnery, T, Cunningham, H and Gaizauskas, R (2002) EMILLE, a 67-million word corpus of Indic languages: data collection, markup and harmonisation. In: Proceedings of LREC 2002.

Book reviews

Hardie, A (2013) Review of: Vander Viana, Sonia Zyngier and Geoff Barnbrook (eds.). 2011. Perspectives on Corpus Linguistics. Amsterdam and Philadelphia: John Benjamins. ICAME Journal 37: 266-271.

Hardie, A (2004) Review of: Lars Borin (ed). 2002. Parallel corpora, parallel worlds. Selected papers from a symposium on parallel and comparable corpora at Uppsala University, Sweden, 22–23 April, 1999. Amsterdam: Rodopi. Languages in Contrast 5(2): 291-296.

Edited conference proceedings

Formato, F and Hardie, A (eds.) (2015) Corpus Linguistics 2015: Abstract Book. Lancaster: UCREL.

Hardie, A and Love, R (eds.) (2013) Corpus Linguistics 2013: Abstract Book. Lancaster: UCREL.

Rayson, P, Wilson, A, McEnery, T, Hardie, A and Khoja, S (eds.) (2001) Proceedings of the Corpus Linguistics 2001 conference. UCREL Technical Papers Volume 13 Special Issue. Department of Linguistics, Lancaster University.

Baker, P, Hardie, A, McEnery, T and Siewierska, A (eds.) (2000) Proceedings of the Third Discourse Anaphora and Reference Resolution Colloquium (2000). UCREL Technical Papers Volume 12 Special Issue. Department of Linguistics, Lancaster University.

Unpublished PhD thesis

Hardie, A (2004) The computational analysis of morphosyntactic categories in Urdu. Unpublished PhD thesis, Department of Linguistics and English Language, Lancaster University.

Talks and conference presentations

November 2019. The methodology of computer-assisted historical discourse analysis: On the concordance and its centrality. Invited presentation at a workshop on “Text and Language Analysis from a Diachronic Perspective: Corpus and Discourse Insights”, University of Modena and Reggio Emilia, Italy.

October 2019: The statistics of collocation: basic principles and potential problems. Invited talk at Xi’an Jiaotong University, Xi’an, China.

October 2019. Fundamentals of corpus statistics. Invited talk at Xi’an International Studies University, Xi’an, China.

October 2019. Designing and documenting a corpus. Invited talk at Xi’an International Studies University, Xi’an, China.

July 2019. Neuroimaging of collocation: Implications of electroencephalography findings for a network model of collocation (with Jennifer Hughes (lead)). Presentation at the CL2019 conference, University of Cardiff.

July 2019. Managing complex and arbitrary corpus subsections at scale and at speed: from formalism to implementation within CQPweb. Presentation at the 7th Workshop on Challenges in the Management of Large Corpora, University of Cardiff.

June 2019. Lexicogrammar and the brain, in theory and in practice (with Jennifer Hughes (lead)). Plenary presentation at the symposium on Corpus Approaches to Lexicogrammar (LxGr), Edge Hill University.

May 2019. Analysing Arabic grammar through corpus data: the case of copula/auxiliary kāna (with Wesam Ibrahim). Presentation to the UCREL Corpus Research Seminar, Lancaster University.

October 2018. The basics for corpus linguistics: where do we start with a “new” language? Plenary presentation at the 4th International Conference of the Linguistic Association of Pakistan (ICLAP 2018), Fatima Jinnah Women University, Rawalpindi, Pakistan.

October 2018. A corpus-based typological analysis of adverbials in Urdu (with Humaira Jehangir (lead)). Presentation at the 4th International Conference of the Linguistic Association of Pakistan (ICLAP 2018), Fatima Jinnah Women University, Rawalpindi, Pakistan.

October 2018. Corpus analysis with CQPweb: a practical introduction. Presentation at the 4th International Conference of the Linguistic Association of Pakistan (ICLAP 2018), Fatima Jinnah Women University, Rawalpindi, Pakistan.

September 2018. The Written BNC2014: Designing the future, respecting the past. (with Abi Hawtin (lead)). Presentation at the Thai-UK Seminar on Developing and Exploiting National Corpora, Chulalongkorn University, Bangkok, Thailand.

September 2018. Practical aspects of corpus creation: Markup, annotation and metadata in National Corpora. Presentation at the Thai-UK Seminar on Developing and Exploiting National Corpora, Chulalongkorn University, Bangkok, Thailand.

August 2018. A new morphosyntactic annotation schema for Georgian (with Sophiko Daraselia (lead)). Presentation at the VII International Summer School in Digital Humanities, Batumi Shota Rustaveli State University, Georgia.

July 2018. Using corpus methods to investigate guided reading: what teachers say they do, what they do, and what works (with Liam Blything (lead) and Kate Cain). Poster presentation at the conference of the Society for Text and Discourse, Brighton, UK.

July 2018. Teacher Directives and Pupil Responses in SEN Classrooms: insights from corpus methods (with Gillian Smith (lead) and Kate Cain). Poster presentation at the conference of the Society for Text and Discourse, Brighton, UK.

July 2018. The ethics of corpus-building in the age of the Digital Panopticon. Plenary presentation at the “Lancaster Postgraduate Conference in Linguistics and Language Teaching” (LAELPG), Lancaster University.

July 2018. Part-of-speech tagging in Shakespeare: Trials, tribulations and preliminary results (with Jane Demmen (lead) and Jonathan Culpeper). Presentation at the conference on “Computational Methods for Literary-Historical Textual Scholarship”, De Montfort University.

July 2017. Exploratory analysis of word frequencies across corpus texts: towards a critical contrast of approaches. Plenary presentation at the CL2017 conference, University of Birmingham.

July 2017. Guided reading: Using corpus methods to investigate how teacher strategies differ across children’s reading ability, SES, and teacher experience (with Liam Blything and Kate Cain). Presentation at the CL2017 conference, University of Birmingham.

July 2017. The ESRC Centre for Corpus Approaches to Social Science: An introduction and overview. Presentation at the CL2017 pre-conference workshop “CLARIN-UK: Promoting Cross-disciplinary Corpus Linguistics”, University of Birmingham.

July 2017. A corpus-based assessment of a diagnostic pain questionnaire. (with Elena Semino and Joanna Zakrzewska). Presentation at the CL2017 pre-conference workshop on “Corpus approaches to health communication”, University of Birmingham.

July 2017. Morphosyntactic annotation schemata: From background considerations to conflicting design imperatives. Plenary presentation at CAMRL2017 (Computational Approaches to Morphologically Rich Languages 2017), University of Leeds.

June 2017. The Spoken BNC2014: designing and building a spoken corpus of everyday conversations (with Robbie Love and Claire Dembry). Presentation at the “Spoken BNC2014 Symposium”, Lancaster University.

May 2017. Plotting and comparing corpus lexical growth curves as an assessment of OCR quality in historical news data. Presentation at the ICAME 38 conference, Charles University Prague, Czech Republic.

April 2017. Corpus methods in the humanities and social sciences: Three case studies. Invited talk at the Master Class on “Modalities of the Text”, University College Cork, Ireland.

March 2017. Introducing corpus linguistic analysis with CQPweb. Invited talk at the Department of English Philology, Complutense University of Madrid, Spain.

February 2017. Using CQPweb to analyse EEBO-TCP. Invited talk at the NEH Workshop on “The Genealogy of Texts and Ideas”, Rice University, Houston, Texas.

September 2016. Nineteenth Century Newspapers in CQPweb. Invited talk at the CLARIN-PLUS workshop on “Working with Digital Collections of Newspapers”, KU Leuven, Belgium.

June 2016. Some thoughts on transparent effect size measures for collocation. Invited talk at the Symposium on Collo-Phenomena, University of Erlangen-Nuremberg, Germany.

September 2015. Part-of-speech tagging in different kinds of language: some theoretical bases for morphosyntactic annotation schemata. Plenary presentation at the Language and Modern Technologies 2015 conference, Tbilisi, Georgia.

May 2015. Multidimensional analysis for the masses. Presentation at the ICAME 36 conference, University of Trier, Germany.

March 2015. “Big data” in language studies: from cargo-cult science to phantom revolution. Plenary presentation at the CILC conference, University of Valladolid, Spain.

October 2014. Fundamentals of corpus statistics. Invited talk at the Department of English, University of Uppsala, Sweden.

October 2014. The statistics of collocation: from current practice to a new approach. Invited talk at the Department of English, University of Uppsala, Sweden.

October 2014. The art and science of concordance analysis. Invited talk at the Department of English, University of Uppsala, Sweden.

June 2014. Extending a corpus analysis tool to support the analysis of field data. Talk at the Department of Linguistics, University of Ghana.

June 2014. Yesterday, Today, Towards Tomorrow (with Tony McEnery). Plenary presentation at the IVACS 2014 conference, Newcastle University.

May 2014. Rethinking basic statistical techniques in corpus analysis. Plenary presentation at the International Symposium on Learner Corpus Studies in Asia and the World (LCSAW) 2014, Kobe University, Japan.

May 2014. XML encoding for spoken learner (and other) corpora: a modest approach. Plenary presentation at the International Symposium on Learner Corpus Studies in Asia and the World (LCSAW) 2014, Kobe University, Japan.

May 2014. Statistical identification of keywords, lockwords and collocations as a two-step procedure. Presentation at the ICAME 35 conference, University of Nottingham.

March 2014. Analysing EEBO-TCP as an annotated corpus. Invited talk at the Sheffield Centre for Early Modern Studies, University of Sheffield.

March 2014: The applicocausative voice in dialectal and standard Javanese: a corpus-based analysis (with Noor Malihah). Presentation at the Second Asia Pacific Corpus Linguistics Conference (APCLC 2014), Hong Kong Polytechnic University.

February 2014: The affordances of corpus analysis software in approaching EEBO-TCP. Invited presentation at the Northern Renaissance Seminar ‘To set the word against the word’: new directions in early modern textual analysis, Lancaster University.

January 2014. Using version control software for corpus construction. ESRC Centre for Corpus Approaches to Social Science Technical Presentation, Lancaster University.

September 2013: Transforming EEBO-TCP into a Corpus (with Paul Rayson, Alistair Baron). Presentation at the EEBO-TCP 2013 conference, University of Oxford.

June 2013: Annotation and analysis of Early Modern English corpus data. Invited presentation at the Contested Words: The Digital Analysis of Early Modern Texts workshop, University of Warwick.

June 2013: The statistics of collocation: basic principles and potential problems. Invited talk at the University of Sheffield.

May 2013: Applying cluster analysis to the problem of text-type classification (with Ghada Mohamed). Invited talk at the Institute of the Czech National Corpus, Charles University, Prague.

May 2013: Annotation and analysis: an overview of tools and techniques. Invited talk at the Institute of the Czech National Corpus, Charles University, Prague.

May 2013: Spatial analysis of corpus data using Geographical Information Systems. Invited talk at the University of Erlangen-Nuremberg.

April 2013: Annotation and analysis of Early Modern corpus data. Invited presentation at Giornata di Studi – Corpus Linguistics and Historical Corpora, University of Florence.

February 2013: Wrangling large-scale data for specialised corpora. Invited presentation at the BAAL Corpus Linguistics SIG Symposium on Building and Mining Small Specialised Corpora, University of Edinburgh.

January 2013: Approaching text typology through cluster analysis in English and Arabic corpora (with Ghada Mohamed). Presentation at the LSB2013 conference, Brussels.

September 2012: Prerequisites to a corpus-based analysis of EEBO-TCP (with Alistair Baron). Presentation at the EEBO-TCP 2012 conference, University of Oxford.

September 2012: Which ‘Lancaster’ do you mean? Disambiguation challenges in extracting place names for Spatial Humanities (with Paul Rayson and Alistair Baron). Presentation at the Digital Humanities Congress conference 2012, University of Sheffield.

January 2012: Modest XML for Corpora. Presentation to the UCREL Corpus Research Seminar, Department of Linguistics, Lancaster University.

July 2011: Research ethics in corpus linguistics (with Tony McEnery). Presentation at the CL2011 conference, University of Birmingham.

July 2011: Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium (with Stefan Evert). Presentation at the CL2011 conference, University of Birmingham.

May 2011: The conceptual convergence of functional-cognitive theory and neo-Firthian linguistics (with Tony McEnery). Presentation at the ICAME 32 conference, Oslo.

May 2011: The internal gradience of the adposition category: some evidence from comparable corpora of English, Nepali and Russian. Presentation at the ICAME 32 pre-conference workshop on Corpus-Based Contrastive Analysis, Oslo.

November 2010: Extending a corpus analysis tool to support the analysis of field data: CQPweb and minority languages of South Asia. Presentation to the UCREL Corpus Research Seminar, Department of Linguistics, Lancaster University.

November 2010: Invited panel presentation at the 5th Chicago Colloquium on Digital Humanities and Computer Science, Northwestern University, Chicago.

September 2010: An introduction to CQPweb (and its application to the lesser-studied languages of the world). Invited talk at CNRS, Paris.

September 2010: Extending a corpus analysis tool to support the analysis of field data: Bodo and Dimasa data in the CQPweb system. Presentation at the 16th Himalayan Languages Symposium, School of Oriental and African Studies, London.

October 2009: Collocational patterning in cross-linguistic perspective: adpositions in English, Nepali, and Russian. Presentation at the 28th International conference on Lexis and Grammar, University of Bergen.

July 2009: Corpus evidence and the internal gradience of grammatical categories in Nepali. Presentation at the 15th Himalayan Languages Symposium, University of Oregon.

May 2009: CQPweb – combining power, flexibility and usability in a corpus analysis tool. Presentation at the ICAME 30 conference, Lancaster University.

September 2008: Visual GISting: Merging Corpus Linguistics and Geographical Information Systems (with Ian Gregory). Presentation at the Digital Resources for the Humanities and Arts conference 2008 (DRHA08), University of Cambridge.

June 2008: Text reuse and ideology: tracing duplicates and variants in the news discourse of seventeenth-century England. Presentation at the 4th international IVACS conference, University of Limerick.

May 2008: Computer-assisted metaphor analysis using key semantic domains (with Veronika Koller, Paul Rayson, and Elena Semino). Presentation at the Researching and Applying Metaphor conference (RaAM 7), Cáceres, Spain.

December 2007: Mentions in time & space: extracting and visualizing report impact from a corpus of newsbook text. Presentation at the Places of News conference, Jacobs University Bremen.

July 2007: Collocational properties of adpositions in Nepali and English. Presentation at the CL2007 conference, University of Birmingham.

July 2007: Exploiting a semantic annotation tool for metaphor analysis (with Paul Rayson, Veronika Koller, Elena Semino). Presentation at the CL2007 conference, University of Birmingham.

June 2007: Historical text mining applied to Early Modern English Literature. Presentation (jointly with Stephen Pumfrey) at the workshop on “The Electronic Revolution in Textual Analysis”, Institute for Advanced Studies, Lancaster University.

May 2007: Quantifying syntactic structures for keyness analysis. Presentation at the ICAME-28 conference, Stratford-upon-Avon.

May 2007: Collocational patterns around prepositions in English. Presentation at Madan Puraskar Pustakalaya, Kathmandu, Nepal.

April 2007: The Lancaster Newsbooks Corpus: construction and analysis. Presentation at the University of Florence, Italy.

February 2007: Prepositions in English: some thoughts towards a collocation-based approach to grammatical categorisation. Presentation at the School of English, University of Liverpool.

December 2006: Historical text mining: corpus-based approaches to the newsbooks of the Commonwealth. Presentation at the workshop on “Time and Space on the Way to Modernity: The Emergence of Contemporaneity in European Culture”, International University Bremen.

December 2006: Corpora and the languages of South Asia. Presentation to Working Group 1 of COST Action A31 on “Stability and adaptation of classification systems in a cross-cultural perspective”, at AKNOA, Humboldt-Universität, Berlin.

February 2006: A collocation-based approach to Nepali postpositions. Presentation to the Research Issues in Theoretical Linguistics group, Department of Linguistics, Lancaster University.

November 2005: Exploiting the Nepali National Corpus: postpositions and collocational patterns. Presentation at the Conference of the Linguistic Society of Nepal, Kathmandu.

November 2005: Automated part-of-speech analysis of Urdu: conceptual and technical issues. Presentation at the Conference of the Linguistic Society of Nepal, Kathmandu.

November 2005: Creating and analysing a corpus of Nepali. Presentation to the Corpus Research Group, Department of Linguistics, Lancaster University.

September 2005: Foundational issues for corpus linguistics and the languages of South Asia. Presentation to the Department of Linguistics, University of Gothenburg.

July 2005: How common is a noun? Part-of-speech ratios in English. Presentation at the CL2005 conference, University of Birmingham.

June 2005: Approaching part-of-speech tagging: manual and automatic analysis. Presentation at Madan Puraskar Pustakalaya, Kathmandu, Nepal.

February 2005: Written corpora: design and data collection. Unicode, XML and XCES: corpus encoding and mark-up. Corpus annotation. Presentations at Madan Puraskar Pustakalaya, Kathmandu, Nepal.

March 2004: Data and software resources for natural language processing in the South Asian languages. Presentation at EuroIndia 2004 conference, New Delhi.

March 2004: Tagging a new language: a case study in Urdu. Presentation at the University of Łódź, Poland.

March 2003: Developing a tagset for automated part-of-speech tagging in Urdu. Presentation at the CL2003 conference, Lancaster University.

October 2002: A part-of-speech tagset for Urdu. Presentation at the BAAL/CUP Seminar on Researching the Indic Languages Diaspora in Britain, University of York.

April 2002: A part-of-speech tagset for Urdu. Presentation to the Corpus Research Group, Department of Linguistics, Lancaster University.