Andrew Hardie
A list of my publications
Title links either go to an
online version of the text, or to further publication details if the text is
not available online.
Books
McEnery, T and Hardie, A (2012) Corpus
Linguistics: Method, Theory and Practice.
Baker, P, Hardie, A and McEnery, T (2006) A Glossary of Corpus Linguistics.
Journal articles
Hardie, A (2012) CQPweb
– combining power, flexibility and usability in a corpus analysis tool.
International Journal of Corpus
Linguistics 17 (3): 380-409. [alternative link]
Hardie, A, Lohani, R and Yadava, YP (2011) Extending
corpus annotation of Nepali: advances in tokenisation and lemmatisation.
Himalayan Linguistics 10 (1):
151-165.
Gregory, I and Hardie, A (2011) Visual GISting: Bringing together corpus linguistics and
Geographical Information Systems. Literary
and Linguistic Computing 26 (3): 297-314.
Hardie, A and McEnery, T (2010) On
two traditions in corpus linguistics, and what they have in common. International Journal of Corpus Linguistics
15 (3): 384-394.
Hardie, A and Mudraya,
O (2009) Collocational patterning in
cross-linguistic perspective: adpositions in English, Nepali, and Russian.
Arena Romanistica
4: 138-149.
Dunning, A., Gregory,
Prentice, S and Hardie, A (2009) Empowerment
and disempowerment in the Glencairn uprising: a
corpus-based critical analysis of Early Modern English news discourse. Journal
of Historical Pragmatics 10(1):
23-55.
Yadava, Y.P., Hardie, A., Lohani, R.R., Regmi, B.N., Gurung, S., Gurung, A., McEnery, T., Allwood, J., and Hall, P. (2008). Construction
and annotation of a corpus of contemporary Nepali. Corpora 3(2): 213-225.
Koller, V., Hardie, A., Rayson, P. and Semino, E. (2008) Using a semantic annotation tool
for the analysis of metaphor in discourse. Metaphorik.de 15. http://www.metaphorik.de/15/
Hardie, A (2008) A
collocation-based approach to Nepali postpositions. Corpus
Linguistics and Linguistic Theory 4(1): 19-62.
Hardie, A (2007) Part-of-speech
ratios in English corpora. International Journal of Corpus
Linguistics 12(1): 55-81.
Hardie, A (2007) From legacy
encodings to Unicode: the graphical and logical principles in the scripts of
South Asia. Language Resources
and Evaluation 41(1): 1-25.
Baker, P, Hardie, A, McEnery, T, Xiao, R, Bontcheva, K, Cunningham, H, Gaizauskas,
R, Hamza, O, Maynard, D, Tablan,
V, Ursu, C, Jayaram, BD and
Leisher, M (2004) Corpus
linguistics and South Asian languages: corpus creation and tool development.
Literary and Linguistic Computing
19(4): 509-524.
Hardie, A and McEnery, T (2003) The
were-subjunctive in British rural dialects: marrying corpus and
questionnaire data. Computers and the Humanities 37(2): 205-228.
Chapters in edited volumes
McEnery, T and Hardie, A (in press) The history of
corpus linguistics. In: Allan, K (ed.) The Oxford Handbook
of the History of Linguistics. Oxford University Press.
Hardie, A, McEnery, T, and Piao, SS (2010) A corpus-based approach
to text reuse in the newsbooks of the Commonwealth. In: Dooley, B (ed.) The
Dissemination of News and the Emergence of Contemporaneity in Early Modern
Europe, pp. 251-286. Ashgate.
Hardie, A, Lohani, RR, Regmi, BR, and Yadava,
YP (2009) A morphosyntactic categorisation scheme for the automated analysis of
Nepali. In: Singh, R. (ed.) Annual
Review of South Asian Languages and Linguistics 2009, pp. 171-196. Mouton de Gruyter.
Hardie, A and McEnery, T (2009) Corpus linguistics and historical contexts:
text reuse and the expression of bias in early modern English journalism.
In: Bowen, R, Mobärg, M and Ohlander,
S (eds) (2009) Corpora and
discourse – and stuff: papers in honour of Karin Aijmer, pp. 59-92.
Gothenburg Studies in English 96. Göteborg: Acta Universitatis Gothoburgensis.
Hardie, A (2009) First language acquisition. In: Culpeper, J., Katamba, F., Kerswill,
P., Wodak, R. and McEnery, T. (eds.) English Language: Description,
Variation and Context, pp. 609-624. Houndmills: Palgrave.
Hardie, A (2009) Corpus linguistics and the languages of
Hardie, A, Baker, P, McEnery, T and Jayaram, BD (2006) Corpus-building
for South Asian languages. In: Saxena, A and Borin, L (eds.) Lesser-known
languages in South Asia: Status and Policies, Case Studies and Applications of
Information Technology, pp. 211-242. Mouton de Gruyter.
Hardie, A and McEnery, T (2006) Statistics. In: Brown, K (ed.) Encyclopaedia of Language
and Linguistics, 2nd edition, vol. 12: 138-146.
Hardie, A (2005) Automated
part-of-speech analysis of Urdu: conceptual and technical issues. In:
Yadava, Y, Bhattarai, G, Lohani, RR, Prasain, B and Parajuli, K (eds.) Contemporary issues in Nepalese linguistics, pp. 48-72. Kathmandu: Linguistic
Society of
Hardie, A, Levin, E and Pęzik,
P (2005) Analiza morfologiczno-składniowa
korpusów (“Part-of-speech tagging”). In: Lewandowska-Tomaszczyk, B (ed.) Podstawy
językoznawstwa korpusowego (“Foundations of Corpus Linguistics”).
Łódź, Poland: Wydawnictwo Uniwersytetu Łódzkiego.
McEnery, T, Baker, JP and Hardie, A (2000a) Swearing and abuse in modern British
English. In: Lewandowska-Tomaszczyk, B and Melia, PJ (eds.) PALC ’99: Practical Applications
in Language Corpora, pp. 37-48.
Peter Lang.
McEnery, T, Baker, JP and Hardie, A (2000b) Assessing
claims about language use with corpus data – swearing and abuse. In: Kirk,
J (ed.) Corpora Galore.
Papers in peer-reviewed conference proceedings
Evert, S and Hardie, A (2011) Twenty-first
century Corpus Workbench: Updating a query architecture for the new millennium.
In: Proceedings of the Corpus Linguistics
2011 conference. University of Birmingham, UK.
Hardie, A (2007) Collocational
properties of adpositions in Nepali and English. In: Proceedings of
the Corpus Linguistics 2007 conference.
Hardie, A, Koller, V, Rayson, P and Semino, E
(2007) Exploiting a semantic
annotation tool for metaphor analysis. In: Proceedings of the Corpus
Linguistics 2007 conference.
Semino, E, Koller, V, Hardie, A and Rayson, P
(2005) A
computer-assisted approach to the analysis of metaphor variation across genres.
In: Barnden, J, Lee, M, Littlemore, J, Moon, R, Philip, G and Wallington, A
(eds.) Corpus-based approaches to
figurative language: a Corpus Linguistics 2005 colloquium, pp. 145-154.
Xiao, Z, McEnery, T, Baker, P and Hardie, A
(2004) Developing
Asian language corpora: standards and practice. In: Proceedings of the 4th Workshop on Asian
Language Resources,
Hardie, A (2003) Developing a tagset for automated
part-of-speech tagging in Urdu. In: Archer, D, Rayson, P, Wilson, A,
and McEnery, T (eds.) (2003) Proceedings
of the Corpus Linguistics 2003 conference. UCREL Technical Papers Volume 16. Department of
Linguistics,
Archer, D, McEnery, T, Rayson, P and Hardie,
A (2003) Developing
an automated semantic analysis system for Early Modern English. In:
Archer, D, Rayson, P, Wilson, A, and McEnery, T (eds.) (2003) Proceedings of the Corpus Linguistics 2003
conference. UCREL Technical Papers
Volume 16. Department of Linguistics,
Baker, P, Hardie, A, McEnery, T and Jayaram,
BD (2003) Constructing
corpora of South Asian languages. In: Archer, D, Rayson, P, Wilson, A, and McEnery, T
(eds.) (2003) Proceedings of the Corpus
Linguistics 2003 conference. UCREL
Technical Papers Volume 16. Department of Linguistics,
Baker, P, Hardie, A, McEnery, T and Jayaram, BD (2003) Corpus
data for South Asian language processing. In: Proceedings of the EACL Workshop on South Asian Languages,
Baker, P, Hardie, A, McEnery, T, Cunningham,
H and Gaizauskas, R (2002) EMILLE, a 67-million word corpus of Indic languages: data collection,
markup and harmonisation. In: Proceedings
of LREC 2002.
Publications as editor
Rayson, P, Wilson, A, McEnery, T, Hardie, A
and Khoja, S (eds.) (2001) Proceedings
of the Corpus Linguistics 2001 conference. UCREL Technical Papers Volume 13
Special Issue. Department of Linguistics,
Baker, P, Hardie, A, McEnery, T and
Siewierska, A (eds.) (2000) Proceedings
of the Third Discourse Anaphora and Reference Resolution Colloquium (2000). UCREL Technical Papers Volume 12 Special Issue. Department of Linguistics,
Unpublished PhD thesis
Hardie, A (2004) The computational analysis of
morphosyntactic categories in Urdu. Unpublished PhD thesis,
Department of Linguistics and English Language,
Talks and conference presentations
February 2013: Wrangling large-scale data for specialised corpora. Invited
presentation at the BAAL Corpus
Linguistics SIG Symposium on Building and Mining Small Specialised Corpora,
University of Edinburgh.
January 2013: Approaching text typology through cluster analysis in English and
Arabic corpora (with Ghada Mohamed). Presentation at the LSB2013 conference, Brussels.
September 2012: Prerequisites to a corpus-based analysis of EEBO-TCP (with Alistair
Baron). Presentation at the EEBO-TCP 2012
conference, University of Oxford.
September 2012: Which ‘Lancaster’ do you mean? Disambiguation challenges in extracting
place names for Spatial Humanities (with Paul Rayson and Alistair Baron).
Presentation at the Digital Humanities
Congress conference 2012, University of Sheffield.
January 2012: Modest XML for Corpora. Presentation to the UCREL Corpus Research
Seminar, Department
of Linguistics,
July 2011: Research ethics in corpus linguistics
(with Tony McEnery). Presentation at the
CL2011 conference,
July 2011: Twenty-first century Corpus Workbench:
Updating a query architecture for the new millennium (with Stefan Evert). Presentation at the CL2011 conference,
May 2011: The conceptual convergence of
functional-cognitive theory and neo-Firthian linguistics (with Tony
McEnery). Presentation at the ICAME 32 conference, Oslo.
May 2011: The internal gradience
of the adposition category: some evidence from comparable corpora of English,
Nepali and Russian. Presentation at the ICAME 32 pre-conference workshop on
Corpus-Based Contrastive Analysis, Oslo.
November 2010: Extending a corpus analysis tool to
support the analysis of field data: CQPweb and minority languages of South Asia.
Presentation to the UCREL Corpus Research
Seminar, Department
of Linguistics,
November 2010: Invited panel
presentation at the 5th Chicago Colloquium on Digital Humanities and
Computer Science,
September 2010: An introduction to CQPweb (and its
application to the lesser-studied languages of the world). Invited talk at
CNRS, Paris.
September 2010: Extending a corpus analysis tool to support the analysis of field data:
Bodo and Dimasa data in the
CQPweb system.
Presentation at the 16th Himalayan Languages Symposium,
October 2009: Collocational patterning in cross-linguistic perspective: adpositions
in English, Nepali, and Russian. Presentation at the 28th International
conference on Lexis and Grammar,
July 2009: Corpus
evidence and the internal gradience of grammatical categories in Nepali. Presentation at the 15th
Himalayan Languages Symposium,
May 2009: CQPweb –
combining power, flexibility and usability in a corpus analysis tool. Presentation at the ICAME
30 conference,
September 2008: Visual GISting: Merging Corpus Linguistics
and Geographical Information Systems (with Ian Gregory). Presentation at
the Digital Resources for the Humanities
and Arts conference 2008 (DRHA08),
June 2008: Text reuse and ideology: tracing duplicates and variants in the news
discourse of seventeenth-century
May 2008: Computer-assisted metaphor analysis using key semantic domains
(with Veronika Koller, Paul Rayson, and Elena Semino). Presentation at the Researching and Applying Metaphor
conference (RaAM 7),
December 2007: Mentions in time &
space: extracting and visualizing report impact from a corpus of newsbook text.
Presentation at the Places of News conference,
July 2007: Collocational properties
of adpositions in Nepali and English. Presentation at the CL2007
conference,
July 2007: Exploiting a semantic
annotation tool for metaphor analysis (with Paul Rayson, Veronika Koller,
Elena Semino). Presentation at the CL2007 conference,
June 2007: Historical text mining
applied to Early Modern English Literature. Presentation (jointly with
Stephen Pumfrey) at the workshop on “The Electronic Revolution in Textual
Analysis”, Institute for Advanced Studies,
May 2007: Quantifying syntactic
structures for keyness analysis. Presentation at
the ICAME-28 conference,
May 2007: Collocational patterns around prepositions in English. Presentation
at Madan Puraskar Pustakalaya,
April 2007: The
February 2007: Prepositions in English: some thoughts towards a collocation-based
approach to grammatical categorisation. Presentation at the
December 2006: Historical text mining: corpus-based approaches to the newsbooks of the
Commonwealth. Presentation at the workshop on “Time and Space on the Way to
Modernity: The Emergence of Contemporaneity in European Culture”,
December 2006: Corpora and the languages of
February 2006: A collocation-based
approach to Nepali postpositions. Presentation to the Research Issues in
Theoretical Linguistics group, Department of Linguistics,
November 2005: Exploiting the
Nepali National Corpus: postpositions and collocational patterns.
Presentation at the Conference of the Linguistic Society of
November 2005: Automated
part-of-speech analysis of Urdu: conceptual and technical issues.
Presentation at the Conference of the Linguistic Society of
November 2005: Creating and
analysing a corpus of Nepali. Presentation to the Corpus Research Group, Department of
Linguistics,
September 2005: Foundational issues for corpus linguistics and the languages of
July 2005: How common is a noun? Part-of-speech ratios in English.
Presentation at the CL2005 conference,
June 2005: Approaching part-of-speech tagging: manual
and automatic analysis. Presentation at Madan Puraskar Pustakalaya,
February 2005: Written corpora: design and data
collection. Unicode, XML and XCES: corpus encoding and mark-up. Corpus
annotation. Presentations at Madan Puraskar Pustakalaya,
March 2004: Data
and software resources for natural language processing in the South Asian
languages. Presentation at EuroIndia 2004
conference,
March 2004: Tagging
a new language: a case study in Urdu. Presentation at the University of Łódź,
March 2003: Developing a tagset for automated part-of-speech
tagging in Urdu. Presentation
at the CL2003 conference,
October 2002: A part-of-speech tagset for Urdu. Presentation at the BAAL/CUP
Seminar on Researching the Indic Languages Diaspora in
April 2002: A
part-of-speech tagset for Urdu. Presentation to the Corpus Research Group,
Department of Linguistics,