Search algorithms
These search algorithms are designed to extract the 58 linguistic features from CLAWS-tagged corpora (C7 tagset) for use in a multi-feature/multi-dimensional analysis. A detailed discussion of the functions of these linguistic features can be found in Biber (1988: 211-245). File-based search patterns can be downloaded below. After downloading, extract these compressed text files into c:\wsmith. These algorithms are designed for use with WordSmith Tools version 3.
After starting WordSmith, go to 'Settings Tags' and activate 'Tags to ignore' (<*>). This makes the program ignore everything enclosed in angle brackets (metadata, comments, etc.) in the corpus files. Copy and paste the search patterns into the text box 'Search word or phrase'. Where an algorithm specifies a context, adjust 'Context words & Context search horizons' (left and right) accordingly.
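To check outside WordSmith what the 'Tags to ignore' setting leaves behind, the same filtering can be approximated in a few lines of Python. This is only a sketch: the regular expression, the function name and the sample line are illustrative assumptions, not part of WordSmith or of the original algorithms.

```python
import re

# Assumption: markup is any run of characters enclosed in < and > on one line,
# mirroring the '<*>' value entered under 'Tags to ignore'.
MARKUP = re.compile(r"<[^<>]*>")

def tokens_without_markup(line):
    """Drop <...> elements, then split what is left into word_TAG tokens."""
    return MARKUP.sub(" ", line).split()

# Hypothetical C7-tagged line with sentence markup around it.
sample = "<s n=1> I_PPIS1 do_VD0 n't_XX know_VVI ._. </s>"
print(tokens_without_markup(sample))
```

Only the word_TAG tokens remain, which is the material the search patterns below operate on.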
Source: Xiao, Z. & McEnery, A. (2005) 'Two approaches to genre analysis: three genres in modern American English'. Journal of English Linguistics 33(1): 62-82.
Factor 1 (28 linguistic features):
(1) private verbs: c:\wsmith\privatev.txt
(2) THAT deletion: c:\wsmith\thatdel1.txt - c:\wsmith\thatdel8.txt
(3) contraction: *'*
Context 1L 2R = ~*_GE/~"_"/~*_NP*/~*_NN*/~*_MC*/~*_RA/~*_UH*/~*_FO/~'_"
(4) present tense verbs: c:\wsmith\present.txt
(5) 2nd person pronouns: *_PPY/your_APPGE/yourself_PPX1/yourselves_PPX2/yours_PPGE
(6) DO as pro-verb: *_VD*
Context 0L 4R = ~*_XX/~*_PPY/~*_PP?S*/~*_V?I
(7) analytic negation: *_XX
(8) demonstrative pronouns: this_DD1/that_DD1/these_DD2/those_DD2
Context 0L 3R = ~*_NN*/~*_NP*/~*_PN1
(9) general emphatics: c:\wsmith\emphatic.txt
(10) 1st person pronouns: *_PPI*/my_APPGE/our_APPGE/myself_PPX1/ourselves_PPX2/mine_PPGE/ours_PPGE
(11) pronoun IT: it_PPH1
(12) BE as main verb: *_VB*
Context 0L 3R = *_D*/*_A*/*_NNB/*_I*/*_J*/~*_V?G/~*_V?N
(13) causative subordination: because_CS
(14) discourse markers:
a) well_*
Context 1L 0R = ~AS_*/~FEEL*_V*/~FELT_V*
b) now_*/anyway*_*/anyhow_*
Context 2L 0R = ?_?/AND_*/BUT_*/*_UH/~*_V*/~RIGHT_*
(15) indefinite pronouns: none_PN/*_PN1
(16) general hedges: c:\wsmith\hedge.txt
(17) amplifiers: c:\wsmith\amplify.txt
(18) sentence relatives: ,_, which_DDQ
(19) WH questions: ?_? WHAT_DDQ/?_? *_RRQ
Context 0L 3R = *_VD*/*_VB*/*_VH*/*_VM*
(20) possibility modals: can_VM/ca_VM/could_VM/may*_VM/might_VM
(21) non-phrasal coordination:
a) ,_, AND_CC IT_P*/,_, AND_CC SO_*/,_, AND_CC THEN_*/,_, AND_CC YOU_PPY*
b) ,_, AND_CC YOU_PPY/,_, AND_CC THERE_EX *_VB*
c) ,_, AND_CC TH*_DD1/,_, AND_CC TH*_DD2/,_, AND_CC *_PP?S*
(22) WH clauses: c:\wsmith\pps.txt
Context 0L 3R = *_DDQ/~?_?/~*_I*
(23) final prepositions: *_I*
Context 0L 2R = ?_?/~(_(
(24) other nouns: *_NN*/*_NP*/*_ND1
Context 0L 0R = ~*TION*_N*/~*MENT*_N*/~*NESS*_N*/~*ITY_N*/~*ITIES_N*
(25) word length: (WordSmith wordlist function: average word length)
(26) prepositions: *_I*
(27) type/token ratio: (WordSmith wordlist function: standardized type/token ratio)
(28) attributive adjectives: *_JJ *_NN*/*_JJ *_JJ
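Features (25) and (27) come from the WordSmith wordlist function rather than from a search pattern. For cross-checking those figures, the two statistics can be sketched in Python as below. The assumptions are that the standardized type/token ratio is the mean of the ratios computed over successive chunks of 1,000 running words, reported as a percentage, and that an incomplete final chunk is dropped; WordSmith's own tokenization may differ. All function names are hypothetical.

```python
def plain_words(tokens):
    """Strip the _TAG suffix and keep only tokens containing a letter or digit."""
    words = [t.rsplit("_", 1)[0].lower() for t in tokens]
    return [w for w in words if any(ch.isalnum() for ch in w)]

def standardized_ttr(words, chunk_size=1000):
    """Mean type/token ratio (as a percentage) over successive full chunks.

    Assumption: an incomplete final chunk is ignored.
    """
    ratios = []
    for start in range(0, len(words) - chunk_size + 1, chunk_size):
        chunk = words[start:start + chunk_size]
        ratios.append(100.0 * len(set(chunk)) / chunk_size)
    return sum(ratios) / len(ratios) if ratios else None

def average_word_length(words):
    """Mean number of characters per running word."""
    return sum(len(w) for w in words) / len(words) if words else None

# Toy example with a chunk size of 4 so the arithmetic is easy to follow.
words = ["the", "cat", "saw", "the", "dog", "and", "the", "bird"]
print(standardized_ttr(words, chunk_size=4))  # (75.0 + 100.0) / 2 = 87.5
print(average_word_length(words))             # 25 / 8 = 3.125
```

Standardizing by chunk keeps the ratio comparable across texts of different lengths, since a plain type/token ratio falls as a text gets longer.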
Factor 2 (6 linguistic features):
(29) past tense verbs: *_V?D*
(30) 3rd person pronouns: c:\wsmith\3persprn.txt
(31) perfect aspect verbs: c:\wsmith\perf_asp.txt
(32) public verbs: c:\wsmith\publicv.txt
(33) synthetic negation: no_AT/neither_*/nor_*
(34) present participial clauses: ,_, *_V?G *_I*/,_, *_V?G *_D*/,_, *_V?G *_P*/,_, *_V?G *_R*
Context 3L 0R = ~*_VB*
Factor 3 (7 linguistic features):
(35) WH relative clauses: *_NN* *_PNQ*/WHICH*_DDQ*/WHOSE_DDQGE
Context 1L 0R = ~ASK*_V*/~TELL*_V*/~TOLD_V*/~*_I*/~?_?
(36) pied piping constructions: *_NN* *_PNQ*/WHICH*_DDQ*/WHOSE_DDQGE
Context 1L 0R = *_I*
(37) phrasal coordination: *_R* and_CC *_R*/*_J* and_CC *_J*/*_V* and_CC *_V*/*_N* and_CC *_N*
(38) nominalizations: *tion_N*/*tions_N*/*ment_N*/*ments_N*/*ness_N*/*nesses_N*/*ity_N*/*ities_N*
(39) time adverbials: *_RT*
(40) place adverbials: *_RL*
(41) other adverbs: *_R* minus the totals of hedges, amplifiers, downtoners, place adverbials and time adverbials
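Feature (41) is derived by subtraction rather than by a search of its own. The arithmetic can be written out as follows; the function name and the counts in the example are made up for illustration and do not come from any corpus.

```python
def other_adverbs(all_adverbs, hedges, amplifiers, downtoners, place, time):
    """Count of *_R* hits left after removing the adverb classes counted elsewhere."""
    return all_adverbs - (hedges + amplifiers + downtoners + place + time)

# Hypothetical raw frequencies for a single text.
print(other_adverbs(all_adverbs=500, hedges=20, amplifiers=35,
                    downtoners=15, place=60, time=70))  # 500 - 200 = 300
```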
Factor 4 (6 linguistic features):
(42) infinitives: to_TO *_V?I/to_TO *_R* *_V?I/to_TO *_R* *_R* *_V?I
(43) prediction modals: will_VM/wo_VM/shall_VM/sha_VM/'ll_VM/would_VM/'d_VM
(44) suasive verbs: c:\wsmith\suasivev.txt
(45) conditional subordination: if_CS/unless_CS
(46) necessity modals: ought_VM*/should_VM/must_VM
(47) split auxiliaries: c:\wsmith\splitaux.txt
Factor 5 (6 linguistic features):
(48) conjuncts: c:\wsmith\conjunct.txt
(49) agentless passives: c:\wsmith\agtlspsv.txt
Context 0L 6R = ~by_II
(50) past participial clauses: ?_? *_V?N *_I*/?_? *_V?N *_R*
(51) BY-passives: c:\wsmith\by_psv.txt
Context 0L 6R = by_II
(52) past participial WHIZ deletions: c:\wsmith\whizdel.txt
Context 2L 0R = ~GET*_V*/~GOT_V*/~*_VH*
(53) other adverbial subordinators: c:\wsmith\otheradv.txt
Factor 6 (4 linguistic features):
(54) THAT clauses as verb complements: *_V* that_CST
(55) demonstratives: THESE_DD2/THOSE_DD2/THIS_DD1/THAT_DD1
Context 0L 3R = *_NN*/*_NP*/*_PN1
(56) THAT relative clauses: *_NN* THAT_CST
Context 0L 4R = *_AT*/*_D*/*_NP*/*_PP*/*_N*2*
(57) THAT clauses as adjective complements: *_JJ that_CST
Context 1L 0R = ~so_*
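Features (8) and (55) use the same four search words and differ only in the context: (8) excludes hits followed by a noun within three words, while (55) requires one. The Python sketch below imitates that behaviour for single-token searches. It rests on several assumptions: * stands for any run of characters and ? for one character, matching is case-insensitive, a context word prefixed with ~ must not occur inside the horizon, and at least one unprefixed context word must occur. It is not WordSmith's implementation, and the function names and sample sentence are hypothetical.

```python
from fnmatch import fnmatchcase

def matches(token, pattern):
    """Case-insensitive wildcard match (* = any run of characters, ? = one)."""
    return fnmatchcase(token.lower(), pattern.lower())

def context_hits(tokens, search, context, left=0, right=0):
    """Indices of tokens that match a search pattern and satisfy the context."""
    required = [p for p in context if not p.startswith("~")]
    excluded = [p[1:] for p in context if p.startswith("~")]
    hits = []
    for i, tok in enumerate(tokens):
        if not any(matches(tok, p) for p in search):
            continue
        # Horizon: up to `left` tokens before the hit and `right` tokens after it.
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        has_required = not required or any(
            matches(t, p) for t in window for p in required)
        has_excluded = any(matches(t, p) for t in window for p in excluded)
        if has_required and not has_excluded:
            hits.append(i)
    return hits

# Hypothetical C7-tagged sentence.
tokens = ("I_PPIS1 like_VV0 this_DD1 book_NN1 but_CCB "
          "that_DD1 is_VBZ not_XX mine_PPGE ._.").split()
search = ["this_DD1", "that_DD1", "these_DD2", "those_DD2"]

pronouns = context_hits(tokens, search, ["~*_NN*", "~*_NP*", "~*_PN1"], right=3)
determiners = context_hits(tokens, search, ["*_NN*", "*_NP*", "*_PN1"], right=3)
print(pronouns, determiners)  # [5] [2]
```

Here 'this' is followed by a noun and counts toward (55), while 'that' has no noun within three words to its right and counts toward (8), so the two contexts split the hits between the two features.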
Factor 7 (1 linguistic feature):
(58) SEEM/APPEAR: seem*_V*/appear*_V*