Search algorithms

 

These search algorithms are designed to extract the 58 linguistic features from CLAWS-tagged corpora (C7 tagset) for use in a multi-feature/multi-dimensional analysis. A detailed discussion of the functions of these linguistic features can be found in Biber (1988: 211-245). File-based search patterns can be downloaded below. After downloading, extract the compressed text files into c:\wsmith. The algorithms are designed for use with WordSmith Tools version 3.

 

After starting WordSmith, go to 'Settings Tags' and activate 'Tags to ignore' (<*>). This makes the program ignore everything enclosed in angle brackets (metadata, comments, etc.) in the corpus files. Copy and paste the search patterns into the text box 'Search word or phrase'. Adjust 'Context words & Context search horizons' (left and right) where appropriate, as specified for the individual algorithms.
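
For readers who want to check a pattern outside WordSmith, the matching described above can be approximated in Python. This is only a sketch under stated assumptions, not WordSmith's own implementation: the function names (tokenize, matches, search) are hypothetical, '/' is taken to separate alternatives, '*' and '?' are treated as shell-style wildcards, matching is case-insensitive, and a context item prefixed with '~' is taken to mean "must not occur within the horizon", which is how the patterns in this list read.

```python
import re
from fnmatch import fnmatchcase

# Anything in angle brackets is dropped, mirroring 'Tags to ignore' (<*>).
MARKUP = re.compile(r"<[^>]*>")

def tokenize(text):
    """Remove <...> markup, then split on whitespace into word_TAG tokens."""
    return MARKUP.sub(" ", text).split()

def matches(token, patterns):
    """True if the token matches any '/'-separated wildcard alternative."""
    return any(fnmatchcase(token.lower(), p.strip().lower())
               for p in patterns.split("/"))

def search(tokens, word, context="", left=0, right=0):
    """Return indices of tokens matching `word` whose context window passes.

    Assumption: '~' items must be absent from the window; if any plain
    items are given, at least one of them must be present.
    """
    items = [c.strip() for c in context.split("/") if c.strip()]
    required = [c for c in items if not c.startswith("~")]
    excluded = [c[1:] for c in items if c.startswith("~")]
    hits = []
    for i, tok in enumerate(tokens):
        if not matches(tok, word):
            continue
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        if any(matches(w, p) for w in window for p in excluded):
            continue
        if required and not any(matches(w, p) for w in window for p in required):
            continue
        hits.append(i)
    return hits

# Demonstrative pronouns (8): a demonstrative with no noun within 3 words to the right.
sample = "<s n=1> I_PPIS1 like_VV0 this_DD1 book_NN1 ._. This_DD1 is_VBZ fine_JJ ._. </s>"
print(search(tokenize(sample), "this_DD1/that_DD1/these_DD2/those_DD2",
             "~*_NN*/~*_NP*/~*_PN1", left=0, right=3))  # prints [5]
```

In the sample, the first "this" is rejected because book_NN1 falls inside the right horizon, while the second is kept. The window here does not stop at sentence boundaries; whether WordSmith's does is not covered by this sketch.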

 

Source: Xiao, Z. & McEnery, A. (2005) 'Two approaches to genre analysis: three genres in modern American English'. Journal of English Linguistics 33(1): 62-82.

 

 

Factor 1 (28 linguistic features):

(1) private verbs: c:\wsmith\privatev.txt

(2) THAT deletion: c:\wsmith\thatdel1.txt - c:\wsmith\thatdel8.txt

(3) contraction: *'*

Context 1L 2R =~*_GE/~"_"/~*_NP*/~*_NN*/~*_MC*/~*_RA/~*_UH*/~*_FO/~'_"

(4) present tense verbs: c:\wsmith\present.txt

(5) 2nd person pronouns: *_PPY/your_APPGE/yourself_PPX1/yourselves_PPX2/yours_PPGE

(6) DO as pro-verb: *_VD*

Context 0L 4R =~*_XX/~*_PPY/~*_PP?S*/~*_V?I

(7) analytic negation: *_XX

(8) demonstrative pronouns: this_DD1/that_DD1/these_DD2/those_DD2

Context 0L 3R=~*_NN*/~*_NP*/~*_PN1

(9) general emphatics: c:\wsmith\emphatic.txt

(10) 1st person pronouns: *_PPI*/my_APPGE/our_APPGE/myself_PPX1/ourselves_PPX2/mine_PPGE/ours_PPGE

(11) pronoun IT: it_PPH1

(12) BE as main verb: *_VB*

Context 0L 3R =*_D*/*_A*/*_NNB/*_I*/*_J*/~*_V?G/~*_V?N

(13) causative subordination: because_CS

(14) discourse markers:

a) well_* context 1L 0R = ~AS_*/~FEEL*_V*/~FELT_V*;

b) now_*/anyway*_*/anyhow_*

Context 2L 0R =?_?/AND_*/BUT_*/*_UH/~*_V*/~RIGHT_*

(15) indefinite pronouns: none_PN/*_PN1

(16) general hedges: c:\wsmith\hedge.txt

(17) amplifiers: c:\wsmith\amplify.txt

(18) sentence relatives: ,_, which_DDQ

(19) WH questions: ?_? WHAT_DDQ/?_? *_RRQ

Context 0L 3R =*_VD*/*_VB*/*_VH*/*_VM*

(20) possibility modals: can_VM/ca_VM/could_VM/may*_VM/might_VM

(21) non-phrasal coordination:

a) ,_, AND_CC IT_P*/,_, AND_CC SO_*/,_, AND_CC THEN_*/,_, AND_CC YOU_PPY*

b) ,_, AND_CC YOU_PPY/,_, AND_CC THERE_EX *_VB*

c) ,_, AND_CC TH*_DD1/,_, AND_CC TH*_DD2/,_, AND_CC *_PP?S*

(22) WH clauses: c:\wsmith\pps.txt context 0L 3R= *_DDQ/~?_?/~*_I*

(23) final prepositions: *_I* context 0L 2R=?_?/~(_(

(24) other nouns: *_NN*/*_NP*/*_ND1

Context 0L 0R = ~*TION*_N*/~*MENT*_N*/~*NESS*_N*/~*ITY_N*/~*ITIES_N*

(25) word length: (WordSmith wordlist function: average word length)

(26) prepositions: *_I*

(27) type/token ratio: (WordSmith wordlist function: standardized type/token ratio)

(28) attributive adjectives: *_JJ *_NN*/*_JJ *_JJ
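
Features (25) and (27) come from the WordSmith wordlist function rather than from a search pattern. As a rough cross-check, both can be approximated in Python. The sketch below rests on assumptions: the standardized type/token ratio is computed as the mean of the ratios (in percent) of consecutive full chunks, with a chunk size of 1,000 words assumed as the basis; the function names are hypothetical; and the way words are separated from tags and punctuation may differ from WordSmith's own token definition.

```python
def plain_words(tokens):
    """Strip the _TAG suffix and keep only tokens containing a letter or digit."""
    words = (tok.rsplit("_", 1)[0] for tok in tokens)
    return [w for w in words if any(ch.isalnum() for ch in w)]

def average_word_length(words):
    """Mean number of characters per word."""
    return sum(len(w) for w in words) / len(words)

def standardized_ttr(words, chunk_size=1000):
    """Mean type/token ratio (percent) over consecutive full chunks.

    A trailing partial chunk is ignored; returns None if the text is
    shorter than one chunk.
    """
    ratios = []
    for start in range(0, len(words) - chunk_size + 1, chunk_size):
        chunk = [w.lower() for w in words[start:start + chunk_size]]
        ratios.append(100.0 * len(set(chunk)) / chunk_size)
    return sum(ratios) / len(ratios) if ratios else None
```

Averaging over fixed-size chunks is what makes the ratio comparable across texts of different lengths, since a raw type/token ratio falls as a text gets longer.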

 

Factor 2 (6 linguistic features):

(29) past tense verbs: *_V?D*

(30) 3rd person pronouns: c:\wsmith\3persprn.txt

(31) perfect aspect verbs: c:\wsmith\perf_asp.txt

(32) public verbs: c:\wsmith\publicv.txt

(33) synthetic negation: no_AT/neither_*/nor_*

(34) present participial clauses: ,_, *_V?G *_I*/,_, *_V?G *_D*/,_, *_V?G *_P*/,_, *_V?G *_R*

Context 3L 0R= ~*_VB*
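
Several patterns in this list, such as (34), span more than one token. A minimal Python sketch of such a sequence search follows, assuming that each '/'-separated alternative is a space-separated run of wildcard tokens that must match consecutive word_TAG tokens, and that the '~' context item rules out hits with a matching token inside the left horizon of three words. The helper names are hypothetical and the behaviour is an approximation of the search, not WordSmith's implementation.

```python
from fnmatch import fnmatchcase

def find_sequences(tokens, pattern):
    """Return start indices where any alternative matches consecutive tokens."""
    alternatives = [alt.split() for alt in pattern.lower().split("/")]
    lowered = [t.lower() for t in tokens]
    starts = []
    for i in range(len(lowered)):
        for alt in alternatives:
            segment = lowered[i:i + len(alt)]
            if len(segment) == len(alt) and all(
                    fnmatchcase(t, p) for t, p in zip(segment, alt)):
                starts.append(i)
                break  # one hit per position is enough
    return starts

def exclude_left(tokens, starts, pattern, horizon):
    """Drop hits that have a token matching `pattern` within `horizon` tokens to the left."""
    kept = []
    for i in starts:
        window = [t.lower() for t in tokens[max(0, i - horizon):i]]
        if not any(fnmatchcase(t, pattern.lower()) for t in window):
            kept.append(i)
    return kept

PRESENT_PARTICIPIAL = ",_, *_V?G *_I*/,_, *_V?G *_D*/,_, *_V?G *_P*/,_, *_V?G *_R*"

tokens = ("He_PPHS1 left_VVD ,_, closing_VVG it_PPH1 ._. "
          "She_PPHS1 was_VBDZ ,_, speaking_VVG in_II general_JJ ._.").split()
hits = find_sequences(tokens, PRESENT_PARTICIPIAL)
print(exclude_left(tokens, hits, "*_VB*", horizon=3))  # prints [2]
```

The second comma in the sample is found by the sequence search but discarded because was_VBDZ lies within three tokens to its left.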

 

Factor 3 (7 linguistic features):

(35) WH relative clauses: *_NN* *_PNQ*/WHICH*_DDQ*/WHOSE_DDQGE

Context 1L 0R= ~ASK*_V*/~TELL*_V*/~TOLD_V*/~*_I*/~?_?

(36) pied piping constructions: *_NN* *_PNQ*/WHICH*_DDQ*/WHOSE_DDQGE

Context 1L 0R =*_I*

(37) phrasal coordination: *_R* and_CC *_R*/*_J* and_CC *_J*/*_V* and_CC *_V*/*_N* and_CC *_N*

(38) nominalizations: *tion_N*/*tions_N*/*ment_N*/*ments_N*/*ness_N*/*nesses_N*/*ity_N*/*ities_N*

(39) time adverbials: *_RT*

(40) place adverbials: *_RL*

(41) other adverbs: the total for *_R* minus the combined totals of hedges, amplifiers, downtoners, place adverbials and time adverbials
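
Feature (41) is derived by subtraction rather than by a search of its own. The arithmetic can be sketched as below; the function name and the counts in the example are hypothetical and only illustrate the calculation.

```python
def other_adverbs(all_adverbs, hedges, amplifiers, downtoners, place, time):
    """Total *_R* hits minus the adverb classes that are counted separately."""
    remainder = all_adverbs - (hedges + amplifiers + downtoners + place + time)
    if remainder < 0:
        # A negative result means the inputs cannot all come from the same text.
        raise ValueError("component counts exceed the *_R* total")
    return remainder

# Hypothetical counts for one text: 500 - (12 + 20 + 8 + 60 + 75)
print(other_adverbs(500, hedges=12, amplifiers=20, downtoners=8, place=60, time=75))  # prints 325
```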

 

Factor 4 (6 linguistic features):

(42) infinitives: to_TO *_V?I/to_TO *_R* *_V?I/to_TO *_R* *_R* *_V?I

(43) prediction modals: will_VM/wo_VM/shall_VM/sha_VM/'ll_VM/would_VM/'d_VM

(44) suasive verbs: c:\wsmith\suasivev.txt

(45) conditional subordination: if_CS/unless_CS

(46) necessity modals: ought_VM*/should_VM/must_VM

(47) split auxiliaries: c:\wsmith\splitaux.txt

 

Factor 5 (6 linguistic features):

(48) conjuncts: c:\wsmith\conjunct.txt

(49) agentless passives: c:\wsmith\agtlspsv.txt

Context 0L 6R=~by_II

(50) past participial clauses: ?_? *_V?N *_I*/?_? *_V?N *_R*

(51) BY-passives: c:\wsmith\by_psv.txt

Context 0L 6R=by_II

(52) past participial WHIZ deletions: c:\wsmith\whizdel.txt

Context 2L 0R= ~GET*_V*/~GOT_V*/~*_VH*

(53) other adverbial subordinators: c:\wsmith\otheradv.txt
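
Features (49) and (51) differ only in their context setting: the same six-word right horizon either excludes or requires by_II. The Python sketch below illustrates that split. The file-based passive patterns are not reproduced here, so passive_hits is a hypothetical and much simplified stand-in (a form of BE immediately followed by a past participle), and the function names are assumptions.

```python
from fnmatch import fnmatchcase

def passive_hits(tokens):
    """Hypothetical stand-in for the file-based patterns: *_VB* directly followed by *_V?N."""
    low = [t.lower() for t in tokens]
    return [i for i in range(len(low) - 1)
            if fnmatchcase(low[i], "*_vb*") and fnmatchcase(low[i + 1], "*_v?n")]

def split_passives(tokens, hits, horizon=6):
    """Partition hits by whether by_II occurs within `horizon` tokens to the right."""
    by_passives, agentless = [], []
    for i in hits:
        window = [t.lower() for t in tokens[i + 1:i + 1 + horizon]]
        if "by_ii" in window:
            by_passives.append(i)
        else:
            agentless.append(i)
    return by_passives, agentless

tokens = ("The_AT door_NN1 was_VBDZ opened_VVN by_II the_AT guard_NN1 ._. "
          "It_PPH1 was_VBDZ closed_VVN again_RT ._.").split()
print(split_passives(tokens, passive_hits(tokens)))  # prints ([2], [9])
```

Because both features use the same hits and the same horizon, every passive lands in exactly one of the two groups.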

 

Factor 6 (4 linguistic features):

(54) THAT clauses as verb complements: *_V* that_CST

(55) demonstratives: THESE_DD2/THOSE_DD2/THIS_DD1/THAT_DD1

Context 0L 3R= *_NN*/*_NP*/*_PN1

(56) THAT relative clauses: *_NN* THAT_CST

Context 0L 4R= *_AT*/*_D*/*_NP*/*_PP*/*_N*2*

(57) THAT clauses as adjective complements: *_JJ that_CST

Context 1L 0R= ~so_*

 

Factor 7 (1 linguistic feature):

(58) SEEM/APPEAR: seem*_V*/appear*_V*