Corpus Resources

Online Corpora 

| Name | Website | Introduction | Availability |
| --- | --- | --- | --- |
| BYU Corpora | | The BYU Corpora site hosts multiple corpora, probably the most widely used corpora currently available, along with many corpus-based resources. | free |
| The British National Corpus | | The British National Corpus is a 100-million-word collection of British English. It includes not only written texts but also transcriptions of spoken data. | free but requires registration |
| The Corpus of Contemporary American English (COCA) | | The Corpus of Contemporary American English (COCA) is the largest freely available corpus of English, and the only large and balanced corpus of American English. | free but requires registration |
| Compleat Lexical Tutor | | Compendium of online tools for both language analysis and learning | free |
| Michigan Corpus of Academic Spoken English | | MICASE: a searchable collection, or "corpus", of transcripts of real-life spoken language on the University of Michigan campus | free |
| MICUSP: Michigan Corpus of Upper-level Student Papers | | MICUSP: written academic papers | free |
| IntelliText Corpus Queries, the Centre for Translation Studies (CTS) at the University of Leeds | | IntelliText is a project funded by the AHRC. It produced a versatile and intuitive interface offering a simple step-by-step approach to performing a corpus search. First-time and inexperienced corpus users can use the IntelliText Search Builder and Part-of-Speech Editor to build multi-word phrases and add grammatical information to their corpus queries, without having to enter complex string codes. (Retrieved from | free |

Online Corpora developed by the Department of Linguistics and Modern Language Studies (LML), EdUHK

| Name | Website | Introduction | Availability |
| --- | --- | --- | --- |
| Corpus Linguistics | | Corpus Linguistics is a key research area of the Department of Linguistics and Modern Language Studies at EdUHK. This site showcases a wide range of corpus linguistics projects carried out by staff in the department. | free |
| English for Academic Purposes (EAP) Corpora | | Academic papers | free |
| A Corpus-based Pronunciation Learning Website | | Corpus-based pronunciation learning | free |
| English-Chinese parallel concordancer | | Parallel corpus | free |

Concordance Tools

| Name | Website | Introduction | Availability |
| --- | --- | --- | --- |
| | | Collocation finder; analyzes entire texts | free but requires registration |
| Just the Word | | Collocation finder (based on the BNC) | free |
| Linggle | | Collocation finder | free |
| Word Neighbours | | Collocation finder | free |
| Corpus Concordance English | | A user-friendly online concordancer that supports various sub-corpora. | free |
| WebCorp Live | | A tool for the study of language on the web. The corpora below were built by crawling the web and extracting textual content from web pages. | free |

Corpus tools

| Name | Website | Introduction | Availability |
| --- | --- | --- | --- |
| AntConc | | A freeware corpus analysis toolkit for concordancing and text analysis. | free |
| | | Windows software for finding word patterns. | |
| VersaText | | VersaText is a new web-based tool for learners of English to explore the language of a single text. | free |
| Sketch Engine | | Sketch Engine contains 400 ready-to-use corpora in 90+ languages, each up to 20 billion words in size, to provide a truly representative sample of language. | free trial |
| Verbal Stratagems | | Offers a robust collection of phrases for specific purposes, such as agreeing, checking for understanding, expressing gratitude, or making a suggestion. | free |
| Cambridge Learner Corpus (CLC) | | The English Vocabulary Profile (EVP) shows, in both British and American English, which words and phrases learners around the world know at each level of the CEFR, from A1 to C2. | free but requires registration |

Other Useful Resources

| Name | Website | Introduction | Availability |
| --- | --- | --- | --- |
| Teaching Grammar and Readers | | Grammar Teaching Resources for School Teachers; Grammar Teaching in Language Education | |
| A Corpus-Based Mandarin Pronunciation Learning Website | | Mandarin corpus website | free |