This crash course is designed to give you a clear understanding of the following questions:

1. What is a CORPUS (plural: corpora)?

1.1 Where does the word ‘corpus’ come from?

The word ‘corpus’ comes from Latin and literally means ‘body’. Its two main senses, "body of a person" (mid-15c. in English) and "collection of facts or things" (1727 in English), were both already present in Latin.

1.2 Definition of the word ‘corpus’

If we look it up in the dictionary, we find the following definition:

[Image: dictionary entries for ‘corpus’]

Adapted from Merriam-Webster's Advanced Learner's Dictionary

As you can see, ‘body’ is used metaphorically. Your own understanding of the word ‘corpus’ will develop as you learn more about it.

1.3 What is a corpus in the world of corpus linguistics?

So what exactly is a corpus (plural corpora)?

A corpus is a large, principled collection of naturally occurring texts (written or spoken) stored electronically. (Reppen, 2010, p. 2)

In other words, it is a large collection of authentic speech and writing that is built according to specific criteria and stored in electronic form. Almost any kind of text can go into a corpus: newspapers, Facebook posts, recipes, novels, speeches, scripts, chats between friends, letters, books, magazines, lectures, compositions, memos, etc.

1.4 Where does corpus linguistics come from?

The link below gives a brief history of corpus linguistics.

History of Corpus Linguistics. Retrieved from

2. What can we discover through CORPORA?

2.1 Concordance Line

A concordance line is a line of text taken from a corpus. Each concordance line in a set includes the key word, i.e. the word being studied. By reading concordance lines presented in key word in context (KWIC) format, you can notice and retain the lexico-grammatical patterns of the target word. Reading concordance lines also helps learners identify and correct lexico-grammatical errors.

Here is an example of a set of concordance lines for the target word: any.

[Image: concordance lines for the word any]
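
As a rough illustration of the KWIC format, here is a minimal Python sketch that lines up every occurrence of a key word with some context on each side. The function name kwic, the width parameter, and the sample sentences are all invented for this illustration; real concordancing tools work on far larger corpora and offer sorting and filtering.

```python
import re

def kwic(text, keyword, width=30):
    """Return concordance lines for `keyword` in key word in context (KWIC) format."""
    # Match the key word as a whole word, ignoring case,
    # so that "any" does not match inside "anybody".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the key word forms a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Invented sample text, not taken from a real corpus.
sample = (
    "Do you have any questions? I do not have any money left. "
    "Any student can join the club. She did not see anybody there."
)

concordance = kwic(sample, "any")
for line in concordance:
    print(line)
```

Because the left context is padded to a fixed width, the key word appears in the same column on every line, which is what makes patterns to its left and right easy to scan.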

2.2 Word frequency


  • What are the most frequent verbs in English?

  • What are the most frequent adjectives in the TV sitcom The Big Bang Theory?

  • What is the frequency of certain words / phrases (e.g. may be) used by native speakers and English learners in conversation?
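
Questions like these come down to counting. The sketch below, in Python, counts single words and one two-word phrase in a tiny invented text and converts a raw count into a rate per thousand words so that texts of different sizes can be compared. The helper names word_frequencies and per_thousand and the sample sentences are assumptions made for this illustration.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word form occurs (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

def per_thousand(count, total):
    """Normalise a raw count to occurrences per 1,000 words."""
    return 1000 * count / total

# Invented sample text, not taken from a real corpus.
sample = "It may be late. It may be early. Maybe it is neither."

freq = word_frequencies(sample)
total = sum(freq.values())

# The two-word phrase "may be" is counted separately from the adverb "maybe".
phrase_count = len(re.findall(r"\bmay be\b", sample.lower()))

print(freq.most_common(3))
print("may be:", phrase_count, "maybe:", freq["maybe"])
print("'it' per 1,000 words:", per_thousand(freq["it"], total))
```

Normalised rates matter when comparing, for example, a native-speaker corpus with a learner corpus of a different size, since raw counts alone would favour the larger corpus.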

2.3 Register


  • What are the differences between spoken and written English?

  • Which words are used in more formal / informal situations?

  • Is the word “beverage” used in more formal or informal situations?

2.4 Collocation

Collocation is the way in which some words are often used together, or a particular combination of words used in this way.


  • What prepositions follow particular verbs?

  • What nouns occur after the word “thick”?

  • What are the usage patterns and functions of the word “any”?

  • How are modal verbs such as can, could, may, might, must and shall used?
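
A question such as “What nouns occur after the word ‘thick’?” can be approached by counting the word that immediately follows each occurrence of the target word. The Python sketch below does only that; the function name right_collocates and the sample sentences are invented for illustration, and corpus tools usually look at wider windows and apply statistical association measures rather than raw counts.

```python
import re
from collections import Counter

def right_collocates(text, node):
    """Count the words that appear immediately after `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pair each token with its successor and keep the successors of `node`.
    # Note: this simple version ignores sentence boundaries.
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == node)

# Invented sample text, not taken from a real corpus.
sample = (
    "A thick fog covered the town. He wore a thick coat. "
    "The thick fog lifted by noon, but the thick accent remained."
)

collocates = right_collocates(sample, "thick")
for word, count in collocates.most_common():
    print(word, count)
```

The same function can be pointed at a verb to see which prepositions tend to follow it.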

2.5 Lexical Bundles

Lexical Bundles are groups of words that occur repeatedly together within the same register. Lexical bundles are also referred to as “N-grams”. Below is a very brief sample of bundles from only two registers.


    [Image: sample lexical bundles from two registers]
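
To make the idea of an N-gram concrete, the following Python sketch extracts every three-word sequence from a short invented text and keeps those that occur at least twice. The names ngrams and lexical_bundles, the threshold min_count, and the sample sentences are assumptions for this illustration; studies of lexical bundles use much larger corpora and stricter frequency thresholds.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return all sequences of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def lexical_bundles(text, n=3, min_count=2):
    """Return the n-grams that occur at least `min_count` times."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Note: this simple version lets n-grams run across sentence boundaries.
    counts = Counter(ngrams(tokens, n))
    return {" ".join(gram): c for gram, c in counts.items() if c >= min_count}

# Invented sample text, not taken from a real corpus.
sample = (
    "I don't know what you mean. I don't know why he left. "
    "Do you know what I want?"
)

bundles = lexical_bundles(sample)
print(bundles)
```

Running the same function separately on texts from two registers, for example conversation and academic writing, gives two lists of bundles that can then be compared.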

2.6 Colligation


  • What are the differences between the adjectives “little” and “small” in different grammatical contexts?

3. WHY Use a CORPUS for Teaching and Learning?

3.1 Five Reasons for Using Corpora

The link below is an interview with Prof. Anke Lüdeling, a well-known corpus linguist in Germany, in which she gives five central reasons for using corpus linguistics.

5 Reasons: Anke Luedeling on "Corpus Linguistics" Retrieved from

3.2 Benefits of using corpora in the classroom

Benefits of using corpora in classroom. Retrieved from