This is a crash course in corpus linguistics. It is divided into the following sections:

 

1. What Is A CORPUS (Plural: Corpora)?

1.1 Where does the word ‘corpus’ come from?

The word ‘corpus’ comes from Latin and literally means ‘body’. The senses “body of a person” (mid 15c. in English) and “collection of facts or things” (1727 in English) were both present in Latin.

Retrieved from Dictionary.com website http://www.dictionary.com/browse/corpus

 

1.2 Definition of the word ‘corpus’

If we look it up in the dictionary, we will find the following definition:

Adapted from Merriam-Webster’s Advanced Learner’s Dictionary

As you can see, ‘body’ is used metaphorically here. You will develop your own understanding of the word ‘corpus’ as you learn more about it.

 

1.3 What is a corpus in the world of corpus linguistics?

So what exactly is a corpus (plural corpora)?

A corpus is a large, principled collection of naturally occurring texts (written or spoken) stored electronically. (Reppen, 2010, p. 2)

In other words, it is a large collection of authentic speech and writing that is compiled according to specific criteria and stored in electronic form. Almost any kind of text can go into a corpus: newspapers, Facebook posts, recipes, novels, speeches, scripts, friends chatting, letters, books, magazines, lectures, compositions, memos, etc.

 

1.4 Where does corpus linguistics come from?

This video gives you a brief history of corpus linguistics.

History of Corpus Linguistics. Retrieved from https://www.youtube.com/watch?v=L1kKKsWA6R4

2. What Can We Discover Through CORPORA?

2.1 Concordance Line

A concordance line is a line of text taken from a corpus. Each concordance line in a set contains the key word, i.e. the word being studied. By reading concordance lines presented in key word in context (KWIC) format, you can notice and remember the lexico-grammatical patterns in which the target word occurs. Studying concordance lines also helps us to identify and correct lexico-grammatical errors.

Here is an example of a set of concordance lines for the target word: any.
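To show how concordance lines in KWIC format can be produced, here is a minimal Python sketch. The function name `kwic`, the `width` parameter, and the sample sentences are all invented for illustration; a real study would read the text from an actual corpus and would normally use a dedicated concordancer.

```python
import re

def kwic(text, keyword, width=30):
    """Return one KWIC line per occurrence of keyword (whole word, any case)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Up to `width` characters of context on each side of the key word.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad both sides so the key word lines up in a single column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

# Invented sample text, not taken from a real corpus.
sample = ("Do you have any questions? I don't have any money left. "
          "Any student can join the club. She left without any help.")

for line in kwic(sample, "any"):
    print(line)
```

Because every line is padded to the same width, the key word appears in one column, which makes it easier to scan the words immediately to its left and right.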

 

2.2 Word frequency

E.g.

  • What are the most frequent verbs in English?

  • What are the most frequent adjectives in the TV sitcom The Big Bang Theory?

  • What is the frequency of certain words / phrases (e.g. may be) used by native speakers and English learners in conversation?
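Questions like these come down to counting words. As a rough sketch, the following Python code counts word forms in a short text. The function name `word_frequencies` and the sample sentence are invented for illustration, and the simple tokenizer (lowercase letters and apostrophes only) is an assumption; answering the questions above about verbs or adjectives would also require part-of-speech tagging, which is not shown here.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word form occurs, ignoring case."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Invented sample text, not taken from a real corpus.
sample = "The cat sat on the mat. The dog sat near the cat."

freq = word_frequencies(sample)
for word, count in freq.most_common(3):
    print(word, count)
```

Here `most_common(3)` lists the three most frequent word forms together with their counts.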

 

2.3 Register

E.g.

  • What are the differences between spoken and written English?

  • Which words are used in more formal / informal situations?

  • Is the word “beverage” used in more formal or informal situations?
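To compare registers fairly, raw counts are usually converted to relative frequencies, because the texts being compared rarely have the same size. The sketch below assumes two tiny invented samples standing in for a formal and an informal register; the function name `relative_frequency` and the `per` parameter are also invented. The numbers it produces only illustrate the arithmetic and say nothing about how “beverage” is really used.

```python
import re

def relative_frequency(text, word, per=1000):
    """Occurrences of word per `per` tokens of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(word.lower()) / len(tokens) * per

# Invented samples standing in for two registers (not real corpus data).
formal = ("Guests may order one beverage with each meal. "
          "Each beverage is served chilled.")
informal = ("Want a drink? Grab a drink and sit down, "
            "the beverage list is long.")

print("formal:", round(relative_frequency(formal, "beverage"), 1))
print("informal:", round(relative_frequency(informal, "beverage"), 1))
```

With real corpora the same calculation is typically reported per million words rather than per thousand.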

 

2.4 Collocation

Collocation is the way in which some words are often used together, or a particular combination of words used in this way.

E.g.

  • What prepositions follow particular verbs?

  • What nouns occur after the word “thick”?

  • What are the usage patterns of the word any, and what functions does it serve?

  • How are modal verbs such as can, could, may, might, must and shall used?
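A question such as which nouns occur after “thick” can be approached by counting the word that immediately follows the target word. The following Python sketch does this for a few invented sentences; the function name `right_collocates` is hypothetical, and the code counts every following word, not only nouns, since it does no part-of-speech tagging.

```python
import re
from collections import Counter

def right_collocates(text, node):
    """Count the words that appear directly after the node word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pair each token with its right-hand neighbour.
    pairs = zip(tokens, tokens[1:])
    return Counter(nxt for cur, nxt in pairs if cur == node.lower())

# Invented sample text, not taken from a real corpus.
sample = ("A thick fog covered the road. He wore a thick coat. "
          "The thick fog lifted slowly, but the thick smoke stayed.")

for word, count in right_collocates(sample, "thick").most_common():
    print(word, count)
```

Corpus tools usually go one step further and apply an association measure to decide which co-occurrences are stronger than chance; this sketch stops at raw counts.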

 

2.5 Lexical Bundles

Lexical bundles are groups of words that occur repeatedly together within the same register. They are also referred to as “n-grams”. Below is a very brief sample of bundles from only two registers.

E.g.
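As a rough illustration of how recurring word sequences can be extracted, the Python sketch below collects every three-word sequence in a short text and keeps those that occur at least twice. The function name `lexical_bundles`, the `n` and `min_count` parameters, and the sample sentences are invented for illustration; the code also ignores sentence boundaries and register, which a real bundle study would take into account.

```python
import re
from collections import Counter

def lexical_bundles(text, n=3, min_count=2):
    """Return the n-word sequences that occur at least min_count times."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens across the text.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Invented sample text, not taken from a real corpus.
sample = ("I don't know what you mean. Well, I don't know why. "
          "Do you know what I mean? Honestly, I don't know what to say.")

for bundle, count in lexical_bundles(sample).items():
    print(bundle, count)
```

Raising `n` finds longer bundles, and raising `min_count` keeps only the sequences that recur more often.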

 

2.6 Colligation

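Colligation is the way in which a word tends to occur with particular grammatical categories or structures.

E.g. What are the differences between the adjectives “little” and “small” in different grammatical contexts?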

3. WHY Use A CORPUS For Teaching And Learning?

3.1 Five Reasons for Using Corpora

The link below leads to an interview in which Prof. Anke Lüdeling, a well-known corpus linguist in Germany, gives five central reasons for using corpus linguistics.

5 Reasons: Anke Luedeling on “Corpus Linguistics” Retrieved from https://youtu.be/21a-lOghoK0?t=2m20s

3.2 Benefits of using corpora in the classroom

Benefits of using corpora in classroom. Retrieved from https://youtu.be/EZGhQ9FR8nw?t=37s