Welcome to our MMEMCJPC website
Multilingual and Multimodal English-Mandarin-Cantonese-Japanese Parallel Corpus
As multilingualism has been identified as one of the key research agendas of LML, and given the research backgrounds of the proposed team members in four major world languages (Mandarin, English, Japanese, and Cantonese), we believe that a multilingual corpus project focusing on comparative linguistic studies of these languages will provide a platform for quality team research in this area within the department.
   03 - 04 - 2014
The beta version has been built.
  03 - 06 - 2013

The project was proposed by Wang Lixun, Chin Chi On, Kataoka Shin, and Zoe Luk Pei Sui, and approved by the LML committee.


Principal researcher: Wang Lixun

Co-researchers: Chin Chi On, Kataoka Shin, Zoe Luk Pei Sui

In the pilot project, around 100,000 words of language data in each language will be collected for each of the 6 parallel corpora and compiled into a parallel corpus. The data will come from movie subtitles (Hollywood, Japanese, Mandarin, and Cantonese movies), comprising the original texts and their equivalent translations. This multilingual parallel corpus can be regarded as multimodal because researchers can easily access and watch the movies themselves while carrying out research based on the corpus data. Once the multilingual parallel corpus is complete, the research team intends to carry out comparative linguistic studies on topics such as the use of deixis and ellipsis in English, Mandarin, Cantonese, and Japanese.


© Copyright 2014. All rights reserved.