Welcome to our MMEMCJPC website
Multilingual and Multimodal English-Mandarin-Cantonese-Japanese Parallel Corpus
Multilingualism has been identified as one of the key research agendas of LML, and the proposed team members have research backgrounds in four major world languages: Mandarin, English, Japanese, and Cantonese. We therefore believe that a multilingual corpus project focusing on comparative linguistic studies of these languages will provide a research platform that promotes quality team research in this area in the department.

The project was proposed by Wang Lixun, Chin Chi On, Kataoka Shin, and Zoe Luk Pei Sui, and approved by the LML committee.


Principal researcher: Wang Lixun

Co-researchers: Chin Chi On, Kataoka Shin, Zoe Luk Pei Sui

In the pilot project, around 100,000 words of language data in each language will be collected for each of the 6 parallel corpora. The data will come from movie subtitles (Hollywood, Japanese, Mandarin, and Cantonese movies), comprising the original texts plus their equivalent translations. The multilingual parallel corpus can be regarded as multimodal because researchers can easily access and watch the actual movies when carrying out research based on the corpus data. Once the multilingual parallel corpus is complete, the research team intends to carry out comparative linguistic studies on topics such as the use of deixis and ellipsis in English, Mandarin, Cantonese, and Japanese.
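
One plausible reading of the 6 parallel corpora is that they correspond to the six possible pairings of the four project languages. This is an assumption, not something the project description states. The short Python sketch below enumerates those pairings under that assumption; the names `LANGUAGES`, `TARGET_WORDS_PER_LANGUAGE`, and `language_pairs` are hypothetical and used only for illustration.

```python
from itertools import combinations

# The four languages named in the project description.
LANGUAGES = ["English", "Mandarin", "Cantonese", "Japanese"]

# "Around 100,000 words" per language in each corpus (approximate target).
TARGET_WORDS_PER_LANGUAGE = 100_000


def language_pairs(languages):
    """Return every unordered pair of languages.

    Assumption: each parallel corpus pairs exactly two languages.
    """
    return list(combinations(languages, 2))


pairs = language_pairs(LANGUAGES)

for first, second in pairs:
    print(f"{first}-{second}: ~{TARGET_WORDS_PER_LANGUAGE:,} words per language")

print(f"Number of language pairs: {len(pairs)}")
```

Four languages yield six unordered pairs, which matches the number of corpora mentioned in the pilot plan.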


© All rights reserved. The Education University of Hong Kong. With additional technical support by the Education University of Hong Kong Library.