Principal researcher: Wang Lixun
Co-researchers: Chin Chi On, Kataoka Shin, Zoe Luk Pei Sui
In the pilot project, approximately 100,000 words of data in each language will be collected for each of the six parallel corpora and compiled into a parallel corpus. The data will come from movie subtitles (Hollywood, Japanese, Mandarin and Cantonese movies), comprising both the original subtitle texts and their equivalent translations. Because researchers can easily obtain and watch the actual movies while carrying out research based on the corpus data, this multilingual parallel corpus can be regarded as multimodal. Once the multilingual parallel corpus is complete, the research team intends to carry out comparative linguistic studies on topics such as the use of deixis and ellipsis in English, Mandarin, Cantonese and Japanese.