Introduction to the project

Funded by the Teaching Development Grant (TDG), this project aims to develop a comprehensive, self-directed framework for interactive English and Mandarin pronunciation learning, supported by corpus and AI technology, and to create teaching materials to accompany the framework. The project builds on two speech corpora from Dr. Rebecca Chen's previous TDG projects: The Spoken Corpus of Hong Kong Learners of Mandarin (T0150, 2015-2017) and The Spoken English Corpus of Chinese and Non-Chinese Learners in Hong Kong (T0200, 2018-2020). These corpora provide learners with authentic speech samples from both native and non-native speakers in Hong Kong, enabling them to recognize the distinctive phonological features of English and Mandarin.

To support pronunciation classes, conversational and automatic speech recognition (ASR) AI tools such as ChatGPT, Copilot, Murf and Immersive Reader will be integrated into the corpus-based learning environment. By incorporating these tools, the project will pioneer the use of AI in pronunciation learning and equip frontline teachers with practical teaching materials.

Proposed learning framework

Self-directed AI and Corpus-aided Pronunciation Training Framework

English pronunciation

The Spoken English Corpus of Chinese and Non-Chinese Learners in Hong Kong

The Spoken English Corpus of Chinese and Non-Chinese Learners in Hong Kong was expanded and redeveloped from a previous spoken corpus. Speech data were elicited from Hong Kong speakers, Mainland Chinese speakers from eight different dialect backgrounds, and non-Chinese speakers.

Phonological annotations at both the segmental and suprasegmental levels help identify recurrent pronunciation difficulties among English learners of different language backgrounds in Hong Kong. The high-quality recordings are well suited to phonetic and acoustic analysis by teachers, learners and researchers around the world.

Training materials 

https://corpus.eduhk.hk/L3PLT/index.php/tm-on-eng/

(Coming soon)

Sample lesson plan

(Coming soon)

Student outcomes

(Coming soon)

Mandarin pronunciation

The Spoken Corpus of Hong Kong Learners of Mandarin

https://corpus.eduhk.hk/pth_learner_corpus/

The Spoken Corpus of Hong Kong Learners of Mandarin is the core of the learning platform. Its main goal is to raise Mandarin learners' awareness, both in Hong Kong and overseas, of their own pronunciation problems and to encourage more active engagement in their learning.

The corpus provides 40 sets of high-quality recordings of monosyllabic words, multisyllabic words, passages and free speech. In addition to the authentic speech data, it offers orthographic transcriptions and broad-transcription annotations of both segmental and suprasegmental errors, all of which are available to learners, teachers and researchers.

Training materials 

https://corpus.eduhk.hk/L3PLT/index.php/tm-on-man/

(Coming soon)

Sample lesson plan

(Coming soon)

Student outcomes

(Coming soon)

Workshop summary video

(Coming soon)