Speech-Based CALL System to Evaluate the Meaning and Grammar Errors in English Spoken Utterance

Main author: Ateeq, Mohammad (Author)
Other authors: Hanani, Abualsoud (Advisor)
Year (Gregorian): 2019
Location: Birzeit
Pages: 1 - 69
MD number: 1085420
Content type: University theses
Language: English
Degree: Master's thesis
University: Birzeit University
Faculty: Faculty of Engineering and Technology
Country: Palestine
Databases: Dissertations
Abstract: In this research, we develop a CALL (Computer-Assisted Language Learning) system that evaluates spoken English sentences for grammatical and meaning errors. The user is given a prompt written in their native language, and the response is recorded as an English audio file. The spoken response is converted to text using a baseline English DNN-HMM ASR and two commercial ASRs (Google and Microsoft Bing). The resulting transcription is assessed for language and meaning errors. Grammatical errors are detected using an English grammar checker, part-of-speech analysis, and incorrect bi-grams extracted from grammatically incorrect responses. Meaning errors are detected using novel approaches that measure the similarity between the given response and a stored set of reference responses. The training and testing datasets of the Spoken CALL Shared Task 2017 and 2018 were used in all experiments presented in this thesis. We propose three main approaches to building this CALL system. The first approach is rule-based: it takes a final decision about the given response (accept or reject) by passing the ASR transcription through a sequence of pipelined stages and rules. Each rule checks whether the response has a language error. If a rule cannot detect any errors, it passes the response to the next rule, and so on. In the second approach, a genetic algorithm is combined with the first approach to tune the parameters and thresholds used in each rule. The third approach is a machine learning model that predicts the final decision, accept or reject. Different types of features were extracted from the response and used in these approaches. The Universal Sentence Encoder was used to encode each sentence into a 512-dimensional vector representing the semantic features of the response. We also propose a binary embedding approach that produces a vector of 438 binary features from the response. To assess grammatical errors, a set of features was extracted from the text response using the grammar checker tool and part-of-speech analysis. Finally, the best two DNN models were fused to enhance system performance. The D-score was used as the performance metric in all experiments. The D-scores of our three proposed systems are 6.5, 14.4, and 13.87, respectively. Compared with the results of similar systems (Spoken CALL Shared Task 2018) published at Interspeech 2018, our second and third systems outperform them.
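The rule-based approach described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the bad bi-gram set, and the threshold value are all hypothetical, and a simple bag-of-words cosine similarity stands in for the sentence-embedding similarity the abstract mentions. It only shows the general shape of a pipeline in which each rule either flags an error (reject) or passes the response to the next rule.

```python
import math
from collections import Counter
from typing import Callable, List, Set, Tuple

# A rule returns True when it detects an error in the transcription.
Rule = Callable[[str], bool]


def bigrams(text: str) -> List[Tuple[str, str]]:
    """Split text on whitespace and return adjacent word pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def make_bigram_rule(bad_bigrams: Set[Tuple[str, str]]) -> Rule:
    """Hypothetical grammar rule: flag any known-incorrect bi-gram."""
    def rule(text: str) -> bool:
        return any(bg in bad_bigrams for bg in bigrams(text))
    return rule


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def make_meaning_rule(references: List[str], threshold: float) -> Rule:
    """Hypothetical meaning rule: flag responses too far from every reference.

    Bag-of-words cosine is an assumption standing in for embedding similarity.
    """
    ref_vecs = [Counter(r.lower().split()) for r in references]

    def rule(text: str) -> bool:
        vec = Counter(text.lower().split())
        best = max((cosine(vec, r) for r in ref_vecs), default=0.0)
        return best < threshold
    return rule


def judge(text: str, rules: List[Rule]) -> str:
    """Pass the transcription through the rules in order; reject on first hit."""
    for rule in rules:
        if rule(text):
            return "reject"
    return "accept"


# Illustrative configuration only; the values are made up for this sketch.
rules = [
    make_bigram_rule({("i", "wants")}),
    make_meaning_rule(["i want a ticket to london"], threshold=0.6),
]
```

In a sketch like this, the `threshold` argument is the kind of parameter that the second approach would tune with a genetic algorithm rather than setting by hand.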
