Pattern Recognition and Artificial Intelligence
  2017, Vol. 30 Issue (4): 359-364    DOI: 10.16451/j.cnki.issn1003-6059.201704008
Researches and Applications
Towards End to End Speech Recognition System for Tibetan
WANG Qingnan, GUO Wu, XIE Chuandong
National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027

Abstract  End-to-end speech recognition based on connectionist temporal classification (CTC) is applied to Tibetan automatic speech recognition (ASR), and its performance is better than that of the state-of-the-art bidirectional long short-term memory approach. In end-to-end speech recognition, linguistic knowledge such as a pronunciation lexicon is not essential; as a result, an ASR system based on CTC alone performs worse than the baseline. To address this problem, a strategy that combines existing linguistic knowledge with CTC-based acoustic modeling is proposed, with tri-phones taken as the basic units in acoustic modeling. This effectively solves the sparsity problem of the modeling units and substantially improves the discrimination and robustness of the CTC model. Results on the test set of a Tibetan corpus show that the word accuracy of the CTC-based model improves substantially, which verifies the effectiveness of combining linguistic information with CTC modeling.
Key words: End to End; Tibetan; Automatic Speech Recognition; Connectionist Temporal Classification
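The two ideas named in the abstract, tri-phone modeling units and CTC output labels, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `sil` boundary symbol, the `<b>` blank symbol, the `left-center+right` label format, and the placeholder phone labels are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = "<b>"  # hypothetical symbol for the CTC blank label


def to_triphones(phones, boundary="sil"):
    """Expand a monophone sequence into context-dependent labels.

    Each phone becomes 'left-center+right'; the (assumed) boundary
    symbol pads both ends of the sequence.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a frame-level CTC path to its label sequence.

    Consecutive repeats are merged first, then blanks are dropped,
    so a blank between two identical labels keeps both of them.
    """
    return [label for label, _ in groupby(frame_labels) if label != blank]


if __name__ == "__main__":
    # Placeholder phone labels, not taken from a Tibetan lexicon.
    print(to_triphones(["k", "a", "t"]))
    print(ctc_collapse([BLANK, "a", "a", BLANK, "a", "b", "b", BLANK]))
```

In a setup like the one the abstract describes, a pronunciation lexicon would supply the phone sequence for each word, and labels of the kind produced by `to_triphones` would serve as the CTC output units in place of characters or monophones.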
Received: 30 September 2016     
Fund: Supported by the National Key Research and Development Plan (No. 2016YFB1001300)
About authors:
WANG Qingnan (corresponding author), born in 1992, master student. His research interests include speech recognition.
GUO Wu, born in 1973, Ph.D., associate professor. His research interests include speech recognition and speaker recognition.
XIE Chuandong, born in 1990, master student. His research interests include speech recognition and keyword search.
Cite this article:
WANG Qingnan, GUO Wu, XIE Chuandong. Towards End to End Speech Recognition System for Tibetan[J]. Pattern Recognition and Artificial Intelligence, 2017, 30(4): 359-364.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.201704008      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2017/V30/I4/359