Pattern Recognition and Artificial Intelligence  2022, Vol. 35 Issue (5): 461-471    DOI: 10.16451/j.cnki.issn1003-6059.202205007
Researches and Applications
Clustering and Retraining Based Self-Supervised Speech Representation Learning Method
ZHANG Wenlin¹, LIU Xuepeng¹, NIU Tong¹, YANG Xukui¹, QU Dan¹
1. School of Information System Engineering, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001

Abstract  Existing reconstruction-based self-supervised speech representation learning methods are trained by restoring and reconstructing speech frames, but they underutilize the phoneme category information contained in those frames. Combining self-supervised learning with noisy student training, a clustering and retraining based self-supervised speech representation learning method is proposed. First, based on an initial self-supervised speech representation model (the teacher model), pseudo-labels reflecting phoneme class information are obtained via unsupervised clustering. Second, the pseudo-label prediction task is combined with the original masked frame reconstruction task to retrain the speech representation model (the student model). Finally, the student model is taken as the new teacher model, and the whole clustering and retraining process is iterated to continually refine both the pseudo-labels and the representation model. Experimental results show that the speech representation model obtained after clustering and retraining achieves better performance on downstream phoneme recognition and speaker recognition tasks.
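The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`kmeans_labels`, `joint_loss`, `cluster_and_retrain`), the use of plain k-means for the clustering step, the L1 reconstruction loss, the cross-entropy pseudo-label loss, the `weight` factor, and the `train_student` callback are all assumptions made for illustration. The input noise that noisy student training would apply to the student is left out.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, iters=20, seed=0):
    """Plain k-means; returns one cluster index (pseudo-label) per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def joint_loss(recon, target, class_probs, pseudo_label, weight=1.0):
    """Frame reconstruction loss plus a weighted pseudo-label prediction loss.

    Assumptions of this sketch (not taken from the paper): L1 reconstruction,
    cross-entropy classification, and a scalar `weight` balancing the two.
    """
    recon_loss = sum(abs(r - t) for r, t in zip(recon, target)) / len(target)
    class_loss = -math.log(class_probs[pseudo_label])
    return recon_loss + weight * class_loss


def cluster_and_retrain(frames, teacher, train_student, k, rounds):
    """Cluster teacher outputs, retrain a student, promote it to teacher.

    `teacher` maps a frame to a representation vector. `train_student` is a
    hypothetical callback taking (frames, pseudo_labels) and returning a new
    encoder, for example one trained by minimising `joint_loss`.
    """
    pseudo_labels = []
    for _ in range(rounds):
        reps = [teacher(x) for x in frames]
        pseudo_labels = kmeans_labels(reps, k)
        student = train_student(frames, pseudo_labels)
        teacher = student  # the student becomes the next round's teacher
    return teacher, pseudo_labels
```

In this sketch a caller would pass an initial pretrained encoder as `teacher` and a training routine as `train_student`. Each round re-clusters the current teacher's representations, so the pseudo-labels and the model are updated together.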
Key words: Unsupervised Learning, Self-Supervised Learning, Speech Representation, Pretrained Model, Mask Reconstruction, Noisy Student Training
Received: 30 March 2022     
Chinese Library Classification (CLC) number: TP912.34
Fund: National Natural Science Foundation of China (No. 61673395, 62171470)
Corresponding Author: ZHANG Wenlin, Ph.D., associate professor. His research interests include speech signal processing, speech recognition and machine learning.
About authors: LIU Xuepeng, master student. His research interests include intelligent information processing, unsupervised learning and speech representation learning.
NIU Tong, Ph.D., associate professor. His research interests include speech recognition and deep learning.
YANG Xukui, Ph.D., lecturer. His research interests include language identification, continuous speech recognition and machine learning.
QU Dan, Ph.D., professor. Her research interests include machine learning, deep learning and speech recognition.
Cite this article:   
ZHANG Wenlin, LIU Xuepeng, NIU Tong, et al. Clustering and Retraining Based Self-Supervised Speech Representation Learning Method[J]. Pattern Recognition and Artificial Intelligence, 2022, 35(5): 461-471.
URL:  
http://manu46.magtech.com.cn/Jweb_prai/EN/10.16451/j.cnki.issn1003-6059.202205007      OR     http://manu46.magtech.com.cn/Jweb_prai/EN/Y2022/V35/I5/461