Parkinson's Disease Detection Model Based on Hierarchical Fusion of Multi-type Speech Information

doi:10.16451/j.cnki.issn1003-6059.202409005





Abstract: Speech data for Parkinson's disease detection typically includes sustained vowels, repeated syllables and contextual dialogues. Most existing models take a single type of speech data as input, which makes them susceptible to noise interference and limits their robustness. Effectively integrating different types of speech data and extracting the critical pathological information is one of the current challenges in Parkinson's disease detection. In this paper, a Parkinson's disease detection model based on hierarchical fusion of multi-type speech information is proposed, aiming to extract comprehensive pathological information and achieve better detection performance. Firstly, multiple acoustic features are extracted from each type of Parkinson's disease speech data. Then, a representation learning scheme is designed to mine deep information from the multi-type acoustic features, extracting articulation and prosody information that more accurately reflects the pathological information latent in the acoustic features. Furthermore, a decoupled representation learning space is designed for these two types of information to extract their respective private features while simultaneously learning their shared representation. Finally, a cross-type attention hierarchical fusion module is designed to progressively fuse the shared and private representations through cross-attention interactions at different granularities, improving Parkinson's disease detection performance. Experiments on a publicly available Italian Parkinson's disease speech dataset and a self-collected Chinese Parkinson's disease speech dataset show that the proposed method clearly improves detection performance.
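As a rough illustration of the cross-attention step mentioned in the abstract, the following minimal Python sketch lets a shared representation attend over a private one. The abstract does not specify the architecture, so the dimensions, the random weights, the single attention head and all function names here are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a matrix."""
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Row-wise softmax, subtracting the row maximum for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights or extracted features (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def cross_attention(query_seq, context_seq, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    Queries come from one representation; keys and values come from the other.
    """
    q = matmul(query_seq, w_q)
    k = matmul(context_seq, w_k)
    v = matmul(context_seq, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights


rng = random.Random(0)
d = 4                               # hypothetical feature dimension
shared = random_matrix(3, d, rng)   # hypothetical shared representation, 3 positions
private = random_matrix(5, d, rng)  # hypothetical private representation, 5 positions
w_q, w_k, w_v = (random_matrix(d, d, rng) for _ in range(3))

# The shared representation queries the private one.
fused, weights = cross_attention(shared, private, w_q, w_k, w_v)
print(len(fused), len(fused[0]))
```

In this sketch each of the 3 query positions receives a weighted mix of the 5 context positions. A hierarchical variant would repeat such steps at several granularities, which the abstract does not detail.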

Key words: Parkinson's Disease, Multi-type Speech, Contrastive Learning, Hierarchical Fusion

Received: 2024-04-24

Chinese Library Classification number (ZTFLH): TP 391

Funding: Supported by the Major Basic Science (Natural Science) Research Project of Jiangsu Higher Education Institutions (No.21KJA520003)

Corresponding author: JI Wei, Ph.D., professor. Main research interests: signal and information processing, machine learning. E-mail: jiwei@njupt.edu.cn.

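About the authors: WU Di, master's student. Main research interests: machine learning, signal processing. E-mail: 2799887357@qq.com. ZHENG Huifen, Ph.D., chief physician. Main research interests: Parkinson's disease and related movement disorders. E-mail: y020627@126.com. LI Yun, Ph.D., professor. Main research interests: machine learning, pattern recognition. E-mail: liyun@njupt.edu.cn.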

Cite this article:

WU Di, JI Wei, ZHENG Huifen, LI Yun. Parkinson's Disease Detection Model Based on Hierarchical Fusion of Multi-type Speech Information[J]. Pattern Recognition and Artificial Intelligence, 2024, 37(9): 811-823.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202409005 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2024/V37/I9/811
