Dysarthria speech feature enhancement algorithm combining EMD and FWHT

朱婷; 段淑斐; DINGAM Camille; 梁慧芝; 张卫

doi:10.16300/j.cnki.1000-3630.23061201

Dysarthria speech feature enhancement algorithm by combining empirical mode decomposition and fast Walsh-Hadamard transform

Abstract

Traditional acoustic features tend to ignore the nonlinear and non-stationary nature of speech, and they cannot capture the pathological characteristics of a patient's vocal tract and vocal folds at the same time, which leads to poor performance of recognition models. This paper therefore proposes WHFEMD, a feature enhancement algorithm for dysarthria speech that combines empirical mode decomposition (EMD) and the fast Walsh-Hadamard transform (FWHT). First, the speech is processed with the fast Fourier transform, and EMD is then applied to adaptively extract its intrinsic mode functions (IMFs). Second, FWHT is applied to generate new coefficients. Next, statistical features based on the IMFs are extracted, together with enhanced features based on power spectral density and Gammatone frequency cepstral coefficients. Finally, severity grading is carried out on the UA Speech and TORGO databases, and an imbalanced classification algorithm is introduced for evaluation. The results show that, compared with traditional features, the WHFEMD enhanced features are effective for grading pathological speech; after class imbalance is taken into account, recognition accuracy increases by at least 12.18 percentage points. The algorithm can thus characterize dysarthria speech more comprehensively, and it mitigates, to some extent, the problems of non-stationarity, nonlinearity, and the lack of simultaneous characterization of local pathological information in the vocal folds and the vocal tract.
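The FWHT and statistical-feature steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the IMFs have already been obtained from some EMD routine, which is not shown. The function names `fwht` and `imf_statistics`, the zero-padding to a power-of-two length, and the particular statistics (mean, standard deviation, minimum, maximum) are all illustrative assumptions, since the abstract does not specify them.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order."""
    a = np.asarray(x, dtype=float).copy()
    n = a.size
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    h = 1
    while h < n:
        # Butterfly stage: combine adjacent blocks of length h by sum and difference.
        blocks = a.reshape(-1, 2, h)
        top = blocks[:, 0, :] + blocks[:, 1, :]
        bottom = blocks[:, 0, :] - blocks[:, 1, :]
        a = np.stack((top, bottom), axis=1).reshape(n)
        h *= 2
    return a


def imf_statistics(components):
    """Hypothetical helper: FWHT each component, then summarize the coefficients.

    `components` stands in for IMFs produced by an EMD step (assumed, not shown).
    The choice of statistics is an assumption made for illustration only.
    """
    features = []
    for c in components:
        c = np.asarray(c, dtype=float)
        # Zero-pad to the next power of two so that the FWHT is defined.
        n = 1 << max(c.size - 1, 0).bit_length()
        padded = np.zeros(n)
        padded[:c.size] = c
        w = fwht(padded)
        features.append([w.mean(), w.std(), w.min(), w.max()])
    return np.array(features)
```

Applying `fwht` twice and dividing by the length recovers the input, which gives a quick sanity check before the transform is used on real IMFs.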
