
Dysarthria speech feature enhancement algorithm combining empirical mode decomposition and fast Walsh-Hadamard transform

  • Abstract: Traditional acoustic features tend to ignore the nonlinear and non-stationary nature of speech and do not capture the pathological characteristics of the vocal tract and the vocal folds at the same time, which degrades recognition-model performance. This paper therefore proposes WHFEMD, a feature enhancement algorithm for dysarthric speech that combines empirical mode decomposition (EMD) with the fast Walsh-Hadamard transform (FWHT). The speech signal is first processed with the fast Fourier transform; EMD then adaptively extracts its intrinsic mode functions (IMFs), to which FWHT is applied. From the result, the algorithm extracts IMF-based statistical features together with enhanced features based on power spectral density and Gammatone frequency cepstral coefficients. Severity classification is carried out on the UA Speech and TORGO databases, and an imbalanced classification algorithm is introduced for evaluation. The results show that WHFEMD enhanced features outperform traditional features for grading pathological speech, and that once class imbalance is taken into account, recognition accuracy improves by at least 12.18 percentage points. WHFEMD thus characterizes dysarthric speech more fully and helps address its non-stationarity and nonlinearity as well as the lack of simultaneous characterization of local pathological information in the vocal folds and the vocal tract.
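The FWHT and statistical-feature steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the FFT and EMD stages are omitted, the input list merely stands in for one IMF that an EMD routine would produce, and the zero-padding to a power-of-two length, the helper names (`fwht`, `pad_to_power_of_two`, `imf_statistics`), and the particular statistics chosen are all assumptions made for illustration.

```python
import statistics


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data


def pad_to_power_of_two(values):
    """Zero-pad to the next power-of-two length (an assumption, not from the paper)."""
    data = list(values)
    target = 1
    while target < len(data):
        target *= 2
    return data + [0.0] * (target - len(data))


def imf_statistics(component):
    """Illustrative statistics of the FWHT coefficients of one IMF-like component.

    The choice of statistics is hypothetical; the abstract does not list them.
    """
    coeffs = fwht(pad_to_power_of_two(component))
    return {
        "mean": statistics.fmean(coeffs),
        "variance": statistics.pvariance(coeffs),
        "energy": sum(c * c for c in coeffs),
        "max_abs": max(abs(c) for c in coeffs),
    }
```

In a fuller pipeline, `imf_statistics` would be called once per IMF and the resulting values concatenated into a feature vector. Applying `fwht` twice returns the input scaled by its length, which is a convenient sanity check.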

     
