Abstract:
Dysarthric speech carries pathological characteristics of the vocal tract and vocal folds, but traditional acoustic features do not capture these characteristics and also ignore the nonlinearity and non-stationarity of speech. This paper therefore proposes WHFEMD, a feature enhancement algorithm for dysarthric speech that combines empirical mode decomposition (EMD) with the fast Walsh-Hadamard transform (FWHT). In the proposed algorithm, the speech signal first undergoes a fast Fourier transform, followed by EMD to obtain intrinsic mode functions (IMFs). FWHT is then applied to generate new coefficients and to extract statistical features, as well as IMF-based enhanced features built on power spectral density (PSD) and Gammatone frequency cepstral coefficients (GFCC). Disease classification is performed on data from the UA Speech and TORGO databases and is further evaluated with an imbalanced classification algorithm. Experimental results show that WHFEMD-enhanced features significantly outperform traditional features. After the data are balanced with the imbalanced classification algorithm, recognition accuracy increases by at least 12.18%. These results indicate that WHFEMD characterizes dysarthric speech more comprehensively, addressing both its non-stationary and nonlinear nature and the lack of simultaneous characterization of local pathological information in the vocal folds and vocal tract.
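
As an illustration of the FWHT step only, the following minimal Python sketch applies an unnormalized fast Walsh-Hadamard transform to a single IMF and summarizes the resulting coefficients. It is not the paper's implementation. The function names (`fwht`, `pad_to_power_of_two`, `fwht_statistics`), the zero-padding strategy, and the particular statistics (mean, standard deviation, maximum, minimum) are assumptions made for illustration, and the IMF is assumed to have been computed beforehand by an EMD routine that is not shown here.

```python
import statistics


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine blocks of size h into blocks of size 2h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def pad_to_power_of_two(values):
    """Zero-pad a sequence to the next power-of-two length (assumed strategy)."""
    n = 1
    while n < len(values):
        n *= 2
    return list(values) + [0.0] * (n - len(values))


def fwht_statistics(imf):
    """Summarize the FWHT coefficients of one IMF (hypothetical feature set)."""
    coeffs = fwht(pad_to_power_of_two(imf))
    return {
        "mean": statistics.fmean(coeffs),
        "std": statistics.pstdev(coeffs),
        "max": max(coeffs),
        "min": min(coeffs),
    }
```

In a full pipeline, `fwht_statistics` would presumably be called once per IMF and the resulting values concatenated into a feature vector. The PSD- and GFCC-based enhanced features are outside the scope of this sketch.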