Time-frequency feature extraction methods for urban environmental noise sources

洪晓丹; 赵桂英; 张玮晨; 祝文英

doi:10.16300/j.cnki.1000-3630.24123102

Abstract

Precise and efficient management of urban environmental noise requires a high-performance technique for automatically identifying environmental noise sources. Acoustic feature extraction is a key foundation of environmental noise identification and affects the performance of the downstream identification model. Using a proposed urban environmental noise source identification model based on a convolutional recurrent neural network (CRNN), this study examines how several popular time-frequency feature extraction methods affect identification performance. The methods are the linear-scale short-time Fourier transform (STFT); two Mel-scale variants of the STFT, namely the Mel time-frequency feature (Mel-STFT) and the log-Mel-scaled STFT (LM-STFT); and the constant-Q transform (CQT). On this basis, a CRNN model for urban environmental noise source identification using LM-STFT features is proposed. The experimental results show that: (1) the choice of feature extraction method affects the performance of the environmental noise source identification model; (2) all four time-frequency features perform well in the proposed CRNN model, each reaching an accuracy of at least 88.1%, which confirms their effectiveness for urban environmental noise source identification; (3) the three features that incorporate a filter-bank transform (Mel-STFT, LM-STFT, and CQT) perform significantly better than the linear-scale STFT, each with an accuracy exceeding 91.3%; and (4) the Mel-scale STFT methods perform noticeably better than the CQT. In particular, the CRNN model based on LM-STFT features achieves an accuracy of 94.0%, showing superior performance on the environmental noise source identification task.
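The four features compared in the abstract differ mainly in how each frame's spectrum is computed and warped along the frequency axis. The NumPy sketch below places them side by side for illustration; it is not the authors' implementation. The frame length (`n_fft=1024`), hop size (`hop=512`), 64 Mel bands built with the HTK-style Mel formula, the use of a power (squared-magnitude) spectrogram, the `1e-10` floor inside the logarithm, and the CQT settings (`fmin=55.0` Hz, 84 bins, 12 bins per octave, computed naively bin by bin in the style of Brown's constant-Q transform) are all assumptions. The helper names (`stft_power`, `mel_filterbank`, `cqt_magnitude`, `extract_features`) are hypothetical.

```python
import numpy as np


def stft_power(x, n_fft=1024, hop=512):
    """Linear-frequency power spectrogram |STFT|^2, shape (n_fft//2 + 1, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop  # assumes len(x) >= n_fft
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0)) ** 2


def hz_to_mel(f):
    # HTK-style Mel formula (an assumption; the paper's variant is not stated).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels=64):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft//2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, center, hi = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def cqt_magnitude(x, sr, fmin=55.0, n_bins=84, bins_per_octave=12, hop=512):
    """Naive constant-Q transform magnitude, shape (n_bins, frames).

    Each bin k uses a Hann-windowed complex kernel whose length N_k shrinks as
    the centre frequency grows, so the ratio f_k / bandwidth stays at Q.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    lengths = np.ceil(q * sr / freqs).astype(int)
    pad = lengths.max()
    xp = np.pad(x, pad)  # zero-pad both ends so every centred window fits
    centers = np.arange(0, len(x), hop) + pad
    out = np.empty((n_bins, len(centers)))
    for k, n_k in enumerate(lengths):
        n = np.arange(n_k)
        kernel = np.hanning(n_k) * np.exp(-2j * np.pi * q * n / n_k) / n_k
        for t, c in enumerate(centers):
            start = c - n_k // 2  # centre the kernel on the frame position
            out[k, t] = np.abs(np.dot(xp[start:start + n_k], kernel))
    return out


def extract_features(x, sr, n_fft=1024, hop=512, n_mels=64):
    """Return the four time-frequency representations compared in the study."""
    power = stft_power(x, n_fft, hop)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power
    return {
        "STFT": power,                               # linear frequency axis
        "Mel-STFT": mel,                             # Mel filter-bank energies
        "LM-STFT": 10.0 * np.log10(mel + 1e-10),     # log-Mel, in dB
        "CQT": cqt_magnitude(x, sr, hop=hop),        # log-spaced constant-Q bins
    }


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 440.0 * t)  # 1 s, 440 Hz test tone
    for name, feat in extract_features(x, sr).items():
        print(name, feat.shape)
    # STFT (513, 30)
    # Mel-STFT (64, 30)
    # LM-STFT (64, 30)
    # CQT (84, 32)
```

The Mel filter bank and the CQT both spend more frequency resolution on low frequencies than on high ones, which is the filter-bank property shared by the three features the abstract reports as outperforming the linear-scale STFT. A practical pipeline would more likely rely on an audio library than on this loop-based CQT, which is written for readability rather than speed.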