Sound event classification and localization based on convolutional recurrent neural network

李青; 敖邦乾; 阎昌国; 陈孝玉; 张南庆

doi:10.16300/j.cnki.1000-3630.22102701


Abstract

To classify sound events and localize their sources effectively, a convolutional recurrent neural network (CRNN) with a dual-input, dual-output structure is proposed. It combines the strong feature extraction capability of the convolutional neural network with the recurrent neural network's ability to model long-term dependencies in sequential data such as text. First, a circular array is designed to receive the sound source signals, and the collected data are preprocessed, including augmentation to broaden the coverage of the source data. Second, the sound waveforms are converted into phase and amplitude spectra, which are fed to the CRNN for training. The network outputs the classification probability of each sound event and the regressed coordinates of its position. A classification threshold is set: a sound event is defined as active when its classification probability exceeds 0.5, and localization is performed only for active events. Finally, the CRNN model is tested on classification and localization for three kinds of signals: a single sound source without interference, a sound source with echo, and mixed sound sources. Compared with traditional methods, the classification accuracy improves by 42.29 percentage points and the localization accuracy by 14.09 percentage points. Compared with other neural networks, the combined classification and localization performance improves by 13.61 percentage points without a significant increase in algorithm complexity. The designed network structure is also strongly robust and can be applied to tasks such as sound source detection.
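The feature extraction and the thresholding step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `stft_features` and `gate_localization`, the frame length `n_fft`, the hop size `hop`, the Hann window, and the per-class `(x, y, z)` layout of the regression output are all assumptions made for the example. Only the 0.5 threshold and the use of phase and amplitude spectra come from the abstract.

```python
import numpy as np


def stft_features(x, n_fft=512, hop=256):
    """Convert multichannel waveforms into amplitude and phase spectra.

    x: array of shape (channels, samples), e.g. one row per microphone.
    Returns two arrays of shape (channels, frames, n_fft // 2 + 1).
    n_fft and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    # Slice overlapping frames: result has shape (channels, frames, n_fft).
    frames = np.stack(
        [x[:, i * hop : i * hop + n_fft] for i in range(n_frames)], axis=1
    )
    spec = np.fft.rfft(frames * window, axis=-1)
    return np.abs(spec), np.angle(spec)


def gate_localization(class_probs, coords, threshold=0.5):
    """Keep position estimates only for active sound events.

    class_probs: (frames, classes) classification probabilities.
    coords: (frames, classes, 3) regressed positions (assumed layout).
    An event is active when its probability exceeds the threshold.
    Returns a list of (frame_index, class_index, position) tuples.
    """
    active = class_probs > threshold
    return [(int(t), int(c), coords[t, c]) for t, c in zip(*np.nonzero(active))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal((4, 2048))  # 4 hypothetical microphones
    amplitude, phase = stft_features(audio)
    print(amplitude.shape, phase.shape)

    probs = np.array([[0.9, 0.2], [0.5, 0.7]])
    positions = rng.standard_normal((2, 2, 3))
    for frame, cls, pos in gate_localization(probs, positions):
        print(frame, cls, pos)
```

In this sketch the two spectra would form the two network inputs, and the gating function would be applied to the two network outputs, so position estimates for inactive classes are simply discarded.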
