
    Binaural localization algorithm based on deep learning

    SONG Hao, LIU Xuejie, YU Shengfeng, ZHONG Xiaoli

    Citation: SONG Hao, LIU Xuejie, YU Shengfeng, ZHONG Xiaoli. Binaural localization algorithm based on deep learning[J]. Technical Acoustics, 2022, 41(4): 602-607. DOI: 10.16300/j.cnki.1000-3630.2022.04.018


    Funding: Natural Science Foundation of Guangdong Province (2021A1515011871, 2021A1515012630)

    Details
      About the authors:

      SONG Hao (2000-), male, from Guangzhou, Guangdong; research interests: information technology and acoustic signal processing.

      Corresponding author:

      ZHONG Xiaoli, E-mail: xlzhong@scut.edu.cn

    • CLC number: O429


    • Abstract: Because multiple localization cues are intricately related and hard to extract accurately, a deep learning-based binaural sound source localization algorithm that takes the complete binaural signals as input is proposed. First, the deep learning framework is implemented with a deep fully connected back-propagation neural network (D-BPNN) and a convolutional neural network (CNN), respectively. Then, binaural signals at horizontal-plane azimuth spacings of 15°, 30° and 45° are used for model training. Finally, the front-back confusion rate, localization accuracy and training time are used to evaluate the effectiveness of the algorithms. The prediction results show that the front-back confusion rate of the CNN model is much lower than that of the D-BPNN model; the localization accuracy of the D-BPNN model exceeds 87%, while that of the CNN model reaches about 98%. Under the same experimental conditions, the training time of the CNN model is longer than that of the D-BPNN model, and this gap becomes more pronounced as the azimuth spacing in the horizontal plane decreases.
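      A minimal illustrative sketch (not the authors' implementation) of the kind of network described in the abstract: a small 1-D CNN that maps a frame of raw two-channel (left/right ear) binaural samples to horizontal-plane azimuth classes. The use of PyTorch, the layer sizes, the frame length, and the 24-class output (15° spacing) are assumptions for illustration only.

      # Hypothetical sketch: raw binaural frame -> azimuth class logits.
      # Framework (PyTorch), shapes and layer sizes are assumed, not taken from the paper.
      import torch
      import torch.nn as nn

      N_AZIMUTHS = 24    # assumed: 360° / 15° horizontal-plane spacing
      FRAME_LEN = 4096   # assumed frame length in samples

      class BinauralCNN(nn.Module):
          def __init__(self, n_classes: int = N_AZIMUTHS):
              super().__init__()
              self.features = nn.Sequential(
                  nn.Conv1d(2, 16, kernel_size=64, stride=4), nn.ReLU(),
                  nn.MaxPool1d(4),
                  nn.Conv1d(16, 32, kernel_size=16, stride=2), nn.ReLU(),
                  nn.AdaptiveAvgPool1d(8),  # fixed-size summary regardless of frame length
              )
              self.classifier = nn.Linear(32 * 8, n_classes)

          def forward(self, x: torch.Tensor) -> torch.Tensor:
              # x: (batch, 2, FRAME_LEN) raw left/right ear samples
              h = self.features(x)
              return self.classifier(h.flatten(1))  # logits over azimuth classes

      if __name__ == "__main__":
          model = BinauralCNN()
          logits = model(torch.randn(8, 2, FRAME_LEN))  # dummy batch of binaural frames
          print(logits.shape)  # torch.Size([8, 24])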

    Publication history
    • Received:  2021-02-28
    • Revised:  2021-05-03
    • Published:  2022-08-27
