
A new framework for text-independent speaker verification based on Chinese short utterance

  • Abstract: Unlike text-dependent speaker verification, text-independent speaker verification places no constraint on the content of the verification text, so when combined with speech recognition it can effectively avoid common attacks such as recording fraud. However, text-independent speaker verification systems suffer severe performance degradation on short verification utterances. To address this, the paper first proposes an improved end-to-end model. Speaker classification losses on both long and short utterances strengthen the network's ability to classify speakers from speech segments of different durations. At the same time, in the embedding space, the similarity between short and long utterances of the same speaker is increased and the similarity between short utterances of different speakers is reduced, which strengthens the network's feature extraction for short utterances. In addition, an attention-based verification word selection method is proposed, which selects Chinese words with high attention weights as the system's verification prompt words. Experimental results show that the improved end-to-end model combined with softmax pre-training achieves a 29% relative reduction in equal error rate on short test utterances, and that the attention-based selection method picks verification words with better recognition results. Combining the two effectively improves speaker verification performance on short Chinese utterances.
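The abstract does not give the loss in formula form, so the following is only a minimal sketch of how the described objective could be assembled, under stated assumptions: cosine similarity is used as the similarity measure, row i of the long and short batches comes from the same speaker, and the function names and the `weight` parameter are hypothetical. It is not the paper's actual formulation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer speaker labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def short_utterance_loss(long_logits, short_logits, long_emb, short_emb,
                         labels, weight=1.0):
    """Hypothetical combined objective (assumed form, not from the paper)."""
    # Speaker classification losses on both long and short utterances.
    cls = (softmax_cross_entropy(long_logits, labels)
           + softmax_cross_entropy(short_logits, labels))
    long_n, short_n = l2_normalize(long_emb), l2_normalize(short_emb)
    # Pull term: raise similarity between a speaker's short and long utterance.
    # Assumption: row i of long_emb and short_emb belong to the same speaker.
    pull = 1.0 - (long_n * short_n).sum(axis=1).mean()
    # Push term: lower similarity between short utterances of different speakers.
    sim = short_n @ short_n.T
    different = labels[:, None] != labels[None, :]
    push = sim[different].mean() if different.any() else 0.0
    return cls + weight * (pull + push)
```

With this form, short embeddings that align with the same speaker's long embedding and stay apart from other speakers' short embeddings yield a lower value than collapsed short embeddings, which matches the behaviour the abstract describes.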

     
