Speech Recognition: Keyword Detection

2015-11-29 20:10

introduction

Related names for the task: word spotting, audio indexing, spoken term detection.

The speech recognizer outputs a word lattice, and the posterior probability of each keyword is computed from that lattice.

$$\mathrm{ATWV}=\mathrm{mean}_s\left(\frac{N_{correct}(s)}{N_{true}(s)}-\beta\frac{N_{spurious}(s)}{T-N_{true}(s)}\right)$$

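Here ATWV (actual term-weighted value) averages over all search terms $s$; $N_{correct}(s)$ is the number of correct detections of term $s$, $N_{true}(s)$ is the number of occurrences of $s$ in the reference, $N_{spurious}(s)$ is the number of false detections (false alarms), and $T$ is the duration of the audio in seconds. In evaluations, $\beta$ is typically set to 999.9.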
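As a concrete illustration of the metric, here is a minimal Python sketch that computes ATWV from per-term counts. The `TermCounts` record and the `atwv` function are hypothetical names introduced here, and skipping terms with $N_{true}(s)=0$ (for which the first ratio is undefined) is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class TermCounts:
    n_correct: int    # N_correct(s): correct detections of term s
    n_true: int       # N_true(s): occurrences of s in the reference
    n_spurious: int   # N_spurious(s): false alarms for s


def atwv(counts, t_seconds, beta=999.9):
    """Mean over terms of N_correct/N_true - beta * N_spurious / (T - N_true)."""
    values = []
    for c in counts.values():
        if c.n_true == 0:
            # The hit ratio is undefined for terms absent from the reference;
            # skipping them is an assumption of this sketch.
            continue
        values.append(c.n_correct / c.n_true
                      - beta * c.n_spurious / (t_seconds - c.n_true))
    if not values:
        raise ValueError("no term occurs in the reference")
    return sum(values) / len(values)


# Two terms scored on three hours of audio (T = 10800 s).
counts = {"a": TermCounts(8, 10, 1), "b": TermCounts(5, 5, 0)}
print(atwv(counts, t_seconds=3 * 3600))  # about 0.854
```

With $\beta = 999.9$ and three hours of audio, the single false alarm on term "a" costs $999.9/10790 \approx 0.093$, almost as much as missing one of its ten true occurrences (0.1). This is why false alarms are so heavily penalized in this metric.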

The detection system consists of four components:

1. speech-to-text engine

Outputs lattices and single-best phonetic transcripts.

2. indexer

The indexer takes these as input and creates an index containing a precomputed list of candidate detection records for each word in the speech-to-text lexicon. The index also contains the phonetic transcripts to accommodate out-of-vocabulary search terms.

3. detector

The detector loads the index and processes a list of search terms, generating a sorted, scored list of detection records for each term.

4. decider

The decider takes the lists of candidate detections and the cost parameter β and sets a per-term score threshold for making yes/no decisions.

system

recognition

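Large amounts of offline speech data are first segmented and then decoded with a general-purpose speech recognizer, producing lattices whose arcs carry both acoustic scores and language-model scores.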

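Running keyword detection directly on the recognition output (the single best hypothesis) would cause more misses because of homophones: the correct word may lose to a same-sounding competitor on the best path even though it still appears in the lattice.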
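The posterior probability of a word hypothesis can be computed from these arc scores with the standard forward-backward algorithm over the lattice. The sketch below assumes a simplified lattice stored as a list of arcs whose nodes are numbered in topological order, and a hypothetical `acoustic_scale` that weights the acoustic log-likelihood against the language-model log-probability; the `Arc` layout is illustrative and is not Kaldi's lattice format.

```python
import math
from dataclasses import dataclass


@dataclass
class Arc:
    src: int           # start node; nodes are assumed numbered in topological order
    dst: int           # end node (dst > src)
    word: str
    ac_logprob: float  # acoustic log-likelihood of the arc
    lm_logprob: float  # language-model log-probability of the arc


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def arc_posteriors(arcs, start, final, acoustic_scale=0.1):
    """Posterior probability of every arc, via the forward-backward algorithm.

    acoustic_scale is a hypothetical weight on the acoustic score; recognizers
    typically down-weight acoustic log-likelihoods relative to the LM.
    """
    scores = [acoustic_scale * a.ac_logprob + a.lm_logprob for a in arcs]
    # Sorting by source node is a valid processing order for a topologically numbered DAG.
    order = sorted(range(len(arcs)), key=lambda i: arcs[i].src)

    # Forward pass: alpha[n] = log total score of all paths from start to n.
    alpha = {start: 0.0}
    for i in order:
        a = arcs[i]
        prev = alpha.get(a.src, -math.inf)
        alpha[a.dst] = log_add(alpha.get(a.dst, -math.inf), prev + scores[i])

    # Backward pass: beta[n] = log total score of all paths from n to final.
    beta = {final: 0.0}
    for i in reversed(order):
        a = arcs[i]
        nxt = beta.get(a.dst, -math.inf)
        beta[a.src] = log_add(beta.get(a.src, -math.inf), scores[i] + nxt)

    total = alpha.get(final, -math.inf)
    if total == -math.inf:
        raise ValueError("no complete path from start to final")
    return [
        math.exp(alpha.get(a.src, -math.inf) + scores[i]
                 + beta.get(a.dst, -math.inf) - total)
        for i, a in enumerate(arcs)
    ]


# Toy lattice: "cat" and a same-sounding competitor "cap" span the same nodes.
arcs = [Arc(0, 1, "cat", math.log(3), 0.0),
        Arc(0, 1, "cap", 0.0, 0.0),
        Arc(1, 2, "sat", 0.0, 0.0)]
print(arc_posteriors(arcs, start=0, final=2, acoustic_scale=1.0))  # approx. [0.75, 0.25, 1.0]
```

In the toy lattice, "cap" is not on the best path, yet it keeps a posterior of 0.25, so a search for it can still succeed. That is the practical reason for searching lattices rather than 1-best transcripts.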

indexing

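Building the index. Suppose the candidate words appearing in the lattices are $w_1, w_2, \ldots, w_L$.

1. First compute the posterior probability of each word $w_i$ that appears in a lattice, using the likelihood scores stored in the lattice.

2. For occurrences of the same word $w_i$ within the same time span, sum their posterior probabilities and use the sum as the final score.

3. Collect all lattice occurrences into $L$ separate lists, one per word $w_i$, each sorted by posterior probability in descending order.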
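Steps 2 and 3 can be sketched as follows, taking word hits whose posteriors have already been computed (step 1). The `Hit` record, its utterance id field, and the rule that two hits of the same word fall in "the same time span" when their intervals overlap are assumptions made for illustration; Python lists stand in for the linked lists.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    word: str
    utt: str          # utterance / file the hit comes from
    start: float      # seconds
    end: float        # seconds
    posterior: float


def build_index(hits):
    """Map each word to its candidate detections, highest posterior first."""
    # Only hits of the same word in the same utterance can be merged.
    groups = defaultdict(list)
    for h in hits:
        groups[(h.word, h.utt)].append(h)

    index = defaultdict(list)
    for (word, _utt), group in groups.items():
        group.sort(key=lambda h: h.start)
        merged = []
        for h in group:
            if merged and h.start < merged[-1].end:
                # Overlapping hit of the same word: widen the span and sum posteriors.
                last = merged[-1]
                merged[-1] = last._replace(end=max(last.end, h.end),
                                           posterior=last.posterior + h.posterior)
            else:
                merged.append(h)
        index[word].extend(merged)

    # One list per word, sorted by score in descending order.
    for word in index:
        index[word].sort(key=lambda h: h.posterior, reverse=True)
    return dict(index)


hits = [Hit("cat", "u1", 1.0, 1.4, 0.5), Hit("cat", "u1", 1.1, 1.5, 0.3),
        Hit("cat", "u1", 5.0, 5.3, 0.6), Hit("dog", "u1", 2.0, 2.2, 0.9)]
print(build_index(hits)["cat"])  # the two overlapping "cat" hits become one record
```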

detection

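Single word: look it up directly in the index.

Multi-word term: first look up each word separately, then keep only combinations in which the words appear in the correct order with short time gaps between them.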
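A minimal sketch of this lookup, assuming an index that maps each word to a list of hits (utterance, start and end time, posterior). The maximum gap `max_gap` and scoring a phrase by the product of its word posteriors are hypothetical choices; the description above only requires the correct word order and short gaps. A single-word term reduces to the direct lookup.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    word: str
    utt: str          # utterance / file the hit comes from
    start: float      # seconds
    end: float        # seconds
    posterior: float


def detect_phrase(index, words, max_gap=0.5):
    """Return candidate detections of a (possibly multi-word) term, best first.

    Consecutive words must lie in the same utterance, in order, and be
    separated by at most max_gap seconds (a hypothetical tuning value).
    The phrase score is the product of the word posteriors (an assumption).
    """
    # Partial matches: (utterance, phrase start, end of last word, score).
    partial = [(h.utt, h.start, h.end, h.posterior) for h in index.get(words[0], [])]
    for w in words[1:]:
        extended = []
        for utt, start, end, score in partial:
            for h in index.get(w, []):
                gap = h.start - end
                if h.utt == utt and 0.0 <= gap <= max_gap:
                    extended.append((utt, start, h.end, score * h.posterior))
        partial = extended
    term = " ".join(words)
    found = [Hit(term, utt, s, e, p) for utt, s, e, p in partial]
    return sorted(found, key=lambda h: h.posterior, reverse=True)


index = {
    "new": [Hit("new", "u1", 1.0, 1.3, 0.9), Hit("new", "u1", 8.0, 8.3, 0.5)],
    "york": [Hit("york", "u1", 1.35, 1.7, 0.8), Hit("york", "u1", 3.0, 3.4, 0.9)],
}
print(detect_phrase(index, ["new", "york"]))  # one match spanning 1.0 to 1.7 s
```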

decision

A candidate detection is accepted when its posterior probability $p$ satisfies

$$p>\frac{N_{true}}{T/\beta+\frac{\beta-1}{\beta}N_{true}}$$
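The threshold follows from an expected-value argument on the ATWV formula. As a sketch, assume the candidate is correct with probability $p$ and that $N_{true}$ for the term is fixed: accepting it adds $1/N_{true}$ to the term's score if it is correct and subtracts $\beta/(T-N_{true})$ if it is a false alarm, while rejecting it changes nothing. Accepting pays off when the expected gain is positive:

```latex
\begin{aligned}
\frac{p}{N_{true}} &> \beta\,\frac{1-p}{T-N_{true}} \\
p\,(T-N_{true}) &> \beta N_{true}\,(1-p) \\
p\,\bigl(T+(\beta-1)N_{true}\bigr) &> \beta N_{true} \\
p &> \frac{\beta N_{true}}{T+(\beta-1)N_{true}}
   = \frac{N_{true}}{T/\beta+\frac{\beta-1}{\beta}N_{true}}
\end{aligned}
```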

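Here $\frac{\beta-1}{\beta}\approx 1$, and for three hours of speech ($T = 10800$ s) $T/\beta\approx 10$. $N_{true}(w_i)$ is unknown, so it is estimated as the sum of the posterior probabilities of all candidate detections of $w_i$, multiplied by a term-independent coefficient.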
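The decision rule can be sketched in Python as below. The coefficient `k` stands for the term-independent factor mentioned above; its default of 1.0 and the function names are hypothetical, and in practice such a factor would be tuned on held-out data.

```python
def decision_threshold(n_true, t_seconds, beta=999.9):
    """Posterior above which accepting a candidate increases the expected ATWV."""
    return n_true / (t_seconds / beta + (beta - 1) / beta * n_true)


def decide(posteriors, t_seconds, beta=999.9, k=1.0):
    """Yes/no decisions for all candidate detections of one term.

    N_true is unknown, so it is estimated as k * sum(posteriors);
    k is a hypothetical term-independent coefficient.
    """
    n_true_est = k * sum(posteriors)
    threshold = decision_threshold(n_true_est, t_seconds, beta)
    return [p > threshold for p in posteriors]


# Three candidates for one term in three hours of audio: threshold is about 0.126.
print(decide([0.9, 0.6, 0.05], t_seconds=3 * 3600))  # [True, True, False]
```

Because the threshold grows with the estimated $N_{true}$, rare terms are accepted at much lower posteriors than frequent ones: over three hours, a term expected once gets a threshold of about $1/11.8 \approx 0.085$, while a term expected 20 times needs about 0.65.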

problems

out-of-vocabulary (OOV) words

online system for real-time tasks

reference

Rapid and Accurate Spoken Term Detection

Kaldi keyword search code
