@kpatrick 2019-11-26T11:47:59.000000Z

Work Diary 19/11/22

work vivo daily

0. TODO

Read the paper: [ISCA Speech][2018] Attention-based End-to-End Models for Small-Footprint Keyword Spotting
Use a CNN+GRU (RCNN) architecture to classify 'hi, jovi' and '小v小v'
* To read:
- ATTENTION-BASED MODELS FOR TEXT-DEPENDENT SPEAKER VERIFICATION
- SMALL-FOOTPRINT KEYWORD SPOTTING USING DEEP NEURAL NETWORKS

1. Paper

1.1 Model


Model structure:
- Attention-Based
- The encoder extracts features from the input sequence
- The output layer applies attention over the encoded features
Attention (to read):
- ATTENTION-BASED MODELS FOR TEXT-DEPENDENT SPEAKER VERIFICATION
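
The encoder-plus-attention structure listed above can be sketched in NumPy. This is a minimal sketch that assumes an additive soft-attention form (one scalar score per frame, softmax over time, weighted sum of encoder outputs). The parameter names `W`, `b`, `v` and all sizes are hypothetical and not taken from these notes.

```python
import numpy as np

def attention_pool(h, W, b, v):
    """Soft attention over encoder outputs.

    h: (T, d) encoder outputs, one row per frame.
    W: (d, d_a), b: (d_a,), v: (d_a,) are hypothetical attention parameters.
    Returns the attention weights (T,) and the context vector (d,).
    """
    scores = np.tanh(h @ W + b) @ v        # (T,) one scalar score per frame
    scores = scores - scores.max()         # stabilise the softmax numerically
    alpha = np.exp(scores) / np.exp(scores).sum()
    context = alpha @ h                    # weighted sum of frames, shape (d,)
    return alpha, context

rng = np.random.default_rng(0)
T, d, d_a = 100, 64, 32                    # illustrative sizes only
h = rng.standard_normal((T, d))            # stand-in for real encoder outputs
W = rng.standard_normal((d, d_a)) * 0.1
b = np.zeros(d_a)
v = rng.standard_normal(d_a)

alpha, context = attention_pool(h, W, b, v)
print(alpha.shape, context.shape)
```

The context vector has a fixed size regardless of T, so a single linear layer plus softmax can classify the whole utterance from it.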
KWS architecture parameters
SMALL-FOOTPRINT KEYWORD SPOTTING USING DEEP NEURAL NETWORKS
- The feed-forward DNN model had 3 hidden layers and 64 hidden nodes per layer with rectified linear unit (ReLU) non-linearity.
- An input window with 15 left frames and 5 right frames was used.
- The LSTM and GRU models were built with 2 hidden layers and 64 hidden nodes per layer.
- For the GRU KWS model, the final GRU layer was followed by a fully connected layer with ReLU non-linearity.
- There were no stacked frames in the input for the LSTM and GRU models.
- The smoothing window for Deep KWS was set to 20 frames.
- We also trained a TDNN-based acoustic model using ∼3000 hours of speech data to perform frame-level alignment before KWS model training.
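
Two of the parameters above, the 15-left/5-right input window and the 20-frame smoothing window, can be made concrete with a short NumPy sketch. The edge padding and the causal moving-average form of the smoothing are assumptions, since the notes do not say how either is handled. The function names are hypothetical.

```python
import numpy as np

def stack_context(feats, left=15, right=5):
    """Stack each frame with `left` past and `right` future frames.

    feats: (T, F). Edges are padded by repeating the first/last frame
    (an assumption; the notes do not specify edge handling).
    Returns an array of shape (T, (left + 1 + right) * F).
    """
    T = feats.shape[0]
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    windows = [padded[i:i + T] for i in range(left + 1 + right)]
    return np.concatenate(windows, axis=1)

def smooth_posteriors(post, window=20):
    """Causal moving average over at most the last `window` frames (assumed form)."""
    csum = np.cumsum(post, axis=0, dtype=float)
    out = np.empty(post.shape, dtype=float)
    for t in range(post.shape[0]):
        start = max(0, t - window + 1)
        total = csum[t] - (csum[start - 1] if start > 0 else 0.0)
        out[t] = total / (t - start + 1)
    return out

feats = np.arange(50 * 40, dtype=float).reshape(50, 40)   # 50 frames, 40 channels
stacked = stack_context(feats)
print(stacked.shape)            # 21 frames of 40 channels per row

post = np.arange(30, dtype=float)[:, None]                # toy posterior track
smoothed = smooth_posteriors(post)
print(smoothed.shape)
```

With 40 filterbank channels, the stacked DNN input has 21 × 40 = 840 dimensions per frame, whereas the LSTM and GRU models take the unstacked 40-dimensional frames.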

1.2 Data

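Training set: 180K positive samples, 1M negative samples
Validation set: 10K positive samples, 50K negative samples
Test set: 30K positive samples, 30K negative samples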

Input sequence: 40-channel Mel-filterbank features, 25 ms window, 10 ms frame shift; the filterbank features were converted to per-channel energy normalized (PCEN) Mel-spectrograms.
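
The framing arithmetic and the PCEN step can be sketched as follows. This is a minimal sketch: the 16 kHz sample rate, the PCEN constants (`s`, `alpha`, `delta`, `r`, `eps`) and the smoother initialisation are assumptions not stated in these notes, and the input energies are random stand-ins for real Mel-filterbank output.

```python
import numpy as np

def num_frames(n_samples, sr=16000, win_ms=25, hop_ms=10):
    """Number of full analysis frames, without padding (sr is an assumption)."""
    win = sr * win_ms // 1000      # 400 samples at 16 kHz
    hop = sr * hop_ms // 1000      # 160 samples at 16 kHz
    if n_samples < win:
        return 0
    return 1 + (n_samples - win) // hop

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of mel energies E with shape (T, F).

    M is a first-order IIR smoother of E along time, per channel.
    Initialising M[0] = E[0] is an assumption.
    """
    M = np.empty(E.shape, dtype=float)
    M[0] = E[0]
    for t in range(1, E.shape[0]):
        M[t] = (1 - s) * M[t - 1] + s * E[t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

T = num_frames(16000)                       # one second of audio
rng = np.random.default_rng(0)
E = rng.random((T, 40))                     # stand-in for 40-channel mel energies
P = pcen(E)
print(T, P.shape)
```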

1.3 Conclusion

CNN + GRU + Attention

2. CNN + GRU

2.1 Binary classification

2.1.1 hi, jovi vs 小v

Training Set: 20K + 20K
Dev Set: 2K + 2K
GRU Acc:
- 84%
- 86%
CNN + GRU Acc:
- 93%
- 91%
Analysis:
The CNN extracts local features, which eases the memory burden on the GRU.
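
One way to make this concrete, assuming the CNN front end uses strided convolutions along time (the notes do not give the layer configuration), is to count how many time steps the GRU has to process. The kernel sizes, strides and frame count below are hypothetical.

```python
def conv_out_len(n_in, kernel, stride):
    """Output length of a 1-D convolution along time with no padding."""
    return (n_in - kernel) // stride + 1

frames = 100                       # hypothetical: about 1 s at a 10 ms frame shift
layers = [(5, 2), (5, 2)]          # hypothetical (kernel, stride) per conv layer

steps = frames
for kernel, stride in layers:
    steps = conv_out_len(steps, kernel, stride)

print(frames, steps)               # GRU steps without vs. with the CNN front end
```

Under these assumed settings the GRU sees 22 steps instead of 100, and each step already summarises a local window of frames.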

2.1.2 hi, jovi vs noise

Training Set: 20K + 10K
Dev Set: 2K + 1K
Model: /home/vivoadmin/work/project/training/trigger_word/models/cnn-gru-2-class/saved-model-100-0.9957.h5.jovi-Vs-noise
Acc:
- 99%+
- 99%+

2.1.3 小v vs noise

Training Set: 20K + 10K
Dev Set: 2K + 1K
Model: /home/vivoadmin/work/project/training/trigger_word/models/cnn-gru-2-class/saved-model-100-0.9927.h5.xiaov-Vs-noise
Acc:
- 99%+
- 99%+

2.2 Three-class classification

2.2.1 Small dataset

Training Set: 20K + 20K + 10K
Dev Set: 2K + 2K + 1K
Model: /home/vivoadmin/work/project/training/trigger_word/models/cnn-gru-3-class/saved-model-2000-0.9056.h5.2w2w1w
Acc:
- 90.6%
- 90.5%

2.2.2 More noise data

Training Set: 20K + 20K + 40K
Dev Set: 2K + 2K + 6K
Model: /home/vivoadmin/work/project/training/trigger_word/models/cnn-gru-3-class/saved-model-2000-0.9388.h5.2w2w4w
Acc:
- 91.0%
- 93.8%
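
Because the class mix differs between the two three-class runs, it helps to compare each accuracy against the majority-class baseline of its dev set. The sketch below computes that baseline from the set sizes listed above. It assumes the counts are ordered hi, jovi + 小v + noise, which is inferred from the section titles and model filenames rather than stated.

```python
def majority_baseline(counts):
    """Accuracy obtained by always predicting the largest class."""
    return max(counts) / sum(counts)

# Dev-set sizes from sections 2.2.1 and 2.2.2; class order is an assumption.
dev_sets = {
    "small dataset": [2000, 2000, 1000],
    "more noise data": [2000, 2000, 6000],
}

baselines = {name: majority_baseline(c) for name, c in dev_sets.items()}
for name, acc in baselines.items():
    print(f"{name}: majority-class baseline = {acc:.1%}")
```

The second dev set is 60% noise, so part of the gain from the added noise data may come from the easier class mix; per-class accuracy would separate the two effects.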
