@kpatrick
2019-11-26T19:47:59.000000Z
work
vivo
daily
An RCNN architecture (CNN + GRU) classifies the wake words 'hi, jovi' and '小v小v'.
SMALL-FOOTPRINT KEYWORD SPOTTING USING DEEP NEURAL NETWORKS
- The feed-forward DNN model had 3 hidden layers and 64 hidden nodes per layer with rectified linear unit (ReLU) non-linearity.
- An input window with 15 left frames and 5 right frames was used.
- The LSTM and GRU models were built with 2 hidden layers and 64 hidden nodes per layer.
- For the GRU KWS model, the final GRU layer was followed by a fully connected layer with ReLU non-linearity.
- There were no stacked frames in the input for the LSTM and GRU models.
- The smoothing window for Deep KWS was set to 20 frames.
- We also trained a TDNN-based acoustic model using ∼3000 hours of speech data to perform frame-level alignment before KWS model training.
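
The input window and smoothing window in the list above can be illustrated with a minimal sketch. It assumes 40-dimensional frames, edge padding by repeating the first or last frame, and the trailing-average smoothing described for Deep KWS. The function names `stack_frames` and `smooth_posteriors` are hypothetical and do not come from the paper.

```python
def stack_frames(features, left=15, right=5):
    """Stack each frame with `left` past and `right` future frames.

    Assumption: edges are padded by repeating the first/last frame.
    """
    n = len(features)
    stacked = []
    for t in range(n):
        window = []
        for k in range(t - left, t + right + 1):
            k = min(max(k, 0), n - 1)  # clamp index to the valid range
            window.extend(features[k])
        stacked.append(window)
    return stacked


def smooth_posteriors(posteriors, window=20):
    """Average each posterior with up to `window - 1` preceding frames."""
    smoothed = []
    running = 0.0
    for j, p in enumerate(posteriors):
        running += p
        if j >= window:
            running -= posteriors[j - window]  # drop the frame leaving the window
        smoothed.append(running / min(j + 1, window))
    return smoothed


if __name__ == "__main__":
    frames = [[float(t)] * 40 for t in range(30)]  # 30 dummy 40-dim frames
    print(len(stack_frames(frames)[0]))            # (15 + 1 + 5) * 40 input values
    print(smooth_posteriors([1.0, 0.0, 1.0, 1.0], window=2))
```

With 15 left and 5 right frames, each DNN input covers 21 frames, so a 40-channel front end would give 840 values per input under these assumptions.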
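Training set: 180K positive samples, 1M negative samples
Validation set: 10K positive samples, 50K negative samples
Test set: 30K positive samples, 30K negative samples
Input sequence: 40-channel Mel-filterbank features, 25 ms window, 10 ms frame shift; the filterbank features were converted to per-channel energy normalized (PCEN) Mel-spectrograms.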
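
As a rough illustration of the framing and PCEN step, here is a minimal pure-Python sketch. The parameter values (`s`, `alpha`, `delta`, `r`, `eps`) are defaults commonly quoted in the PCEN literature, not values stated in these notes, and initializing the smoother with the first frame is an assumption. The names `num_frames` and `pcen` are hypothetical.

```python
def num_frames(duration_ms, window_ms=25, shift_ms=10):
    """Number of full analysis windows that fit in a clip (no padding)."""
    if duration_ms < window_ms:
        return 0
    return 1 + (duration_ms - window_ms) // shift_ms


def pcen(energies, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Apply PCEN to a [frames][channels] list of non-negative energies.

    M(t, f)    = (1 - s) * M(t - 1, f) + s * E(t, f)
    PCEN(t, f) = (E(t, f) / (eps + M(t, f)) ** alpha + delta) ** r - delta ** r
    """
    if not energies:
        return []
    smoother = list(energies[0])  # assumption: M starts at the first frame
    out = []
    for frame in energies:
        row = []
        for f, e in enumerate(frame):
            smoother[f] = (1.0 - s) * smoother[f] + s * e
            row.append((e / (eps + smoother[f]) ** alpha + delta) ** r - delta ** r)
        out.append(row)
    return out


if __name__ == "__main__":
    print(num_frames(1000))               # frames in a 1-second clip
    dummy = [[1.0] * 40 for _ in range(3)]  # 3 frames of constant energy
    print(pcen(dummy)[0][0])
```

Because the smoother is computed per channel, a channel with a steady energy level is normalized against its own running average, which is the per-channel behavior the name refers to.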
CNN + GRU + Attention
/home/vivoadmin/work/project/training/trigger_word/models/cnn-gru-2-class/saved-model-100-0.9957.h5.jovi-Vs-noise
/home/vivoadmin/work/project/training/trigger_word/models/cnn-gru-2-class/saved-model-100-0.9927.h5.xiaov-Vs-noise
/home/vivoadmin/work/project/training/trigger_word/models/cnn-gru-3-class/saved-model-2000-0.9056.h5.2w2w1w
/home/vivoadmin/work/project/training/trigger_word/models/cnn-gru-3-class/saved-model-2000-0.9388.h5.2w2w4w
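
The notes name a CNN + GRU + Attention variant but do not say which attention mechanism was used. The sketch below shows one common choice, softmax attention pooling over per-frame GRU outputs with dot-product scores. The function name `attention_pool` and the `query` vector are assumptions for illustration only.

```python
import math


def attention_pool(hidden_states, query):
    """Softmax-weighted average of per-frame hidden states.

    hidden_states: [frames][dim] list, e.g. GRU outputs per time step.
    query: [dim] list; in a trained model this would be a learned vector.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(sc - peak) for sc in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return context, weights


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0]]
    context, weights = attention_pool(states, query=[0.0, 0.0])
    print(context, weights)  # a zero query gives uniform weights
```

The pooled context vector would then feed the final classification layer in place of the last GRU state, so frames that score highly against the query contribute more to the decision.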