@kpatrick 2019-11-26T19:47:59.000000Z

Work Diary 19/11/22

work vivo daily


0. TODO

  1. Read the paper: [ISCA Speech][2018] Attention-based End-to-End Models for Small-Footprint Keyword Spotting
  2. Use a CNN+GRU (RCNN) architecture to classify 'hi, jovi' and '小v小v'
  3. *Read

1. Paper

1.1 Model



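  1. Model structure:

    • Attention-based
    • The encoder extracts features from the input sequence
    • The output layer applies attention to the encoded features
  2. Attention (to be read)
  3. KWS architecture parameters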
    SMALL-FOOTPRINT KEYWORD SPOTTING USING DEEP NEURAL NETWORKS
    • The feed-forward DNN model had 3 hidden layers and 64 hidden nodes per layer with rectified linear unit (ReLU) non-linearity.
    • An input window with 15 left frames and 5 right frames was used.
    • The LSTM and GRU models were built with 2 hidden layers and 64 hidden nodes per layer.
    • For the GRU KWS model, the final GRU layer was followed by a fully connected layer with ReLU non-linearity.
    • There were no stacked frames in the input for the LSTM and GRU models.
    • The smoothing window for Deep KWS was set to 20 frames.
    • We also trained a TDNN-based acoustic model using ∼3000 hours of speech data to perform frame-level alignment before KWS model training.
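
The two window settings in the parameter list (15 left / 5 right input frames for the DNN, and a 20-frame smoothing window for Deep KWS) can be made concrete with a short sketch. It assumes that edge frames are padded by repeating the first or last frame and that smoothing is a plain moving average over the most recent posteriors; the notes do not spell out either detail. The function names `stack_frames` and `smooth_posteriors` are hypothetical.

```python
def stack_frames(frames, left=15, right=5):
    """Concatenate each frame with `left` past and `right` future frames.

    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for k in range(t - left, t + right + 1):
            k = min(max(k, 0), n - 1)  # clamp the index at the sequence edges
            window.extend(frames[k])
        stacked.append(window)
    return stacked


def smooth_posteriors(posteriors, window=20):
    """Moving average of per-frame posteriors over the last `window` frames."""
    smoothed = []
    running = 0.0
    for j, p in enumerate(posteriors):
        running += p
        if j >= window:
            running -= posteriors[j - window]  # drop the frame that left the window
        count = min(j + 1, window)
        smoothed.append(running / count)
    return smoothed


frames = [[float(t)] * 40 for t in range(100)]  # 100 dummy 40-dim frames
stacked = stack_frames(frames)
print(len(stacked), len(stacked[0]))  # each stacked frame has (15 + 1 + 5) * 40 values
```

With 40-dimensional features, the stacked DNN input is 21 frames wide, which is why the LSTM and GRU models (no stacked frames) see a much smaller per-step input.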

1.2 Data

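Training set: 180K positive samples, 1M negative samples
Validation set: 10K positive samples, 50K negative samples
Test set: 30K positive samples, 30K negative samples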
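
A quick piece of arithmetic on the counts above shows how imbalanced each split is. This is only a sketch: it reads "W" in the original notes as 10,000 and treats the counts as exact, and the helper `neg_pos_ratio` is a hypothetical name.

```python
# Sample counts from the notes; "W" is read as 10,000.
splits = {
    "train": {"pos": 180_000, "neg": 1_000_000},
    "val": {"pos": 10_000, "neg": 50_000},
    "test": {"pos": 30_000, "neg": 30_000},
}


def neg_pos_ratio(split):
    """Number of negative samples per positive sample."""
    return split["neg"] / split["pos"]


for name, split in splits.items():
    total = split["pos"] + split["neg"]
    print(f"{name}: {total} samples, neg/pos = {neg_pos_ratio(split):.2f}")
```

The training and validation sets have roughly five negatives per positive, while the test set is balanced, so metrics measured on the two are not directly comparable without accounting for the class prior.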

Input sequence: 40-channel Mel filterbank features, 25 ms window, 10 ms frame shift; the filterbank features were converted to per-channel energy normalized (PCEN) Mel-spectrograms.
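
The framing and normalization above can be sketched as follows. Several things here are assumptions that do not come from the notes: the 16 kHz sample rate, the PCEN constants (`s`, `alpha`, `delta`, `r`, `eps` are set to commonly used defaults), and the initialization of the smoother to the first frame. The Mel filterbank computation itself is omitted; `pcen` expects already-computed non-negative channel energies.

```python
def num_frames(num_samples, sample_rate=16000, win_ms=25, shift_ms=10):
    """Number of full analysis windows that fit in the signal (no padding)."""
    win = sample_rate * win_ms // 1000
    shift = sample_rate * shift_ms // 1000
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // shift


def pcen(energies, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization.

    energies: list of frames, each a list of non-negative channel energies.
    M(t, f) = (1 - s) * M(t - 1, f) + s * E(t, f)
    PCEN(t, f) = (E / (eps + M) ** alpha + delta) ** r - delta ** r
    """
    if not energies:
        return []
    out = []
    smoother = list(energies[0])  # M starts at the first frame (an assumption)
    for frame in energies:
        smoother = [(1 - s) * m + s * e for m, e in zip(smoother, frame)]
        out.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smoother)
        ])
    return out


print(num_frames(16000))  # frames in 1 s of 16 kHz audio
normalized = pcen([[1.0] * 40 for _ in range(3)])
print(len(normalized), len(normalized[0]))
```

The smoother `M` tracks the slowly varying energy in each channel, so dividing by it acts as a per-channel automatic gain control before the root compression.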

1.3 Conclusion

CNN + GRU + Attention
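
To make the attention step of this architecture concrete, here is a minimal sketch of soft attention pooling over encoder outputs (for example, the per-frame outputs of the GRU). It uses a common formulation, score e_t = v · tanh(W h_t + b) followed by a softmax over time, which is an assumption since the notes mark the attention part as still to be read. All sizes and the random weights are hypothetical stand-ins for trained parameters.

```python
import math
import random


def soft_attention(hidden, W, b, v):
    """Pool a sequence of encoder outputs into one context vector.

    hidden: T vectors of size D; W: A x D matrix; b, v: vectors of size A.
    Returns (context, weights), where weights sum to 1 over the T frames.
    """
    scores = []
    for h in hidden:
        proj = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
                for row, bias in zip(W, b)]
        scores.append(sum(vi * p for vi, p in zip(v, proj)))
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    context = [sum(a * h[d] for a, h in zip(weights, hidden)) for d in range(dim)]
    return context, weights


random.seed(0)
T, D, A = 5, 4, 3  # hypothetical sequence length, feature size, attention size
hidden = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
W = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(A)]
b = [0.0] * A
v = [random.uniform(-1, 1) for _ in range(A)]
context, weights = soft_attention(hidden, W, b, v)
print(len(context), round(sum(weights), 6))
```

The context vector has a fixed size regardless of the number of frames, so a single linear layer plus softmax on top of it is enough to produce the keyword / non-keyword decision.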


2. CNN + GRU

2.1 Binary classification

2.1.1 hi, jovi vs 小v

2.1.2 hi, jovi vs noise

2.1.3 小v vs noise

2.2 Three-class classification

2.2.1 Small dataset

2.2.2 More noise data

