@kpatrick
2019-08-29T16:58:44.000000Z
Tags: time series, work
1. Timeline
- 19/06: settled on a direction: NLP
- 19/06 - 19/07:
    - Continued NLP fundamentals: reviewed the Coursera course, built the course project (Text Generator), and re-worked my understanding of seq2seq
    - Studied LMs and embeddings; wrote up notes
    - Started simple hands-on projects with LSTM + Attention, CNN, and BERT, moving from intuition to understanding
    - Built up translation domain knowledge and collected corpora
- 19/08 - current:
    - Attention model: corpus preprocessing, word-piece tokenization, model training
    - Transformer model: corpus preprocessing, Moses tokenizer, jieba segmentation, BPE, model training (see the preprocessing sketch after this list)
    - API-ization: model deployment, code contributions to nematus
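The Transformer preprocessing steps above can be strung together in a few lines. A minimal sketch, assuming the jieba, sacremoses, and subword-nmt packages and a pre-learned BPE codes file at the hypothetical path `bpe.codes` (in practice the zh and en sides would typically use separately or jointly learned codes):

```python
# Minimal sketch of the preprocessing pipeline: jieba segmentation (zh),
# Moses tokenization (en), then BPE on both sides.
# Assumes "bpe.codes" (hypothetical path) was learned beforehand with subword-nmt.
import jieba
from sacremoses import MosesTokenizer
from subword_nmt.apply_bpe import BPE

moses = MosesTokenizer(lang="en")
with open("bpe.codes", encoding="utf-8") as f:
    bpe = BPE(f)

def preprocess_zh(line: str) -> str:
    """Chinese side: jieba word segmentation, then BPE subword splitting."""
    segmented = " ".join(jieba.cut(line.strip()))
    return bpe.process_line(segmented)

def preprocess_en(line: str) -> str:
    """English side: Moses tokenization, then BPE subword splitting."""
    tokenized = moses.tokenize(line.strip(), return_str=True)
    return bpe.process_line(tokenized)

print(preprocess_en("I'm an engineer working on artificial intelligence algorithms."))
```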
2. 技术栈
2.1 Deep Learning
- CNN
- RNN
- seq2seq
- Encoder-Decoder
- Attention (see the sketch after this list)
- Training & Tuning
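For reference, the core of the Attention entry above reduces to a single matrix computation. A minimal NumPy sketch of scaled dot-product attention (illustrative only, not the training code used in this project):

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (m, d), K: (n, d), V: (n, d_v) -> output: (m, d_v)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (m, n) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)   # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ V                             # weighted sum of the values

Q = np.random.randn(2, 4)
K = np.random.randn(3, 4)
V = np.random.randn(3, 4)
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 4)
```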
2.2 NLP
- Tokenization: BPE, jieba, Moses
- LM: skip-gram, CBOW (see the embedding sketch after this list)
- Word Embedding
- Attention
- Transformer
- Bert
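The skip-gram and CBOW entries above differ only in prediction direction: skip-gram predicts context words from the center word, CBOW the reverse. A minimal gensim sketch (assuming gensim >= 4.0, with a toy stand-in corpus):

```python
# Training skip-gram vs. CBOW word embeddings with gensim.
from gensim.models import Word2Vec

corpus = [
    ["I", "'m", "an", "engineer"],
    ["working", "on", "artificial", "intelligence", "algorithms"],
]

# sg=1 selects skip-gram, sg=0 selects CBOW.
skipgram = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)
cbow = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=0)

print(skipgram.wv["engineer"].shape)  # (100,)
```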
2.3 Tools & Environment
- Python
- TensorFlow
- CUDA / GPU
- Linux, Shell
- jieba
- nematus
- WMT
3. Key Results
3.1 API
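A minimal sketch of how the deployed model could sit behind an HTTP endpoint, assuming Flask; `translate_batch` here is a hypothetical wrapper around the trained model, not nematus's real interface:

```python
# Hypothetical translation API: POST {"sentences": [...]} to /translate.
from flask import Flask, jsonify, request

app = Flask(__name__)

def translate_batch(sentences):
    # Hypothetical stand-in: the real service would run preprocessing
    # plus the trained model and return translated strings.
    return [f"<translation of: {s}>" for s in sentences]

@app.route("/translate", methods=["POST"])
def translate():
    sentences = request.get_json(force=True).get("sentences", [])
    return jsonify({"translations": translate_batch(sentences)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

A client would then POST JSON such as {"sentences": ["..."]} and read the translations from the response body.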
3.2 Test Case
- Example sentence (12 tokens):
  ["yeah", ",", "I", "'m", "an", "engineer", "working", "on", "artificial", "intelligence", "algorithms", "."]
- Count: 500 sentences
- Hardware: 1x GTX 1080 Ti GPU
- Speed: ~110 sents/sec, ~1200 words/sec (approximate; see the benchmark sketch below)
- Sample log: Translated 500 sents in 4.594204902648926 sec. Speed 108.83276009559567 sents/sec
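The log line above can be reproduced with a simple timer around the translation call. A minimal sketch with a toy stand-in for the real model (500 sents / 4.594 s ≈ 108.8 sents/sec):

```python
# Measure translation throughput in sentences per second.
import time

def benchmark(translate_fn, sentences):
    start = time.perf_counter()
    translate_fn(sentences)
    elapsed = time.perf_counter() - start
    rate = len(sentences) / elapsed
    print(f"Translated {len(sentences)} sents in {elapsed} sec. "
          f"Speed {rate} sents/sec")

# Toy stand-in so the sketch runs; the real run used the deployed model.
benchmark(lambda sents: [s.upper() for s in sents], ["example sentence"] * 500)
```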
4. Support
- Text processing
- Embeddings
- LM
- seq2seq
- Bert
- Speech-related
- OCR