@kpatrick
2019-08-29T16:58:44.000000Z
Tags: time series, work
1. Timeline
- 19/06: settled on a direction: NLP
- 19/06 - 19/07:
    - Continued NLP fundamentals: reviewed the Coursera course, built the course project (Text Generator), and re-worked my understanding of seq2seq
    - Studied LMs and embeddings; wrote up notes
    - Started simple hands-on projects with LSTM + Attention, CNN, and BERT, moving from intuition to understanding
    - Built up translation domain knowledge and collected corpora
- 19/08 - current:
    - Attention model: corpus preprocessing, word-piece tokenization, model training
    - Transformer model: corpus preprocessing, Moses tokenizer, jieba segmentation, BPE, model training (see the preprocessing sketch after this list)
    - API-ization: model deployment, code contributions to nematus
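The Transformer preprocessing steps above can be strung together in a few lines. A minimal sketch, assuming the jieba, sacremoses, and subword-nmt packages and a pre-learned BPE codes file at the hypothetical path `bpe.codes` (in practice the zh and en sides would typically use separately or jointly learned codes):

```python
# Minimal sketch of the preprocessing pipeline: jieba segmentation (zh),
# Moses tokenization (en), then BPE on both sides.
# Assumes "bpe.codes" (hypothetical path) was learned beforehand with subword-nmt.
import jieba
from sacremoses import MosesTokenizer
from subword_nmt.apply_bpe import BPE

moses = MosesTokenizer(lang="en")
with open("bpe.codes", encoding="utf-8") as f:
    bpe = BPE(f)

def preprocess_zh(line: str) -> str:
    """Chinese side: jieba word segmentation, then BPE subword splitting."""
    segmented = " ".join(jieba.cut(line.strip()))
    return bpe.process_line(segmented)

def preprocess_en(line: str) -> str:
    """English side: Moses tokenization, then BPE subword splitting."""
    tokenized = moses.tokenize(line.strip(), return_str=True)
    return bpe.process_line(tokenized)

print(preprocess_en("I'm an engineer working on artificial intelligence algorithms."))
```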
2. 技术栈
2.1 Deep Learning
- CNN
- RNN
- seq2seq
- Encoder-Decoder
- Attention (see the sketch after this list)
- Training & Tuning
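For reference, the core of the Attention entry above reduces to a single matrix computation. A minimal NumPy sketch of scaled dot-product attention (illustrative only, not the training code used in this project):

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (m, d), K: (n, d), V: (n, d_v) -> output: (m, d_v)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (m, n) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)   # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ V                             # weighted sum of the values

Q = np.random.randn(2, 4)
K = np.random.randn(3, 4)
V = np.random.randn(3, 4)
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 4)
```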
2.2 NLP
- Tokenization: BPE, jieba, Moses
- LM: skip-gram, CBOW (see the embedding sketch after this list)
- Word Embedding
- Attention
- Transformer
- Bert
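The skip-gram and CBOW entries above differ only in prediction direction: skip-gram predicts context words from the center word, CBOW the reverse. A minimal gensim sketch (assuming gensim >= 4.0, with a toy stand-in corpus):

```python
# Training skip-gram vs. CBOW word embeddings with gensim.
from gensim.models import Word2Vec

corpus = [
    ["I", "'m", "an", "engineer"],
    ["working", "on", "artificial", "intelligence", "algorithms"],
]

# sg=1 selects skip-gram, sg=0 selects CBOW.
skipgram = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)
cbow = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=0)

print(skipgram.wv["engineer"].shape)  # (100,)
```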
2.3 Tools & Environment
- Python
- TensorFlow
- CUDA / GPU
- Linux, Shell
- jieba
- nematus
- WMT
3. Key Results
3.1 API
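A minimal sketch of how the deployed model could sit behind an HTTP endpoint, assuming Flask; `translate_batch` here is a hypothetical wrapper around the trained model, not nematus's real interface:

```python
# Hypothetical translation API: POST {"sentences": [...]} to /translate.
from flask import Flask, jsonify, request

app = Flask(__name__)

def translate_batch(sentences):
    # Hypothetical stand-in: the real service would run preprocessing
    # plus the trained model and return translated strings.
    return [f"<translation of: {s}>" for s in sentences]

@app.route("/translate", methods=["POST"])
def translate():
    sentences = request.get_json(force=True).get("sentences", [])
    return jsonify({"translations": translate_batch(sentences)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

A client would then POST JSON such as {"sentences": ["..."]} and read the translations from the response body.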
3.2 Test Case
- Example sentence (12 tokens):
  ["yeah", ",", "I", "'m", "an", "engineer", "working", "on", "artificial", "intelligence", "algorithms", "."]
- Count: 500 sentences
- Hardware: 1x GTX 1080 Ti GPU
- Speed: ~110 sents/sec, ~1200 words/sec (approximate; see the benchmark sketch below)
- Sample log: Translated 500 sents in 4.594204902648926 sec. Speed 108.83276009559567 sents/sec
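The log line above can be reproduced with a simple timer around the translation call. A minimal sketch with a toy stand-in for the real model (500 sents / 4.594 s ≈ 108.8 sents/sec):

```python
# Measure translation throughput in sentences per second.
import time

def benchmark(translate_fn, sentences):
    start = time.perf_counter()
    translate_fn(sentences)
    elapsed = time.perf_counter() - start
    rate = len(sentences) / elapsed
    print(f"Translated {len(sentences)} sents in {elapsed} sec. "
          f"Speed {rate} sents/sec")

# Toy stand-in so the sketch runs; the real run used the deployed model.
benchmark(lambda sents: [s.upper() for s in sents], ["example sentence"] * 500)
```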
4. Support
- Text processing
- Embeddings
- LM
- seq2seq
- Bert
- Speech-related
- OCR