@kpatrick
2019-08-01 16:38
nlp
notes
- Punctuation and special-character handling
- Word-frequency counting
- Build the dictionaries: word -> idx & idx -> word (a sketch of these steps follows)
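A minimal preprocessing sketch covering the three steps above; the function name `preprocess`, the regex, and the `min_count` rare-word cutoff are illustrative choices, not taken from the original post:

```python
import re
from collections import Counter

def preprocess(text, min_count=5):
    # Punctuation / special characters: replace with spaces, lowercase.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    words = text.split()

    # Word-frequency statistics; optionally drop very rare words.
    counts = Counter(words)
    words = [w for w in words if counts[w] >= min_count]

    # Dictionaries: word -> idx (most frequent first) and idx -> word.
    vocab = sorted(set(words), key=counts.get, reverse=True)
    word_to_idx = {w: i for i, w in enumerate(vocab)}
    idx_to_word = {i: w for w, i in word_to_idx.items()}
    return words, word_to_idx, idx_to_word
```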
Subsampling
Words that occur often, such as "the" and "is", are not very useful for providing context to nearby words. If we remove all of them, we are effectively removing any information they provide. A better approach, as proposed by Mikolov et al., is to remove only some of these words, to remove some of the noise from our data. For each word $w_i$ in the training set, we discard it with probability:

$$P(w_i) = 1 - \sqrt{\frac{t}{f(w_i)}}$$

where $t$ is a threshold parameter and $f(w_i)$ is the frequency of the word $w_i$.
During sampling, some of the frequent words are discarded with this probability, so that the resulting training set is more evenly distributed and is not dominated by words such as "the".
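A sketch of this subsampling step, assuming the corpus has already been converted to word indices (`int_words`); `t = 1e-5` is the threshold value commonly used with this formula:

```python
import random
from collections import Counter

def subsample(int_words, t=1e-5):
    total = len(int_words)
    counts = Counter(int_words)

    # P(w_i) = 1 - sqrt(t / f(w_i)), where f(w_i) = count(w_i) / total.
    p_drop = {w: 1 - (t * total / c) ** 0.5 for w, c in counts.items()}

    # Keep each occurrence of w with probability 1 - P(w); frequent words
    # like "the" are dropped often, rare words almost never.
    return [w for w in int_words if random.random() > p_drop[w]]
```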
Word2Vec
Negative Sampling
The reference article constructs the training set in the Skip-Gram way. When the vocabulary is large, every softmax has to compute a very large vector; with negative sampling, each step trains only 1 + k classifiers: 1 for the positive sample and k for the negative samples.
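A sketch of how Skip-Gram context targets can be drawn for one center word; the helper name `get_targets` and the default `window_size` are assumptions, and the random window shrinkage follows the original word2vec code, so that nearer context words are sampled more often:

```python
import random

def get_targets(words, idx, window_size=5):
    # Pick an effective radius R in [1, window_size] at random.
    r = random.randint(1, window_size)
    start = max(0, idx - r)
    # Context words on both sides of the center word words[idx];
    # each (center, context) pair becomes one training example.
    return words[start:idx] + words[idx + 1:idx + r + 1]
```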
Loss Function
tf.nn.sampled_softmax_loss computes the softmax over a sampled subset of the classes, reducing the amount of computation.
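A TF1-style usage sketch; the sizes `n_vocab`, `n_embed`, `n_sampled` and the variable names are placeholders, not from the post (in TF2 the same calls live under `tf.compat.v1`):

```python
import tensorflow as tf

n_vocab, n_embed, n_sampled = 50000, 300, 100  # assumed sizes

inputs = tf.placeholder(tf.int32, [None], name="inputs")
labels = tf.placeholder(tf.int64, [None, 1], name="labels")

# Input embeddings looked up for the center words.
embedding = tf.Variable(tf.random_uniform((n_vocab, n_embed), -1.0, 1.0))
embed = tf.nn.embedding_lookup(embedding, inputs)

# Output-layer weights and biases of the softmax classifier.
softmax_w = tf.Variable(tf.truncated_normal((n_vocab, n_embed), stddev=0.1))
softmax_b = tf.Variable(tf.zeros(n_vocab))

# The softmax is evaluated over n_sampled sampled classes plus the
# true class, instead of over all n_vocab classes.
loss = tf.nn.sampled_softmax_loss(
    weights=softmax_w, biases=softmax_b,
    labels=labels, inputs=embed,
    num_sampled=n_sampled, num_classes=n_vocab)
cost = tf.reduce_mean(loss)
optimizer = tf.train.AdamOptimizer().minimize(cost)
```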