@wjcper2008
2017-10-10
Transfer learning
It might help to give slightly more of an overview of MMD.
In general, MMD is defined by the idea of representing distances between distributions as distances between mean embeddings of features.
That is, say we have distributions $P$ and $Q$ over a set $\mathcal{X}$. The MMD is defined by a feature map $\varphi : \mathcal{X} \to \mathcal{H}$, where $\mathcal{H}$ is what's called a reproducing kernel Hilbert space. In general, the MMD is

$$\mathrm{MMD}(P, Q) = \left\lVert \mathbb{E}_{X \sim P}[\varphi(X)] - \mathbb{E}_{Y \sim Q}[\varphi(Y)] \right\rVert_{\mathcal{H}}.$$
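To make the definition concrete, here is a minimal NumPy sketch (my own illustration, not part of the original post): given finite samples from $P$ and $Q$ and an explicit feature map, the empirical MMD is just the distance between the two sample means in feature space. The function name `mmd_explicit` is hypothetical.

```python
import numpy as np

def mmd_explicit(X, Y, phi):
    """Empirical MMD for an explicit feature map phi.

    X: (n, d) array of samples from P;  Y: (m, d) array of samples from Q.
    phi maps a (k, d) array of points to a (k, q) array of features.
    Returns the norm of the difference between the feature means.
    """
    return np.linalg.norm(phi(X).mean(axis=0) - phi(Y).mean(axis=0))
```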
As one example (the linear case), we might have $\mathcal{X} = \mathcal{H} = \mathbb{R}^d$ and $\varphi(x) = x$. In that case:

$$\mathrm{MMD}(P, Q) = \left\lVert \mathbb{E}_{X \sim P}[X] - \mathbb{E}_{Y \sim Q}[Y] \right\rVert_{\mathbb{R}^d} = \lVert \mu_P - \mu_Q \rVert_{\mathbb{R}^d},$$

so the MMD is just the distance between the means of the two distributions: matching distributions in this sense matches their means, though they may still differ in variance or in other ways.
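Continuing the sketch above with the identity feature map (the sample sizes and Gaussian distributions here are chosen arbitrarily for illustration), the estimate matches the distance between the sample means:

```python
rng = np.random.default_rng(0)
X = rng.normal(loc=0.0, scale=1.0, size=(10_000, 2))     # samples from P, mean (0, 0)
Y = rng.normal(loc=1.0, scale=1.0, size=(10_000, 2))     # samples from Q, mean (1, 1)

identity = lambda Z: Z                                   # phi(x) = x
print(mmd_explicit(X, Y, identity))                      # ~ ||mu_P - mu_Q|| = sqrt(2)
print(np.linalg.norm(X.mean(axis=0) - Y.mean(axis=0)))   # the same number
```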
In the projection case, we have $\mathcal{X} = \mathbb{R}^d$ and $\mathcal{H} = \mathbb{R}^p$, with $\varphi(x) = A^\top x$, where $A$ is a $d \times p$ matrix. So we have

$$\mathrm{MMD}(P, Q) = \left\lVert \mathbb{E}[A^\top X] - \mathbb{E}[A^\top Y] \right\rVert = \left\lVert A^\top \mathbb{E}[X] - A^\top \mathbb{E}[Y] \right\rVert = \left\lVert A^\top (\mu_P - \mu_Q) \right\rVert_{\mathbb{R}^p},$$

the distance between two projections of the means.
Dimension reduction loses information: if $p < d$, or the mapping $x \mapsto A^\top x$ otherwise isn't invertible, then this MMD is weaker than the previous one, in that it fails to distinguish some distributions that the previous one does, as the sketch below shows.
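Continuing the same sketch, a hypothetical projection matrix $A$ illustrates both the formula above and the loss of information: a projection orthogonal to $\mu_P - \mu_Q$ reports an MMD of zero even though the distributions differ.

```python
A = np.array([[1.0], [0.0]])                 # d = 2, p = 1: keeps only the first coordinate
print(mmd_explicit(X, Y, lambda Z: Z @ A))   # ~ |mu_P[0] - mu_Q[0]| = 1

# A direction orthogonal to mu_P - mu_Q = (-1, -1) sees no difference at all:
B = np.array([[1.0], [-1.0]]) / np.sqrt(2)
print(mmd_explicit(X, Y, lambda Z: Z @ B))   # ~ 0, although P != Q
```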
You can also construct stronger distances. For example, if $\mathcal{X} = \mathbb{R}$ and you use $\varphi(x) = (x, x^2)$, then the MMD becomes

$$\sqrt{\left( \mathbb{E}[X] - \mathbb{E}[Y] \right)^2 + \left( \mathbb{E}[X^2] - \mathbb{E}[Y^2] \right)^2},$$

and can distinguish not only distributions with different means but also distributions with different variances.
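A quick numerical check of this claim, continuing the sketch (the distributions are again arbitrary): two distributions with equal means but different variances are invisible to $\varphi(x) = x$ but not to $\varphi(x) = (x, x^2)$.

```python
X1 = rng.normal(0.0, 1.0, size=(100_000, 1))   # P: mean 0, variance 1
Y1 = rng.normal(0.0, 2.0, size=(100_000, 1))   # Q: mean 0, variance 4

quad = lambda Z: np.hstack([Z, Z ** 2])        # phi(x) = (x, x^2)
print(mmd_explicit(X1, Y1, identity))          # ~ 0: the means match
print(mmd_explicit(X1, Y1, quad))              # ~ |E[X^2] - E[Y^2]| = 3
```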
And you can get much stronger than that: if $\varphi$ maps to a general reproducing kernel Hilbert space, then you can apply the kernel trick to compute the MMD, and it turns out that many kernels, including the Gaussian kernel, lead to an MMD that is zero if and only if the distributions are identical.
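The kernel trick works because, with $k(x, y) = \langle \varphi(x), \varphi(y) \rangle_{\mathcal{H}}$, the squared MMD expands into kernel evaluations:

$$\mathrm{MMD}^2(P, Q) = \mathbb{E}[k(X, X')] - 2\,\mathbb{E}[k(X, Y)] + \mathbb{E}[k(Y, Y')].$$

Below is a sketch of the (biased) empirical estimator with a Gaussian kernel, reusing the samples from above; the function name and the bandwidth $\sigma = 1$ are arbitrary illustrative choices.

```python
def mmd2_rbf_biased(X, Y, sigma=1.0):
    """Biased empirical MMD^2 with a Gaussian (RBF) kernel, via the kernel trick.

    Averages k over all pairs, including i == j (hence 'biased').
    """
    def k(U, V):
        # Pairwise squared distances, then the Gaussian kernel.
        sq_dists = ((U[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2 * sigma ** 2))
    return k(X, X).mean() - 2 * k(X, Y).mean() + k(Y, Y).mean()

print(mmd2_rbf_biased(X1[:2000], Y1[:2000]))       # > 0: P != Q (different variances)
print(mmd2_rbf_biased(X1[:2000], X1[2000:4000]))   # ~ 0: two samples from the same P
```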