@wjcper2008 2017-10-11

Maximum Mean Discrepancy (distance distribution)

Transfer learning


It might help to give slightly more of an overview of MMD.

In general, MMD is defined by the idea of representing distances between distributions as distances between mean embeddings of features.

That is, say we have distributions $P$ and $Q$ over a set $\mathcal{X}$. The MMD is defined by a feature map $\varphi : \mathcal{X} \to \mathcal{H}$, where $\mathcal{H}$ is what's called a reproducing kernel Hilbert space. In general, the MMD is

$$\mathrm{MMD}(P, Q) = \left\lVert \mathbb{E}_{X \sim P}[\varphi(X)] - \mathbb{E}_{Y \sim Q}[\varphi(Y)] \right\rVert_{\mathcal{H}}.$$
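Before working through the cases, it may help to see the plug-in estimator in code: replace each expectation with a sample mean and take the norm of the difference. This is a minimal numpy sketch (the helper name `mmd` and its signature are mine, not from the original post); the cases below reuse it with different choices of $\varphi$.

```python
import numpy as np

def mmd(X, Y, phi):
    """Plug-in MMD estimate: distance between empirical mean embeddings.

    X : (n, d) array of samples from P; Y : (m, d) array of samples from Q.
    phi : feature map applied row-wise, returning a 2-D array of features.
    """
    return np.linalg.norm(phi(X).mean(axis=0) - phi(Y).mean(axis=0))
```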

Case: linear

As one example (the linear case), we might have $\mathcal{X} = \mathcal{H} = \mathbb{R}^d$ and $\varphi(x) = x$. In that case:

$$\mathrm{MMD}(P, Q) = \left\lVert \mathbb{E}_{X \sim P}[X] - \mathbb{E}_{Y \sim Q}[Y] \right\rVert_{\mathbb{R}^d} = \lVert \mu_P - \mu_Q \rVert_{\mathbb{R}^d},$$

so this MMD is just the distance between the means of the two distributions. Matching distributions in this sense matches their means, though they might still differ in variance or in other ways.
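A quick check of the linear case with the `mmd` helper above: two Gaussians whose means differ by $(1, 1)$, so the true MMD is $\lVert \mu_P - \mu_Q \rVert = \sqrt{2} \approx 1.41$. The distributions and seed here are made up for illustration.

```python
rng = np.random.default_rng(0)
X = rng.normal(loc=0.0, scale=1.0, size=(5000, 2))  # samples from P = N(0, I)
Y = rng.normal(loc=1.0, scale=1.0, size=(5000, 2))  # samples from Q = N((1,1), I)
print(mmd(X, Y, phi=lambda x: x))  # identity feature map; ~1.414 = sqrt(2)
```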

Case: linear projection

We have $\mathcal{X} = \mathbb{R}^d$ and $\mathcal{H} = \mathbb{R}^p$, with $\varphi(x) = A^\top x$, where $A$ is a $d \times p$ matrix. So we have

$$\mathrm{MMD}(P, Q) = \left\lVert A^\top \mu_P - A^\top \mu_Q \right\rVert_{\mathbb{R}^p} = \left\lVert A^\top (\mu_P - \mu_Q) \right\rVert_{\mathbb{R}^p},$$

the distance between two projections of the means.

Dimension reduction loses information: if $p < d$, or the mapping $x \mapsto A^\top x$ otherwise isn't invertible, then this MMD is weaker than the previous one: it doesn't distinguish between some distributions that the previous one does, as the sketch below shows.
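A sketch of both points, continuing with the samples `X`, `Y` above (the particular matrices are hypothetical choices for illustration): $\varphi(x) = A^\top x$, once with $A$ orthogonal to $\mu_P - \mu_Q$, so the projected MMD misses the mean shift entirely, and once with $A$ aligned with it.

```python
# phi(x) = A^T x with a d x p matrix A (here d = 2, p = 1).
A_blind = np.array([[1.0], [-1.0]])            # column orthogonal to mu_P - mu_Q = (-1, -1)
print(mmd(X, Y, phi=lambda x: x @ A_blind))    # ~0: this projection can't see the mean shift
A_aligned = np.array([[1.0], [1.0]])           # column along mu_P - mu_Q
print(mmd(X, Y, phi=lambda x: x @ A_aligned))  # ~2 = |A^T (mu_P - mu_Q)|
```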

Case: variance projection

You can also construct stronger distances. For example, if $\mathcal{X} = \mathbb{R}$ and you use $\varphi(x) = (x, x^2)$, then the MMD becomes

$$\mathrm{MMD}(P, Q) = \sqrt{\left(\mathbb{E}[X] - \mathbb{E}[Y]\right)^2 + \left(\mathbb{E}[X^2] - \mathbb{E}[Y^2]\right)^2},$$

and it can distinguish not only distributions with different means but those with different variances as well.
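A sketch of this on scalar data, again reusing the `mmd` helper (the distributions are hypothetical): $P = N(0, 1)$ and $Q = N(0, 4)$ have equal means, so the identity feature map reports nothing, while $\varphi(x) = (x, x^2)$ picks up the second-moment gap $|1 - 4| = 3$.

```python
X1 = rng.normal(loc=0.0, scale=1.0, size=(100000, 1))   # P = N(0, 1)
Y1 = rng.normal(loc=0.0, scale=2.0, size=(100000, 1))   # Q = N(0, 4), same mean
print(mmd(X1, Y1, phi=lambda x: x))                     # ~0: means agree
print(mmd(X1, Y1, phi=lambda x: np.hstack([x, x**2])))  # ~3: E[X^2] = 1 vs E[Y^2] = 4
```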

Summary

And you can get much stronger than that: if $\varphi$ maps to a general reproducing kernel Hilbert space, then you can apply the kernel trick to compute the MMD, and it turns out that many kernels, including the Gaussian kernel, lead to the MMD being zero if and only if the distributions are identical.
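Concretely, for a kernel $k(x, y) = \langle \varphi(x), \varphi(y) \rangle_{\mathcal{H}}$, expanding the squared norm gives the kernel-trick form

$$\mathrm{MMD}^2(P, Q) = \mathbb{E}[k(X, X')] - 2\,\mathbb{E}[k(X, Y)] + \mathbb{E}[k(Y, Y')],$$

with $X, X' \sim P$ and $Y, Y' \sim Q$ independent, which never needs $\varphi$ explicitly. Below is a minimal sketch of the biased (V-statistic) estimator of this quantity with a Gaussian kernel; the name `mmd2_rbf` and the bandwidth choice are mine, for illustration only.

```python
def mmd2_rbf(X, Y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    def k(A, B):  # pairwise Gaussian kernel matrix between rows of A and B
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2 * sigma ** 2))
    return k(X, X).mean() - 2 * k(X, Y).mean() + k(Y, Y).mean()

print(mmd2_rbf(X1[:500], Y1[:500]))  # > 0: Gaussian-kernel MMD separates N(0,1) from N(0,4)
```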
