Question about MMD implementation #2

Open
hiwonjoon opened this issue May 3, 2018 · 7 comments

@hiwonjoon

Thanks for sharing the code for your amazing paper! I really enjoyed reading it.

Anyway, I am interested in extending your work in another direction, and I have come up with a question about the MMD part. I was able to understand the overall concept, but I'm not sure about this multi-scale part:

wae/wae.py, line 294 (commit 068a257):

for scale in [.1, .2, .5, 1., 2., 5., 10.]:

Are you just trying multiple kernels to get a better estimate of MMD?

It would also be very nice of you to recommend some readings to get a better understanding of MMDs.

@tolstikhin
Owner

Dear Wonjoon,

thank you for asking. The property we are using here is that a sum of positive definite kernels is also a positive definite kernel. We were initially using the IMQ kernel with one fixed width parameter, but noticed it works slightly better if you sum such kernels over a range of widths, which allows the kernel to "look at various scales" simultaneously. This is a bit hand-wavy, but I hope it gives you the right intuition.
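
To make that a bit more concrete, here is a minimal NumPy sketch (not the repo's TensorFlow code) of an MMD estimate built from a sum of IMQ kernels over the same list of scales as in wae.py. The function name, the biased V-statistic form, and tying the base width to 2 * z_dim * sigma_p^2 are illustrative assumptions, not the exact implementation:

```python
# Illustrative NumPy sketch: a sum of IMQ kernels k_C(x, y) = C / (C + ||x - y||^2)
# over several widths, plugged into a (biased, V-statistic) MMD estimate.
# The choice c_base = 2 * z_dim * sigma_p**2 is an assumption, not a quote from wae.py.
import numpy as np

def multiscale_imq_mmd(z_q, z_p, scales=(.1, .2, .5, 1., 2., 5., 10.), sigma_p=1.0):
    """Biased MMD estimate between encoded codes z_q and prior samples z_p."""
    z_dim = z_q.shape[1]
    # Pairwise squared Euclidean distances within and across the two samples.
    d_qq = np.sum((z_q[:, None, :] - z_q[None, :, :]) ** 2, axis=-1)
    d_pp = np.sum((z_p[:, None, :] - z_p[None, :, :]) ** 2, axis=-1)
    d_qp = np.sum((z_q[:, None, :] - z_p[None, :, :]) ** 2, axis=-1)
    c_base = 2.0 * z_dim * sigma_p ** 2   # base width tied to the prior's scale
    mmd = 0.0
    for scale in scales:
        c = c_base * scale                # each width is a positive definite IMQ kernel
        mmd += np.mean(c / (c + d_qq)) + np.mean(c / (c + d_pp)) \
               - 2.0 * np.mean(c / (c + d_qp))
    return mmd
```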

Regarding MMDs in general, I can recommend looking into this overview: https://arxiv.org/pdf/1605.09522.pdf

Best wishes,
Ilya

@hiwonjoon
Author

Thanks for the instant response! So the sigma of the kernel is not related to the prior distribution's sigma. Is that correct?

@tolstikhin
Owner

Correct, these are two different things. But you may want to choose the kernel width depending on your prior.

@ttgump

ttgump commented Feb 8, 2019

Thanks for the great discussion. I have a question: when using the MMD penalty, I trained my WAE model on some other datasets (not MNIST or CelebA) and saw the MMD become negative after training for hundreds of epochs. Is it possible to have a negative MMD penalty?

@tolstikhin
Owner

The penalty used in WAE-MMD is not precisely the population MMD, but a sample-based U-statistic. Since it is an unbiased estimate (that is, its expected value coincides with the quantity of interest, the MMD in this case), it necessarily has to take negative values from time to time whenever the population MMD is zero. In summary, yes, negative values are OK.
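
For illustration, here is a small NumPy sketch of the unbiased (U-statistic) form for a single IMQ kernel; the names and the fixed width are made up for the example, but it shows how the estimate fluctuates around zero, and is often negative, when both samples come from the same distribution:

```python
# Sketch of an unbiased (U-statistic) MMD estimate for one IMQ kernel.
# Illustrative names and width; the point is that the estimate can dip
# below zero when the two samples really do share a distribution.
import numpy as np

def imq_kernel(x, y, c=2.0):
    d = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return c / (c + d)

def unbiased_mmd(z_q, z_p, c=2.0):
    n, m = z_q.shape[0], z_p.shape[0]
    k_qq, k_pp, k_qp = imq_kernel(z_q, z_q, c), imq_kernel(z_p, z_p, c), imq_kernel(z_q, z_p, c)
    # Drop the diagonal terms so the within-sample averages are unbiased.
    term_qq = (np.sum(k_qq) - np.trace(k_qq)) / (n * (n - 1))
    term_pp = (np.sum(k_pp) - np.trace(k_pp)) / (m * (m - 1))
    return term_qq + term_pp - 2.0 * np.mean(k_qp)

rng = np.random.default_rng(0)
codes = rng.standard_normal((128, 8))   # stand-in for encoded training codes
prior = rng.standard_normal((128, 8))   # samples from the prior, same distribution here
print(unbiased_mmd(codes, prior))       # hovers around 0 and is often negative
```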

@ttgump

ttgump commented Feb 8, 2019

Thanks for the explanation! Should we consider the MMD to have converged once it reaches negative values? That is, when the MMD is negative, can we consider q(z|x) to be equal to the prior p(z)?

@tolstikhin
Owner

Dear ttgump,

q(z|x) is not being matched to p(z) in WAE. Instead, the aggregate posterior is.
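
Spelled out (notation roughly as in the WAE paper, so read it as a paraphrase rather than a quote), the regularizer compares distributions as follows:

```latex
% WAE matches the aggregate posterior q_Z to the prior p_Z,
% not the individual conditionals q(z|x):
q_Z(z) \;=\; \int q(z \mid x)\, p_X(x)\, \mathrm{d}x,
\qquad
\text{penalty} \;\propto\; \mathrm{MMD}_k\bigl(q_Z,\, p_Z\bigr).
```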
