Necessity of FFTKDE grid enclosure with reflected boundaries #92

mdmould · 2021-05-17T12:48:22Z

When evaluating the PDF with FFTKDE, the grid bounds must exceed the extrema of the data points. In #6, you mention this is due to your implementation and isn't strictly necessary. What would be required to remove this restriction?

The reason is that evaluating the PDF in higher dimensions, which is already prohibitive, becomes even more so if one also wishes to impose boundary reflections.

Let's say I have N-dimensional data on which I want to impose boundary conditions in all axes. The reflected data will be 2N+1 times the size of the original. To construct the PDF within the boundaries with FFTKDE, one must reflect the axes in each dimension before forming the Cartesian product. If each axis has n points, the resulting grid will have (3n-2)^N points, and this size becomes a problem both for memory and evaluation speed. The resolution of each axis has to be lowered which leads to less accurate PDF predictions (this isn't too bad, because the shape remains accurate and the height can be renormalized with a numerical integration).

If FFTKDE could be evaluated on a grid inside the data extrema, one could fit to the 2N+1 reflected data points but still evaluate the PDF on the n^N non-reflected grid points.

tommyod · 2021-05-18T11:20:31Z

I'll give some general comments which I hope can be helpful. Honestly you are kind of on your own here, since (1) I don't have time to work on this and (2) your requirements are a bit exotic, so you'll have to think it through.

If the restriction is removed, the user could unintentionally get bad results. In theory I think it could be removed, but it would make the library less "safe". I think the feature has saved many more users than it has annoyed, so I am in favor of keeping it.

As per your specific problem, I see two solutions:

If the bandwidth is small compared to the data (e.g. data on [0, 1] and bw = 0.01), then you don't have to mirror all of the data. If you mirror e.g. 3*bw, then you might get away with much less computation. This might be the simplest solution.
If you want mirroring across each low/high threshold in each dimension, then you could fork the library and adapt it to your specific needs. If you (1) set up a grid that is barely outside of the data and (2) use a convolution that wraps around the boundaries, you should be able to accomplish what you want in a computationally efficient way. After all convolutions wrap around by default, which is exactly what you want. You will have to look at grid validation/creation, and make sure that the convolution wraps around. Try this: create a grid, perform linear binning, adapt this code to convolve a kernel with the binned data.

Best of luck to you! I'll close this issue, since I will not work on it. Let me know if you have questions. If someone else wants to implement a MirroredFFTKDE let me know, and I'll re-open the issue.

tommyod closed this as completed May 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Necessity of FFTKDE grid enclosure with reflected boundaries #92

Necessity of FFTKDE grid enclosure with reflected boundaries #92

mdmould commented May 17, 2021

tommyod commented May 18, 2021 •

edited

Loading

Necessity of FFTKDE grid enclosure with reflected boundaries #92

Necessity of FFTKDE grid enclosure with reflected boundaries #92

Comments

mdmould commented May 17, 2021

tommyod commented May 18, 2021 • edited Loading

tommyod commented May 18, 2021 •

edited

Loading