Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Necessity of FFTKDE grid enclosure with reflected boundaries #92

Closed
mdmould opened this issue May 17, 2021 · 1 comment
Closed

Necessity of FFTKDE grid enclosure with reflected boundaries #92

mdmould opened this issue May 17, 2021 · 1 comment

Comments

@mdmould
Copy link

mdmould commented May 17, 2021

When evaluating the PDF with FFTKDE, the grid bounds must exceed the extrema of the data points. In #6, you mention this is due to your implementation and isn't strictly necessary. What would be required to remove this restriction?

The reason is that evaluating the PDF in higher dimensions, which is already prohibitive, becomes even more so if one also wishes to impose boundary reflections.

Let's say I have N-dimensional data on which I want to impose boundary conditions in all axes. The reflected data will be 2N+1 times the size of the original. To construct the PDF within the boundaries with FFTKDE, one must reflect the axes in each dimension before forming the Cartesian product. If each axis has n points, the resulting grid will have (3n-2)^N points, and this size becomes a problem both for memory and evaluation speed. The resolution of each axis has to be lowered which leads to less accurate PDF predictions (this isn't too bad, because the shape remains accurate and the height can be renormalized with a numerical integration).

If FFTKDE could be evaluated on a grid inside the data extrema, one could fit to the 2N+1 reflected data points but still evaluate the PDF on the n^N non-reflected grid points.

@tommyod
Copy link
Owner

tommyod commented May 18, 2021

I'll give some general comments which I hope can be helpful. Honestly you are kind of on your own here, since (1) I don't have time to work on this and (2) your requirements are a bit exotic, so you'll have to think it through.

If the restriction is removed, the user could unintentionally get bad results. In theory I think it could be removed, but it would make the library less "safe". I think the feature has saved many more users than it has annoyed, so I am in favor of keeping it.

As per your specific problem, I see two solutions:

  • If the bandwidth is small compared to the data (e.g. data on [0, 1] and bw = 0.01), then you don't have to mirror all of the data. If you mirror e.g. 3*bw, then you might get away with much less computation. This might be the simplest solution.
  • If you want mirroring across each low/high threshold in each dimension, then you could fork the library and adapt it to your specific needs. If you (1) set up a grid that is barely outside of the data and (2) use a convolution that wraps around the boundaries, you should be able to accomplish what you want in a computationally efficient way. After all convolutions wrap around by default, which is exactly what you want. You will have to look at grid validation/creation, and make sure that the convolution wraps around. Try this: create a grid, perform linear binning, adapt this code to convolve a kernel with the binned data.

Best of luck to you! I'll close this issue, since I will not work on it. Let me know if you have questions. If someone else wants to implement a MirroredFFTKDE let me know, and I'll re-open the issue.

@tommyod tommyod closed this as completed May 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants