Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow (unpractical) for interpolating data on a regular grid #41

Open
perrette opened this issue Jan 23, 2025 · 3 comments
Open

Slow (unpractical) for interpolating data on a regular grid #41

perrette opened this issue Jan 23, 2025 · 3 comments

Comments

@perrette
Copy link

perrette commented Jan 23, 2025

This is most likely related to #34 .

I need to regrid data from an irregularly spaced grid onto a regular longitude x latitude grid. Unfortunately on the real-world example below this takes minutes (nearly 10 minutes indeed) instead of seconds in an equivalent python example. This issue could also be posted on DelaunayTriangulation but I write it here keeping in mind other triangulation packages exist NaturalNeighbours.jl could take advantage of.

I have to start with a python example using scipy.interpolate.griddata because that's what I am familiar with and that shows what the performance can be:

# First generate scatter data on a random set of points

import numpy as np
nLon = 320
nLat = 384
shape = nLat, nLon
rng = np.random.default_rng(seed=42)
lon = rng.uniform(low=0, high=360, size=shape)
lat = rng.uniform(low=-90, high=90, size=shape)
data = np.cos(np.deg2rad(lon)) + np.sin(np.deg2rad(lat))

# Interpolate
from scipy.interpolate import griddata

rlon = np.arange(0, 360)
rlat = np.arange(-90, 90)
rdata = griddata((lon.flat, lat.flat), data.flat, (rlon[None, :], rlat[:, None]))

# Plot
import matplotlib.pyplot as plt

f, (ax, ax2) = plt.subplots(1, 2, figsize=(12, 3), sharex=True, sharey=True)

h = ax.scatter(lon, lat, c=data, s=1)
plt.colorbar(h, ax=ax)
ax.set_title("Scattered data")

h = ax2.pcolormesh(rlon, rlat, rdata)
plt.colorbar(h, ax=ax2)
ax2.set_title("Gridded data")

ax.set_xlim(0, 360)
ax.set_ylim(-90, 90)

f.tight_layout()

Image

On my machine the interpolation part of this code (griddata) takes up

CPU times: user 746 ms, sys: 23 ms, total: 769 ms
Wall time: 769 ms

Now the NaturalNeighbours equivalent takes more than 5 minutes and counting:

using Random, Statistics

nLon = 320
nLat = 384
shape = (nLat, nLon)
rng = MersenneTwister(42)

lon = rand(rng, shape...) .* 360.0    # Uniformly distributed in [0, 360)
lat = rand(rng, shape...) .* 180.0 .- 90.0  # Uniformly distributed in [-90, 90)

# Compute data array
deg2rad(x) = x * π / 180
data = cos.(deg2rad.(lon)) .+ sin.(deg2rad.(lat));
using NaturalNeighbours: interpolate

step = 1
rstep = 1

rlon = 0.0:rstep:359.0
rlat = -90.0:rstep:89.0

rlon2 = [ x for x in rlon, y in rlat]
rlat2 = [ y for x in rlon, y in rlat]

# Flatten the input data
lon_flat = vec(lon[1:step:end, 1:step:end])
lat_flat = vec(lat[1:step:end, 1:step:end])
data_flat = vec(data[1:step:end, 1:step:end])

# Use gridded interpolation
@time begin
    itp = interpolate(lon_flat, lat_flat, data_flat)
end
@time begin
    rdata = itp.(rlon2, rlat2)
end

takes about 6 + 7 = 13 minutes:

358.370743 seconds (1.16 M allocations: 391.475 MiB, 0.14% gc time)
417.264291 seconds (21 allocations: 518.609 KiB)
using CairoMakie

# Scatter plot
fig = Figure(size = (1200, 400))

ax1 = Axis(fig[1, 1][1, 1], title = "Scatter plot of Data", xlabel = "Longitude", ylabel = "Latitude")
xlims!(ax1, 0, 360)
ylims!(ax1, -90, 90)

h = scatter!(ax1, vec(lon), vec(lat), color = vec(data), colormap = :viridis, colorrange = extrema(data))
Colorbar(fig[1, 1][1, 2], h)

# Heatmap (gridded data)
ax2 = Axis(fig[1, 2][1, 1], title = "Gridded Data Heatmap", xlabel = "Longitude", ylabel = "Latitude")
hm = heatmap!(ax2, rlon, rlat, rdata, colormap = :viridis, colorrange = extrema(data))
Colorbar(fig[1, 2][1, 2])
xlims!(ax2, 0, 360)
ylims!(ax2, -90, 90)

# Display the figure
fig

Image

The difference is striking and makes its use for interpolation of scattered data on a regular grid unpractical.

Now as noted in #34 it seems to (at least partly) come down to triangulation:

import DelaunayTriangulation

lon_flat = vec(lon)
lat_flat = vec(lat)
data_flat = vec(data)
points = Array(hcat(lon_flat, lat_flat)')

@time begin
    tri = DelaunayTriangulation.triangulate(points)
end
392.031465 seconds (12.46 M allocations: 945.640 MiB, 0.11% gc time, 1.37% compilation time)
Delaunay Triangulation.
   Number of vertices: 122880
   Number of triangles: 245723
   Number of edges: 368602
   Has boundary nodes: false
   Has ghost triangles: true
   Curve-bounded: false
   Weighted: false
   Constrained: false

Using predicates=FastKernel() as suggested in #34 does not yield significant speedup, only less memory usage:

@time begin
    tri = DelaunayTriangulation.triangulate(points, predicates=DelaunayTriangulation.FastKernel())
end
386.701739 seconds (2.40 M allocations: 451.014 MiB, 0.16% gc time, 0.33% compilation time)

Clearly there are faster alternatives to DelaunayTriangulation. For example, I came across Quickhull and it performs a triangulation in less than 1 second on this example:

import Quickhull

@time begin
    tri = Quickhull.delaunay(points)
end
0.561327 seconds (4.74 M allocations: 203.663 MiB, 6.42% gc time)
Hull of 122880 points in 3 dimensions
  - 122880 Hull vertices: Int32[92533, 76914, 45120, 1703, 37100, 90240, 60540, 91393, 59930, 3406  …  73503, 99296, 106903, 7506, 64941, 98971, 49493, 17984, 49648, 120517]
  - 245723 Hull facets: GeometryBasics.TriangleFace{Int32}[TriangleFace(62715, 49280, 98608), TriangleFace(16092, 28223, 88153), TriangleFace(80418, 84756, 51745), TriangleFace(58378, 115147, 11275), TriangleFace(9954, 64670, 71982), TriangleFace(20587, 103259, 14865), TriangleFace(102348, 9377, 75774), TriangleFace(28969, 37086, 40165), TriangleFace(97621, 99325, 60422), TriangleFace(69792, 121048, 71487)  …  TriangleFace(103087, 99102, 102056), TriangleFace(99102, 94216, 102056), TriangleFace(52522, 103087, 89220), TriangleFace(52522, 3949, 89220), TriangleFace(117188, 7010, 1291), TriangleFace(96069, 7010, 37696), TriangleFace(57295, 68207, 37696), TriangleFace(96069, 68207, 37696), TriangleFace(57295, 102056, 7382), TriangleFace(103087, 102056, 7382)]

Perhaps it would be worth considering using one of these faster triangulation approaches?

@DanielVandH
Copy link
Owner

DanielVandH commented Jan 23, 2025

Yeah it's not the fastest atm. I have a branch at https://github.com/JuliaGeometry/DelaunayTriangulation.jl/tree/v2 for v2 of DelaunayTriangulation.jl which will be a lot faster, but it's unfinished until I find the time. Maybe using spatial sorting could help with DelaunayTriangulation.jl, not sure (spatial sorting is also implemented in that branch).

Perhaps it would be worth considering using one of these faster triangulation approaches?

I personally have no intentions of changing the backend. You are free to make a PR allowing for someone to change the backend (meaning a choice between triangulation packages, or a generic interface that each package could implement) and I can review it if you want.

@perrette
Copy link
Author

Ok, thanks Daniel, good to know you are aware of it and working on improvements to DelaunayTriangulation. I had not realized your also are the author of that other package. Otherwise I would not have suggested to switch to another backend. I guess in this case I'll go on with my python workaround (I don't have to go full julia yet) and will be looking forward for you to making progress. I'll save that PR for more pressing needs. Thanks again.

perrette pushed a commit to perrette/testgriddatajulia that referenced this issue Jan 23, 2025
@DanielVandH
Copy link
Owner

No problem. I will ping this issue once I've finished v2 and hopefully got some speed improvements. I'll use the example code you gave in this issue to help as a target for benchmarking during development.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants