Point In Face #1056

Draft
wants to merge 54 commits into
base: main

Conversation

aaronzedwick
Member

@aaronzedwick aaronzedwick commented Nov 5, 2024

Closes #905

Overview

Expected Usage

from uxarray.grid.geometry import point_in_polygon

# Defined polygon
polygon = [[-10, 10, -10, 10], [10, 10, -10, -10]]

# Point to check
point = [10, 10]

point_in_polygon(polygon, point, inclusive=True)
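For readers unfamiliar with the test itself, here is a minimal planar sketch of an inclusive point-in-polygon check using the even-odd ray-casting rule. It is only an illustration under stated assumptions: the name point_in_polygon_planar and the list-of-(x, y)-vertices input are hypothetical, and this is not the spherical implementation proposed in this PR.

```python
def point_in_polygon_planar(vertices, point, inclusive=True):
    """Planar even-odd test. `vertices` is an ordered list of (x, y) tuples.

    Hypothetical helper for illustration only; not part of uxarray.
    """
    px, py = point
    n = len(vertices)
    inside = False
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]

        # Boundary case: the point is collinear with the edge and lies
        # within the edge's bounding box, so it sits on the edge itself.
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        on_edge = (
            cross == 0
            and min(x1, x2) <= px <= max(x1, x2)
            and min(y1, y2) <= py <= max(y1, y2)
        )
        if on_edge:
            return inclusive

        # Even-odd rule: flip the flag each time a ray cast in the +x
        # direction from the point crosses an edge.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:
                inside = not inside
    return inside


square = [(-10, -10), (10, -10), (10, 10), (-10, 10)]
corner_inclusive = point_in_polygon_planar(square, (10, 10), inclusive=True)
corner_exclusive = point_in_polygon_planar(square, (10, 10), inclusive=False)
```

The inclusive flag only changes the answer for points that lie exactly on an edge or vertex, which is the case the usage example above exercises.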

PR Checklist

General

  • An issue is created and linked
  • Add appropriate labels
  • Filled out Overview and Expected Usage (if applicable) sections

Testing

  • Adequate tests are created if there is new functionality
  • Tests cover all possible logical paths in your function
  • Tests are not too basic (such as simply calling a function and nothing else)

Documentation

  • Docstrings have been added to all new functions
  • Docstrings have been updated with any function changes
  • Internal functions have a preceding underscore (_) and have been added to docs/internal_api/index.rst
  • User functions have been added to docs/user_api/index.rst

@aaronzedwick
Member Author

@philipc2 do you think this should be an internal function or exposed to the user?

@aaronzedwick aaronzedwick changed the title DRAFT: Point In Polygon Point In Polygon Nov 27, 2024
@aaronzedwick aaronzedwick marked this pull request as ready for review November 27, 2024 15:13
Member

@philipc2 philipc2 left a comment

Please include an ASV benchmark. I'd suggest doing a parameterized benchmark for the 120 and 480 km MPAS grids.
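As a rough sketch of what a parameterized ASV benchmark looks like: ASV reads the params and param_names class attributes and passes each parameter value to setup and to every time_* method. The class name, the stand-in data, and the face counts below are all hypothetical; a real benchmark would load the MPAS grids for each resolution instead.

```python
import random


class PointInPolygonSuite:
    """Hypothetical ASV-style suite, parameterized over grid resolution."""

    # ASV runs setup and each time_* method once per entry in params.
    params = ["480km", "120km"]
    param_names = ["resolution"]

    # Assumption: made-up sizes standing in for the real grids.
    _n_faces = {"480km": 1_000, "120km": 20_000}

    def setup(self, resolution):
        # Work done here is excluded from the reported timing.
        rng = random.Random(0)
        n = self._n_faces[resolution]
        self.centers = [
            (rng.uniform(-180, 180), rng.uniform(-90, 90)) for _ in range(n)
        ]
        self.point = (0.0, 0.0)

    def time_nearest_center(self, resolution):
        # Placeholder workload: brute-force nearest face center.
        px, py = self.point
        min(self.centers, key=lambda c: (c[0] - px) ** 2 + (c[1] - py) ** 2)
```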

Member

@philipc2 philipc2 left a comment

Please use the optimized functions from #1072 and try to write the function entirely in Numba. This may require us to pass in both the cartesian and spherical versions of point & polygon. Let me know if you have any questions!

uxarray/grid/geometry.py Outdated Show resolved Hide resolved
@aaronzedwick
Member Author

@philipc2 do you know why this is failing? It doesn't have any issues locally when I run the tests, but for some reason it fails on the CI, saying that it can't resize the array. Could you check out the branch and try running the test cases?

Also, another odd problem is that half of the time the tests do fail locally, saying no faces were found in the subset, but the other half of the time faces are found. I am pretty confused by that as well. So any insight into that would be appreciated if you have any ideas.

@hongyuchen1030
Contributor

Can you point us to the specific testcase that's been failing?

Also, another odd problem is that half of the time the tests do fail locally, saying no faces were found in the subset, but the other half of the time faces are found

Are you using the same settings each time? Did you clear the cache every time you ran it?

@aaronzedwick
Member Author

The failing test cases are only the ones on the CI, which are listed under the failed check if you click through. They are the two test cases I added for the point containing polygon grid function.

As far as I know, everything is the same across environments. That's why I wanted to see whether it passed or failed on someone else's machine, to hopefully point me toward what the error could be.

@philipc2
Member

Can you share your environment information? (Python Version & Installed packages)

@philipc2
Member

The tests are failing on my machine as well.

@hongyuchen1030
Contributor

The test cases that are failing are just the ones on the CI, which are listed under it if you click there, they are the two test cases I added for the point containing polygon grid function.

I didn't see any "no faces found" errors on the CI. My point is: does your local run fail at the same point as the CI?

face_edge_cartesian = _get_cartesian_face_edge_nodes(
subset.face_node_connectivity.values,
subset.n_face,
subset.n_max_face_edges,
Member

This appears to be the issue here.

If you pass in subset.n_max_face_nodes it works correctly (both values are equivalent).

When the connectivity construction for self.face_edge_connectivity is triggered, it leads to issues. I am looking deeper into it.

Member Author

@aaronzedwick aaronzedwick Jan 10, 2025

Oh, okay. So something in the construction of face_edge causes issues. Thanks for the fix for now; let me know if you figure out the underlying issue. From what I saw, it seems to be related to the inverse indices: it uses the wrong ones (those obtained from the subset) instead of the ones stored in grid._ds["edge_node_connectivity"].attrs. Maybe they are getting overwritten somehow?

Member

The inverse indices in the context of edge_node_connectivity are different than the inverse_indices from the subset. I don't believe that they are overwritten, but something is definitely off.

@aaronzedwick
Member Author

Ah, I see what you mean. No, my local run is failing at a different point, but on the same test cases. I will try resetting my environment to make sure it's up to date.


assert len(grid.get_faces_containing_point(point_xyz=point_xyz)) == 2

# For a node three faces should be found
Contributor

This is not always true: a node can belong to 1, 2, 3, or more faces.
For 1, consider a node on the boundary that belongs to no other face.

For 2, consider a boundary node that is a vertex of two adjacent faces.

For 4 or more, consider a node where 4 or more faces meet.

Member Author

That is true in general. However, this is an MPAS grid, in which each node is always going to be connected to 3 faces. So I think it's an OK assumption here that 3 faces should be found.

Contributor

OK, you can keep it. However, the count can differ if the grid has holes and the node lies along the boundary, so I think it would be good to test these edge cases.
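To make the edge cases above concrete, here is a small sketch that counts how many faces share each node, given a padded face-to-node connectivity table. The function name faces_per_node, the toy mesh, and the -1 fill value are assumptions chosen for illustration, not taken from the code under review.

```python
from collections import Counter

FILL_VALUE = -1  # assumption: marker used to pad faces with fewer nodes


def faces_per_node(face_node_connectivity, fill_value=FILL_VALUE):
    """Return a Counter mapping node index -> number of faces using it."""
    counts = Counter()
    for face_nodes in face_node_connectivity:
        for node in face_nodes:
            if node != fill_value:
                counts[node] += 1
    return counts


# Toy mesh: two quads sharing the edge (1, 4), plus a padded triangle.
faces = [
    [0, 1, 4, 3],
    [1, 2, 5, 4],
    [3, 4, 6, -1],
]
counts = faces_per_node(faces)
print(sorted(counts.items()))
```

In this toy mesh, node 0 belongs to one face, node 1 to two, and node 4 to three, so a test that hard-codes 3 only holds for nodes like node 4.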

@aaronzedwick aaronzedwick added the run-benchmark Run ASV benchmark workflow label Jan 14, 2025

github-actions bot commented Jan 14, 2025

ASV Benchmarking

Benchmark Comparison Results

Benchmarks that have improved:

Change Before [4fa1f7a] After [4416dab] Ratio Benchmark (Parameter)
- 448M 403M 0.9 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
- 638M 394M 0.62 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
- 486M 386M 0.79 mpas_ocean.Integrate.peakmem_integrate('480km')

Benchmarks that have stayed the same:

Change Before [4fa1f7a] After [4416dab] Ratio Benchmark (Parameter)
400M 400M 1.00 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
430M 433M 1.01 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
19.1±0.2ms 19.2±0.2ms 1.01 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
7.55±0.08ms 7.98±0.3ms 1.06 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
43.7±0.07ms 44.2±0.4ms 1.01 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
3.81±0.1ms 3.98±0.09ms 1.04 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
2.98±0.02s 2.94±0.03s 0.99 import.Imports.timeraw_import_uxarray
675±8μs 672±3μs 1.00 mpas_ocean.CheckNorm.time_check_norm('120km')
452±9μs 442±1μs 0.98 mpas_ocean.CheckNorm.time_check_norm('480km')
662±5ms 668±5ms 1.01 mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('120km')
41.6±0.2ms 42.9±0.3ms 1.03 mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('480km')
582±10μs 596±10μs 1.02 mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('480km')
6.01±0.09ms 6.04±0.1ms 1.01 mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('120km')
3.68±0.08ms 3.67±0.09ms 1.00 mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('480km')
3.62±0.05s 3.51±0.01s 0.97 mpas_ocean.ConstructFaceLatLon.time_welzl('120km')
229±2ms 222±0.6ms 0.97 mpas_ocean.ConstructFaceLatLon.time_welzl('480km')
1.17±0μs 1.25±0μs 1.07 mpas_ocean.ConstructTreeStructures.time_ball_tree('120km')
322±10ns 297±4ns 0.92 mpas_ocean.ConstructTreeStructures.time_ball_tree('480km')
793±2ns 784±2ns 0.99 mpas_ocean.ConstructTreeStructures.time_kd_tree('120km')
291±5ns 280±3ns 0.96 mpas_ocean.ConstructTreeStructures.time_kd_tree('480km')
439±6ms 445±9ms 1.01 mpas_ocean.CrossSection.time_const_lat('120km', 1)
222±0.9ms 226±2ms 1.01 mpas_ocean.CrossSection.time_const_lat('120km', 2)
114±1ms 117±1ms 1.02 mpas_ocean.CrossSection.time_const_lat('120km', 4)
362±3ms 364±4ms 1.00 mpas_ocean.CrossSection.time_const_lat('480km', 1)
181±2ms 180±1ms 1.00 mpas_ocean.CrossSection.time_const_lat('480km', 2)
95.4±0.6ms 94.5±1ms 0.99 mpas_ocean.CrossSection.time_const_lat('480km', 4)
121±0.6ms 122±0.4ms 1.01 mpas_ocean.DualMesh.time_dual_mesh_construction('120km')
8.62±0.1ms 8.53±0.2ms 0.99 mpas_ocean.DualMesh.time_dual_mesh_construction('480km')
1.07±0.01s 1.06±0.01s 0.99 mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', False)
53.5±0.7ms 53.0±0.7ms 0.99 mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', True)
85.2±0.2ms 85.9±1ms 1.01 mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', False)
5.50±0.2ms 5.52±0.06ms 1.00 mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', True)
339M 341M 1.01 mpas_ocean.Gradient.peakmem_gradient('120km')
316M 316M 1.00 mpas_ocean.Gradient.peakmem_gradient('480km')
2.75±0.02ms 2.75±0.03ms 1.00 mpas_ocean.Gradient.time_gradient('120km')
316±1μs 310±3μs 0.98 mpas_ocean.Gradient.time_gradient('480km')
231±8μs 241±2μs 1.04 mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('120km')
120±2μs 118±4μs 0.98 mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('480km')
402M 402M 1.00 mpas_ocean.Integrate.peakmem_integrate('120km')
176±6ms 171±2ms 0.97 mpas_ocean.Integrate.time_integrate('120km')
11.7±0.05ms 11.6±0.08ms 0.99 mpas_ocean.Integrate.time_integrate('480km')
351±3ms 348±2ms 0.99 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'exclude')
355±2ms 345±2ms 0.97 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'include')
351±1ms 348±2ms 0.99 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'split')
23.3±0.3ms 23.1±0.3ms 0.99 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'exclude')
23.0±0.4ms 23.3±0.3ms 1.01 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'include')
23.1±0.3ms 23.0±0.06ms 1.00 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'split')
failed failed n/a mpas_ocean.PointInPolygon.time_whole_grid('120km')
failed failed n/a mpas_ocean.PointInPolygon.time_whole_grid('480km')
55.3±0.1ms 55.4±0.2ms 1.00 mpas_ocean.RemapDownsample.time_inverse_distance_weighted_remapping
45.0±0.2ms 45.4±0.2ms 1.01 mpas_ocean.RemapDownsample.time_nearest_neighbor_remapping
355±0.7ms 355±1ms 1.00 mpas_ocean.RemapUpsample.time_inverse_distance_weighted_remapping
260±0.5ms 262±0.9ms 1.01 mpas_ocean.RemapUpsample.time_nearest_neighbor_remapping
312M 314M 1.01 quad_hexagon.QuadHexagon.peakmem_open_dataset
312M 311M 1.00 quad_hexagon.QuadHexagon.peakmem_open_grid
6.20±0.09ms 6.23±0.08ms 1.00 quad_hexagon.QuadHexagon.time_open_dataset
5.28±0.04ms 5.28±0.08ms 1.00 quad_hexagon.QuadHexagon.time_open_grid

Benchmarks that have got worse:

Change Before [4fa1f7a] After [4416dab] Ratio Benchmark (Parameter)
+ 1.72±0.03ms 1.93±0.09ms 1.12 mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('120km')

@aaronzedwick aaronzedwick added run-benchmark Run ASV benchmark workflow and removed run-benchmark Run ASV benchmark workflow labels Jan 14, 2025
Comment on lines 2443 to 2451
# Get the face's edges for the whole subset
face_edge_cartesian = _get_cartesian_face_edge_nodes(
    subset.face_node_connectivity.values,
    subset.n_face,
    subset.n_max_face_nodes,
    subset.node_x.values,
    subset.node_y.values,
    subset.node_z.values,
)
Member

This is very expensive to re-compute each time we call this function. Consider adding an internal property or storing this in Grid._ds like we do with our other methods.

Member Author

I can do that, but it is a new subset each time the operation is run. So won't it be recomputed anyway? Each subset should contain only one face, or a few at most.

Member

A more efficient approach would be to store it in the original Grid, where it can be computed once; then, when we subset, it should return a Grid with face_edge_nodes_xyz appropriately indexed.

If it is only a few faces at most, the performance cost may be negligible, but we still need to consider cases where we may be running millions of point-in-polygon queries on higher-resolution grids.

Member Author

Oh, okay yeah that would work. I didn't realize it would keep it through the subset operation.
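The compute-once-then-index idea discussed in this thread can be sketched roughly as follows. ToyGrid, face_edges_xyz, and build_count are hypothetical names for illustration; this is not how uxarray's Grid or its subsetting is implemented.

```python
from functools import cached_property


class ToyGrid:
    """Hypothetical stand-in for a grid that caches per-face edge data."""

    def __init__(self, face_nodes, node_xyz, face_edges_xyz=None):
        self.face_nodes = face_nodes
        self.node_xyz = node_xyz
        self.build_count = 0  # how many times the expensive build ran here
        if face_edges_xyz is not None:
            # Pre-populate the cached_property slot so a subset reuses the
            # parent's data instead of rebuilding it.
            self.__dict__["face_edges_xyz"] = face_edges_xyz

    @cached_property
    def face_edges_xyz(self):
        # Expensive step: built on first access, then stored on the instance.
        self.build_count += 1
        result = []
        for nodes in self.face_nodes:
            n = len(nodes)
            result.append(
                [
                    (self.node_xyz[nodes[i]], self.node_xyz[nodes[(i + 1) % n]])
                    for i in range(n)
                ]
            )
        return result

    def subset(self, face_indices):
        # Index into the parent's cached edges rather than recomputing them.
        return ToyGrid(
            [self.face_nodes[i] for i in face_indices],
            self.node_xyz,
            face_edges_xyz=[self.face_edges_xyz[i] for i in face_indices],
        )


node_xyz = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0)]
grid = ToyGrid([[0, 1, 2], [1, 3, 2]], node_xyz)
first = grid.subset([0])
second = grid.subset([1])
```

However many subsets are taken, the parent builds its edge data once and each subset only pays for the indexing.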

return index


def get_max_face_radius(self):
Member

This should be _populate_max_face_radius similar to our other population methods.

Comment on lines 188 to 192
class PointInPolygon(GridBenchmark):
    def time_whole_grid(self, resolution):
        point_xyz = np.array([self.uxgrid.face_x[0].values, self.uxgrid.face_y[0].values, self.uxgrid.face_z[0].values])

        self.uxgrid.get_faces_containing_point(point_xyz=point_xyz)
Member

From the GH Actions report, it appears that a single Point in Face query takes approximately 27.0±0.1ms.

Ideally, we want to bring this number down significantly.

Member

Did some digging:

Please include the following in a custom setup function:

_ = uxds.uxgrid.face_edge_connectivity

We don't want this to be included in the timing. I believe much of the 27ms is attributable to that.

This is for a 30km Grid

image

Member

Also, consider including a single sample point query in the setup, since there is also some overhead with the KD tree construction.
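The two suggestions above (building the connectivity in setup and running one warm-up query there) amount to something like the sketch below. SlowToBuildGrid is a hypothetical stand-in whose cached attribute simulates a costly one-time construction; it is not the real Grid class.

```python
import time
from functools import cached_property


class SlowToBuildGrid:
    """Hypothetical grid whose connectivity is costly to build the first time."""

    build_count = 0

    @cached_property
    def face_edge_connectivity(self):
        SlowToBuildGrid.build_count += 1
        time.sleep(0.05)  # simulate the one-time construction cost
        return [[0, 1, 2]]

    def get_faces_containing_point(self, point):
        _ = self.face_edge_connectivity  # triggers the build if not cached
        return [0]


class PointInFaceBench:
    def setup(self):
        # Everything here runs before timing starts.
        self.grid = SlowToBuildGrid()
        _ = self.grid.face_edge_connectivity
        self.point = (0.0, 0.0, 1.0)
        # Warm-up query so lazily built structures are ready.
        self.grid.get_faces_containing_point(self.point)

    def time_query(self):
        # Only the steady-state query cost is measured.
        self.grid.get_faces_containing_point(self.point)


bench = PointInFaceBench()
bench.setup()
start = time.perf_counter()
bench.time_query()
elapsed = time.perf_counter() - start
print(f"warm query took {elapsed * 1e6:.1f} microseconds")
```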

Member

This is the performance I get for a 30km grid after making the changes mentioned above, which is pretty good for a grid of that resolution.

image

The only issue is that it doesn't find any faces.

Here is the sample code I used.

import uxarray as ux
import numpy as np

import cProfile

profiler = cProfile.Profile()

grid_path = "/Users/philipc/PycharmProjects/uxarray/unstructured-grid-viz-cookbook/meshfiles/x1.655362.grid.nc"
data_path = "/Users/philipc/PycharmProjects/uxarray/unstructured-grid-viz-cookbook/meshfiles/x1.655362.data.nc"

uxds = ux.open_dataset(grid_path, data_path)
uxds.uxgrid.normalize_cartesian_coordinates()

_ = uxds.uxgrid.face_edge_connectivity

point = np.array([0.0, 0.0, 1.0])
res = uxds.uxgrid.get_faces_containing_point(point)


profiler.enable()
res = uxds.uxgrid.get_faces_containing_point(point)
profiler.disable()

profiler.dump_stats('pface.prof')

print(res)

You can run snakeviz pface.prof to view the profile.
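If snakeviz is not installed, the same kind of dump can be read with the standard library's pstats module. The sketch below profiles a hypothetical placeholder function (stand_in_query) instead of the real query and writes to a temporary file, so it runs without any grid files.

```python
import cProfile
import io
import os
import pstats
import tempfile


def stand_in_query(n=50_000):
    """Hypothetical placeholder for the call being profiled."""
    total = 0.0
    for i in range(n):
        total += (i % 7) * 0.5
    return total


profiler = cProfile.Profile()
profiler.enable()
stand_in_query()
profiler.disable()

with tempfile.TemporaryDirectory() as tmp_dir:
    prof_path = os.path.join(tmp_dir, "pface.prof")
    profiler.dump_stats(prof_path)

    # Load the dump and print the five entries with the largest
    # cumulative time into an in-memory buffer.
    buffer = io.StringIO()
    stats = pstats.Stats(prof_path, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)

report = buffer.getvalue()
print(report)
```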

Member Author

Thanks for looking into this for me! I am trying to implement these changes now. As for it not finding any faces, does the mesh have holes in it? Is it possible that there is no face at the pole?

Member

The grid is a global MPAS atmosphere grid with no holes.

Member

I may have been on an older version of the code. Just ran it again after pulling and it works. A single face is detected. Going to run a few more tests.

Member Author

@aaronzedwick aaronzedwick Jan 14, 2025

Is it working locally for you? It does for me, but the CI is failing. Not sure why.

@aaronzedwick aaronzedwick added run-benchmark Run ASV benchmark workflow and removed run-benchmark Run ASV benchmark workflow labels Jan 14, 2025
Labels
run-benchmark Run ASV benchmark workflow
Development

Successfully merging this pull request may close these issues.

Point in Face
4 participants