Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support bbox in in geometry #138

Merged
merged 17 commits into from
Mar 21, 2024

Conversation

ljstrnadiii
Copy link
Contributor

@ljstrnadiii ljstrnadiii commented Feb 13, 2024

This pr adds support for matching an existing xarray dataset that adheres to rioxarray standards and to support float64 precision with coordinates.

Motivating Use Case of match_xarray:

  • It will generally make it easier to merge datasets that come from somewhere else!
  • e.g. I have a ton of LiDAR transects as their own tifs. Now I can open them in rioxarray and pass them to load data with the correct transform once instead of resampling again to match.

Motivating Use Case of use_coords_double_precision:

  • more precise transforms if matching closely is important to people
  • allows a hacky and not exactly perfect approach to match to an existing xarray's transform if their dims are in float64
  • when using double precision geometry objects to match the geometry of an existing dataset, .getInfo() seems to change the values so slightly! This motivated np.testing.assert_all_close instead of np.testing.assert_equal

🦩

@ljstrnadiii
Copy link
Contributor Author

ljstrnadiii commented Feb 13, 2024

Quick screen shot of how coords are changing before calling .getInfo(), after, and the inputs that come from raster.io.bounds(). I might be missing something here, but they are off just slightly. At least enough to for xr.merge to notice. Actually, I am seeing that if we call .bounds() on a ee.Geometry.Rectangle, we see this discrepancy. I would love to use np.testing.assert_equal instead of the close test if we can think of a way to prevent this or maybe this is an issue worth filing with ee folks.

Screenshot 2024-02-12 at 9 21 25 PM

@ljstrnadiii
Copy link
Contributor Author

@dabhicusp @naschmitz @alxmrs (+others; not sure who to cc) would you all be receptive to something like this? Any thoughts?

@@ -41,17 +41,14 @@
]


def _read_identity_pool_creds() -> identity_pool.Credentials:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to revert to this, but it was easier for me to get tests up and running with google.auth.default, which looks for GOOGLE_APPLICATION_CREDENTIALS by default anyways.

# This is off slightly due to bounds determined by geometry e.g. .getInfo()
# seems to cause a super slight shift in the bounds. Thhe coords change before
# and after the call to .getInfo()!
np.testing.assert_almost_equal(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really want this to pass with assert_equal.

@ljstrnadiii
Copy link
Contributor Author

I removed changes around ee credentials/auth in case that was the issue with the previous workflow run failure.

@ljstrnadiii
Copy link
Contributor Author

I am not sure what's up with failing ci workflows. Looks like it may be a secret issue.

@jdbcode
Copy link
Member

jdbcode commented Feb 14, 2024

Thanks for the PR, Leonard! You can mostly ignore the ci workflows, they'll be run in a different environment. Note that currently we're on a biweekly issue and PR triage cycle. Thanks for your patience as we get this assigned for review!

xee/ext.py Outdated
@@ -40,6 +40,7 @@
from xarray.backends import store as backends_store
from xarray.core import indexing
from xarray.core import utils
import xarray as xr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Xarray is already imported.

Copy link
Collaborator

@alxmrs alxmrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use double precision sounds like a useful feature, but I am uncertain about match_xarray. This doesn’t look like a standard part of rasterio — am I mistaken?
I would advise not modifying the base interface to achieve this kind of compatibility (given a review on mobile).

@ljstrnadiii
Copy link
Contributor Author

Use double precision sounds like a useful feature, but I am uncertain about match_xarray. This doesn’t look like a standard part of rasterio — am I mistaken? I would advise not modifying the base interface to achieve this kind of compatibility (given a review on mobile).

  1. I got some inspiration from rioxarray.reproject_match here, which is commonly used to merge datasets.
  2. I am cool with dropping this in favor of double precision support if we can get this test to pass with assert equal not assert close somehow.
  3. I would like a way to specify the exact transform of the Xee dataset without the issues I tried to highlight above--namely that bounds change ever so slightly with the ee.geometry interface before and after getInfo() call. Maybe we can support a tuple[float, float, float, float] type in the geometry as a compromise?
  4. We could also drop the double precision arg and enable behind the scenes if the geometry or scale is float64 somehow?

xee/ext.py Outdated
@@ -1020,6 +1044,13 @@ def open_dataset(
frameworks.
ee_init_kwargs: keywords to pass to Earth Engine Initialize when
attempting to auto init for remote workers.
use_coords_double_precision: Whether to use double precision for coordinates
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should float64 just be the default? I think float32 was an offhand desire to save bytes, but it could be simply incorrect.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someone on the EE team will better be able to clarify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this might be a good idea. If people want float32, they can cast before they write things out.

@alxmrs
Copy link
Collaborator

alxmrs commented Feb 16, 2024

namely that bounds change ever so slightly with the ee.geometry interface before and after getInfo() call. Maybe we can support a tuple[float, float, float, float] type in the geometry as a compromise?

I think this may exist already, or is close to existing. You don’t have to pass in ee.Geometry to set the geometry. Does proj’s CRS object provide one the way?

We could also drop the double precision arg and enable behind the scenes if the geometry or scale is float64 somehow?

I like this idea! It makes sense that it would be related to the projection.

@ljstrnadiii
Copy link
Contributor Author

ljstrnadiii commented Feb 16, 2024

I think this may exist already, or is close to existing. You don’t have to pass in ee.Geometry to set the geometry. Does proj’s CRS object provide one the way?

Sorry, what may exist already or close to existing? Support for tuple[float, float, float, float] type in geometry? As far as I understand, you can pass geometry or None. If None, the bounds are taken from the CRS's area of use. This object doesn't have a set method for area_of_use as far as I can tell, which makes sense 🤷 .

I like this idea! It makes sense that it would be related to the projection.

Me, too, but I am not so sure there is a straightforward way to determine if the bounds provided are float32 or float64.

I am starting to think that the safest/simplest thing would be to change to float64 for coords to avoid truncation, and add support for a tuple[float, float, float, float] type in geometry. That would let me precisely define the dataset's transform. Thoughts? I can implement and push up those changes to get a sense of what that would look like if there are no major objections.

@alxmrs
Copy link
Collaborator

alxmrs commented Feb 17, 2024 via email

@ljstrnadiii ljstrnadiii changed the title feat: match_xarray and use_coords_double_precision feat: support bbox in in geometry Feb 17, 2024
@ljstrnadiii
Copy link
Contributor Author

@alxmrs I am not sure how you will feel about keeping geometry only or if you would prefer a bounds or bbox argument. I added the proposed changes. Also, apologies for the messy commit history, but the last three should be helpful in demonstrating things.

@ljstrnadiii ljstrnadiii requested a review from alxmrs February 17, 2024 16:03
xee/ext.py Outdated
@@ -139,7 +139,7 @@ def open(
crs: Optional[str] = None,
scale: Optional[float] = None,
projection: Optional[ee.Projection] = None,
geometry: Optional[ee.Geometry] = None,
geometry: Optional[Union[ee.Geometry, types.types.BBox]] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it just be just types and not types.types?

xee/ext.py Outdated
x_min_0, y_min_0, x_max_0, y_max_0 = _ee_bounds_to_bounds(
self.get_info['bounds']
)
elif isinstance(geometry, Union[List, Tuple, np.ndarray]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does typing.Sequence work here?

Consider adding a check for length of the sequence, too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! You have one :)

xee/ext.py Show resolved Hide resolved
xee_dataset = xr.open_dataset(
ee.ImageCollection(ic),
engine='ee',
geometry=tuple(raster.rio.bounds()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

ee.ImageCollection(ic),
engine='ee',
geometry=tuple(raster.rio.bounds()),
scale=raster.rio.resolution()[0],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related: should we let users pass in x and y scale?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Is this possible somehow today, eg via proj?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is. Great call! I added this to the integration test in a2b3c55

@ljstrnadiii ljstrnadiii requested a review from alxmrs February 20, 2024 01:53
Copy link
Collaborator

@alxmrs alxmrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Added a couple of nits, but overall I really like this improvement. I think CI needs to pass for this to be merged.

xee/ext.py Outdated
@@ -139,7 +139,7 @@ def open(
crs: Optional[str] = None,
scale: Optional[float] = None,
projection: Optional[ee.Projection] = None,
geometry: Optional[ee.Geometry] = None,
geometry: Optional[Union[ee.Geometry, types.BBox]] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider just using a union with None. Optional is like a Union, so this is somewhat redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you misread or I am misunderstanding. It sounds like you think I put Optional[Union[ee.Geometry, None]] in which case, yes, agreed--redundant, but I added types.BBox. I actually am not sure if the type types.BBox will display as tuple[float, float, float, float] in jupyter or something and I am somewhat inclined to change to Optional[Union[ee.Geometry, Tuple[float, float, float, float]] = None to make it as clear as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in 4c0dd6b

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Union[ee.Geometry, types.BBox, None] is equivalent to Optional[Union[ee.Geometry, types.BBox]] and in my opinion a bit cleaner.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being unclear. I'm referring to the definition of Optional: https://docs.python.org/3/library/typing.html#typing.Optional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was confused, but your preference is clear now! Thanks. Updated in 23ea0c9

xee/ext.py Outdated
x_min_0, y_min_0, x_max_0, y_max_0 = _ee_bounds_to_bounds(
self.get_info['bounds']
)
elif isinstance(geometry, Union[List, Tuple, np.ndarray, Sequence]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant can sequence replace the rest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to consider is that sequence will let in a simple string e.g. isinstance("hotdog", Sequence) evaluates to True. Still want me to drop all types in favor of Sequence alone? There is a part of me that is also weirded out by even using np.ndarray since that could be an incorrect shape that still evaluates to length four.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would like to be more stringent with types and changes this to only Union[List, Tuple].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented in 4c0dd6b

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking. SGTM.

engine='ee',
geometry=tuple(raster.rio.bounds()),
projection=ee.Projection(
crs=str(raster.rio.crs), transform=raster.rio.transform()[:6]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also nice!

@ljstrnadiii
Copy link
Contributor Author

ljstrnadiii commented Feb 20, 2024

@alxmrs thanks for another review! I am going to simplify types as I mentioned in comments and add a small demo in the readme. As far as CI, I think auth is failing because my user name does have the credentials? I am not sure if this is on my end or yours? I am guessing an author/google employee may need to trigger CI for auth to work correctly?

@alxmrs
Copy link
Collaborator

alxmrs commented Feb 22, 2024

I think @jdbcode or @naschmitz can help with CI. I’m not familiar with how it’s set up.

@ljstrnadiii
Copy link
Contributor Author

@jdbcode or @naschmitz any tips? The tests workflow fails at authentication.

@ljstrnadiii
Copy link
Contributor Author

@jdbcode @naschmitz kindly bumping

@naschmitz
Copy link
Collaborator

Sorry for the delay. We're extremely understaffed right now. I'm going to take a look at your PR now.

Copy link
Collaborator

@naschmitz naschmitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I'll approve once the changes are made.

xee/ext.py Outdated
geometry: Optional[ee.Geometry] = None,
geometry: Optional[
Union[ee.Geometry, Tuple[float, float, float, float]]
] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the type annotation to the following. Geometry | Tuple | None is preferred over Optional[Geometry | Tuple].

Union[ee.Geometry, Tuple[float, float, float, float], None]

Here and other places in the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented here b640d95

xee/ext.py Outdated
try:
x_min_0, y_min_0, x_max_0, y_max_0 = self.crs.area_of_use.bounds
except AttributeError:
# `area_of_use` is probable `None`. Parse the geometry from the first
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: probably

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented here b640d95

xee/ext.py Outdated
x_min_0, y_min_0, x_max_0, y_max_0 = _ee_bounds_to_bounds(
self.get_info['bounds']
)
elif isinstance(geometry, Union[List, Tuple]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to check against Sequence here instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fine, but a string is also sequence and this would catch and raise in that case. Updated anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented here b640d95

xee/ext.py Outdated
@@ -231,19 +235,31 @@ def __init__(
# Parse the dataset bounds from the native projection (either from the CRS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider breaking this out into a separate function. This logic is much more complex now.

Looks like the only input is geometry. Return tuple should be a 4-length tuple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in b640d95

@ljstrnadiii ljstrnadiii requested a review from naschmitz March 16, 2024 15:23
@ljstrnadiii
Copy link
Contributor Author

@naschmitz thanks for the review and no worries on the timeline. I'm just stoked to get it in as this is basically my most substantial OSS contribution to date after 6 years of startup life.

Tests pass locally btw! Thanks again and let me know if there is anything else you would like to see 🙏

@ljstrnadiii
Copy link
Contributor Author

Any idea what is up with CI test step authentication?

@copybara-service copybara-service bot merged commit 5fe4395 into google:main Mar 21, 2024
6 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants