Factor weather-tools as a library that users can extend. #206

Open
alxmrs opened this issue Aug 6, 2022 · 0 comments · May be fixed by #237

@alxmrs
Collaborator

alxmrs commented Aug 6, 2022

Currently, weather-tools exists as a set of CLIs. Adding capabilities in areas that are likely to change (see below) requires modifying the code. It would be great for users, internal or otherwise, if they could extend each tool's capabilities to suit idiosyncratic needs in a simple way.

So far, in the development of weather-dl and weather-mv, we've identified areas that are likely to change. These are:

weather-dl

  • Users should be able to add different weather data providers, like met offices or protocols (e.g. Globus?). We currently abstract this with our Client class.
  • For each data provider, users should be able to express how that data source can be partitioned. I'll discuss the specifics in "Support for Eumetsat in weather-dl" #160, but to summarize: there should be a client-specific way to describe, via the config, how data requests can be partitioned and thus parallelized.
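As a rough illustration of the two weather-dl extension points above, here is a minimal sketch assuming a hypothetical `Client` base class with an abstract `retrieve` method and an overridable `partition` hook. These names and signatures are illustrative only and are not the current weather-dl `Client` API. The default `partition` fans a selection out into one request per combination of the values of `partition_keys`, which is what would let requests run in parallel.

```python
import abc
import itertools
import typing as t

Request = t.Dict[str, t.Any]


class Client(abc.ABC):
    """Hypothetical provider base class; names are illustrative only."""

    @abc.abstractmethod
    def retrieve(self, request: Request, target: str) -> None:
        """Fetch the data described by `request` and write it to `target`."""

    def partition(
        self, selection: t.Dict[str, t.List[t.Any]], partition_keys: t.Sequence[str]
    ) -> t.Iterator[Request]:
        """Default strategy: one request per combination of partition-key values.

        Providers with different constraints (e.g. request-size limits) could
        override this method with their own splitting logic.
        """
        # Keys that are not partitioned are copied unchanged into every request.
        fixed = {k: v for k, v in selection.items() if k not in partition_keys}
        for combo in itertools.product(*(selection[k] for k in partition_keys)):
            request = dict(fixed)
            request.update(zip(partition_keys, combo))
            yield request


class RecordingClient(Client):
    """Hypothetical stand-in provider that records requests instead of downloading."""

    def __init__(self) -> None:
        self.downloads: t.List[t.Tuple[Request, str]] = []

    def retrieve(self, request: Request, target: str) -> None:
        self.downloads.append((request, target))
```

For example, partitioning `{'year': [2020, 2021], 'month': [1, 2], 'variable': ['2t']}` on `['year', 'month']` yields four independent requests, each still carrying `'variable': ['2t']`. A provider such as a met office API would subclass `Client`, implement `retrieve`, and override `partition` only if the default cartesian split does not fit its limits.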

weather-mv

  • It seems like every met institution uses a different data format. Even when a standard format is used, each institution has strong opinions about how data is represented in that format. The way raw files are opened and processed is very specific to the source data and hard to make truly generic. In addition, institutions may need to keep their data sources private (often due to licensing agreements) and thus cannot open-source preprocessing code that they would otherwise like to contribute.
  • Destinations for weather data change more slowly, but they also need to stay flexible. Right now, we can load data into BigQuery and Earth Engine, and we regrid data with MetView. Later, users may want to load data into Postgres, Elasticsearch, or some other geospatial datastore.
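One way to keep destinations open for extension is a small registry keyed by URI scheme, sketched here with hypothetical names (`register_sink`, `Sink`, `sink_for`) that are not part of weather-mv. Adding Postgres or Elasticsearch would then mean registering one more class rather than editing the core tool. This is a plain-Python stand-in; in weather-mv itself, sinks would presumably remain Beam transforms along the lines of `ToDataSink`.

```python
import typing as t

# Hypothetical registry: URI scheme -> sink class.
_SINKS: t.Dict[str, t.Type['Sink']] = {}


def register_sink(scheme: str) -> t.Callable[[t.Type['Sink']], t.Type['Sink']]:
    """Class decorator that maps a URI scheme (e.g. 'postgres') to a sink class."""
    def decorator(cls: t.Type['Sink']) -> t.Type['Sink']:
        _SINKS[scheme] = cls
        return cls
    return decorator


class Sink:
    """Hypothetical base class for a data destination."""

    def __init__(self, uri: str) -> None:
        self.uri = uri

    def write(self, rows: t.Iterable[t.Dict[str, t.Any]]) -> int:
        """Write rows to the destination; return how many were written."""
        raise NotImplementedError


@register_sink('memory')
class InMemorySink(Sink):
    """Toy destination that keeps rows in a list (useful for tests)."""

    def __init__(self, uri: str) -> None:
        super().__init__(uri)
        self.rows: t.List[t.Dict[str, t.Any]] = []

    def write(self, rows: t.Iterable[t.Dict[str, t.Any]]) -> int:
        before = len(self.rows)
        self.rows.extend(rows)
        return len(self.rows) - before


def sink_for(uri: str) -> Sink:
    """Instantiate the sink registered for the URI's scheme."""
    scheme = uri.split('://', 1)[0]
    cls = _SINKS.get(scheme)
    if cls is None:
        raise ValueError(f'No sink registered for scheme {scheme!r}')
    return cls(uri)
```

A private package could then define its own `@register_sink('postgres')` class and users would simply pass a `postgres://...` output URI; nothing in the core tool needs to know that destination exists.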

Towards these ends, I propose the following: let's internally represent each weather tool as a library that's easy to import. We let users extend the areas we expect to change frequently, and otherwise provide a common interface/framework for the rest of the pipeline.

To speak more concretely, here's a rough sketch of what this would look like for the weather mover:

import os
import typing as t

import xarray as xr

import weather_mv


class MyCustomPipeline(weather_mv.Pipeline):
    def add_args(self, parser) -> None:
        # Add custom command-line args.
        pass

    def open_datasets(self, uri: str) -> t.Iterator[xr.Dataset]:
        # Custom logic to open and process a URI into an Xarray dataset.
        pass


if __name__ == '__main__':
    # Allow users to pass a setup.py so they can specify custom dependencies.
    # Maybe later, they could also pass in a custom Anaconda environment.yml...
    MyCustomPipeline(os.path.join(os.getcwd(), 'setup.py')).run()

    # # Users could also choose to run the pipeline and return immediately,
    # # instead of waiting for the remote pipeline to finish.
    # MyCustomPipeline(os.path.join(os.getcwd(), 'setup.py')).run_async()
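To make the sketch above more concrete, here is one minimal way a base class like the proposed `weather_mv.Pipeline` could drive the two hooks (a template-method design). Everything here is hypothetical: plain dicts stand in for `xr.Dataset` so the example runs without Xarray, `run` takes an optional `argv` so it can be exercised without a real command line, the `write` stage stands in for handing data to a sink, and `setup_file` is only stored because shipping it to remote workers is out of scope.

```python
import argparse
import typing as t


class Pipeline:
    """Hypothetical stand-in for the proposed weather_mv.Pipeline base class."""

    def __init__(self, setup_file: t.Optional[str] = None) -> None:
        # A real runner would ship this file to remote workers for custom deps.
        self.setup_file = setup_file
        self.written: t.List[t.Any] = []

    def add_args(self, parser: argparse.ArgumentParser) -> None:
        """Extension point: subclasses may register extra flags."""

    def open_datasets(self, uri: str) -> t.Iterator[t.Any]:
        """Extension point: subclasses turn one URI into zero or more datasets."""
        raise NotImplementedError

    def write(self, dataset: t.Any) -> None:
        """Common stage owned by the framework (here: collect in memory)."""
        self.written.append(dataset)

    def run(self, argv: t.Optional[t.Sequence[str]] = None) -> t.List[t.Any]:
        parser = argparse.ArgumentParser()
        parser.add_argument('uris', nargs='+', help='Input files to process.')
        self.add_args(parser)  # Let the subclass extend the CLI.
        self.args = parser.parse_args(argv)  # None -> read sys.argv.
        for uri in self.args.uris:
            for dataset in self.open_datasets(uri):
                self.write(dataset)
        return self.written


class MyCustomPipeline(Pipeline):
    def add_args(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument('--variable', default='t2m')

    def open_datasets(self, uri: str) -> t.Iterator[t.Any]:
        # Placeholder for institution-specific opening/preprocessing logic.
        yield {'source': uri, 'variable': self.args.variable}
```

For instance, `MyCustomPipeline('setup.py').run(['a.grib', 'b.grib', '--variable', 'tp'])` returns one record per input file, each tagged with `'tp'`. Because the framework owns argument parsing, iteration, and writing, a subclass supplies only what differs per institution; the trade-off is that anything not exposed as a hook still requires changing the base class.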

Later, I'll provide some sketches for how we could allow users to extend ToDataSink or Client with similar affordances.

A key benefit of this new direction is that it fits nicely with our pip-based distribution. Users of custom pipelines just need to add google-weather-tools to their setup.py (along with their other dependencies) to set themselves up for a specific solution.

Thanks to @deepgabani8 and jayendrap[at]google.com for helping to form this idea.

@deepgabani8 deepgabani8 linked a pull request Sep 21, 2022 that will close this issue