Currently, weather-tools exists as a set of CLIs. Adding capabilities in areas that are likely to change (see below) requires code modification. It would be great for users, internal or otherwise, if we could extend each tool's capabilities to suit idiosyncratic user needs in a simple way.
So far, in the development of weather-dl and weather-mv, we've identified areas that are likely to change. These are:
weather-dl
Users should be able to add different weather data providers, like met offices or protocols (e.g. Globus?). We currently abstract this with our Client class.
For each data provider, users should be able to express how that data source can be partitioned. I'll discuss the specifics in "Support for Eumetsat in weather-dl" (#160), but to summarize: there should be a client-specific way to describe how data requests can be partitioned by config, and thus parallelized.
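As a rough illustration of "partitioned by config", here is a minimal sketch in which a client declares the keys its provider can split on, and one sub-request is generated per combination of values. The names `partition_config`, `YearMonthClient`, and `partition_keys` are hypothetical and chosen for this example only; they are not an existing weather-dl interface.

```python
import itertools
import typing as t

Config = t.Dict[str, t.List[str]]


def partition_config(selection: Config,
                     partition_keys: t.Sequence[str]) -> t.Iterator[Config]:
    """Yield one sub-request per combination of values of the partition keys."""
    # Keys that are not partitioned are copied into every sub-request unchanged.
    fixed = {k: v for k, v in selection.items() if k not in partition_keys}
    value_lists = [selection[k] for k in partition_keys]
    for combo in itertools.product(*value_lists):
        request = dict(fixed)
        request.update({k: [v] for k, v in zip(partition_keys, combo)})
        yield request


class YearMonthClient:
    """Hypothetical client whose provider can be split by year and month."""
    partition_keys = ('year', 'month')

    def partitions(self, selection: Config) -> t.List[Config]:
        return list(partition_config(selection, self.partition_keys))
```

Each sub-request produced this way is independent of the others, which is what would let the downloads run in parallel. A different provider would only need to declare different keys, or override `partitions` entirely.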
weather-mv
It seems like every met institution uses a different data format. Even when a standard format is used, each institution has strong opinions about how data is represented in that format. The way raw files are opened and processed is very specific to the source data and hard to make truly generic. In addition, institutions may need to keep their data sources private (often due to licensing agreements), and so cannot open-source preprocessing code that they would otherwise like to contribute.
Destinations for weather data, while slower moving, also need to be open to change. Right now, we can load data to BigQuery and Earth Engine, and we regrid data with MetView. Later, users may want to load data to Postgres, Elasticsearch, or some other geospatial datastore.
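One way pluggable destinations could look, sketched under heavy assumptions: an abstract sink with a single `write` method, plus a name-to-class registry that a CLI flag could select from. `DataSink`, `InMemorySink`, `SINKS`, and `get_sink` are all hypothetical names invented for this sketch, and the in-memory sink merely stands in for a real datastore.

```python
import abc
import typing as t

Row = t.Dict[str, t.Any]


class DataSink(abc.ABC):
    """Hypothetical extension point: one subclass per destination."""

    @abc.abstractmethod
    def write(self, rows: t.Iterable[Row]) -> int:
        """Write rows to the destination and return how many were written."""


class InMemorySink(DataSink):
    """Toy destination standing in for Postgres, Elasticsearch, and so on."""

    def __init__(self) -> None:
        self.rows: t.List[Row] = []

    def write(self, rows: t.Iterable[Row]) -> int:
        before = len(self.rows)
        self.rows.extend(rows)
        return len(self.rows) - before


# Users would register their own sinks here instead of editing the tool.
SINKS: t.Dict[str, t.Type[DataSink]] = {'memory': InMemorySink}


def get_sink(name: str) -> DataSink:
    """Look up a sink class by name and instantiate it."""
    try:
        return SINKS[name]()
    except KeyError:
        raise ValueError(f'unknown sink: {name!r}') from None
```

With this shape, supporting a new datastore would mean adding one subclass and one registry entry, leaving the rest of the pipeline untouched.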
Towards these ends, I propose the following: Let's internally represent each weather tool as a library that's friendly to import. We allow users to extend areas that we expect will change frequently, and otherwise provide a common interface / framework for the rest of the pipeline.
To speak more concretely, here's a rough sketch of what this would look like for the weather mover:
import os
import typing as t

import xarray as xr

import weather_mv


class MyCustomPipeline(weather_mv.Pipeline):

    def add_args(self, parser) -> None:
        # add custom args
        pass

    def open_datasets(self, uri: str) -> t.Iterator[xr.Dataset]:
        # custom logic to open and process a URI to an XArray dataset
        pass


if __name__ == '__main__':
    # Allow users to pass a setup.py so they can specify custom dependencies.
    # Maybe later, they could also pass in a custom Anaconda environment.yml...
    MyCustomPipeline(f'{os.getcwd()}/setup.py').run()
    # # Users could also choose to run the pipeline and return immediately,
    # # instead of waiting for the remote pipeline to finish.
    # MyCustomPipeline(f'{os.getcwd()}/setup.py').run_async()
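To make the division of labor concrete, here is a self-contained sketch of how a base class might drive those two hooks. Everything in it is an assumption for illustration: this `Pipeline` is a stand-in, not the real or proposed `weather_mv` API, the `--uris` flag is invented, and plain strings take the place of datasets so the example runs without extra dependencies.

```python
import argparse
import typing as t


class Pipeline:
    """Stand-in base class: the framework owns run(), users override hooks."""

    def add_args(self, parser: argparse.ArgumentParser) -> None:
        """Hook: subclasses register extra command-line flags here."""

    def open_datasets(self, uri: str) -> t.Iterator[t.Any]:
        """Hook: subclasses turn one URI into zero or more datasets."""
        raise NotImplementedError

    def run(self, argv: t.Optional[t.List[str]] = None) -> t.List[t.Any]:
        parser = argparse.ArgumentParser()
        parser.add_argument('--uris', nargs='+', required=True)
        self.add_args(parser)  # let the subclass extend the CLI
        self.args = parser.parse_args(argv)
        results: t.List[t.Any] = []
        for uri in self.args.uris:
            results.extend(self.open_datasets(uri))
        return results


class UpperCasePipeline(Pipeline):
    """Toy subclass: 'opens' a URI by upper-casing it and adding a suffix."""

    def add_args(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument('--suffix', default='')

    def open_datasets(self, uri: str) -> t.Iterator[str]:
        yield uri.upper() + self.args.suffix
```

Calling `UpperCasePipeline().run(['--uris', 'a', 'b', '--suffix', '!'])` exercises both hooks. The point of the pattern is that argument parsing and iteration stay in the framework, while only the source-specific logic lives in user code.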
Later, I'll provide some sketches for how we could allow users to extend ToDataSink or Client with similar affordances.
A key benefit of this new direction is that it fits in really nicely with our pip-based distribution. Users of custom pipelines just need to add google-weather-tools to their setup.py (along with their other dependencies) to set themselves up for a specific solution.
Thanks to @deepgabani8 and jayendrap[at]google.com for the formation of this idea.