
Support for Eumetsat in weather-dl #160

Open
wants to merge 3 commits into main

Conversation

mahrsee1997
Collaborator

No description provided.

@mahrsee1997 mahrsee1997 requested review from uhager and alxmrs June 6, 2022 19:39
    raise TimeoutError(f'Customisation took longer than {timeout}s')
self.logger.info('Customisation for product %s: output %s', product, customisation.outputs)
with customisation.stream_output(customisation.outputs[0]) as stream:
    with open(output, 'wb') as fdst:
Contributor

Have you tested that this works? I initially managed to download some files directly in NetCDF using this, but recently it has failed with various errors. Unless this works reliably, I think we should just support downloading the native format.

Collaborator Author

Yes, I have tested it and it works fine. It seems reliable as long as we handle the library's limitations. However, I have also run into a few issues at random - possibly because the eumdac library is not yet mature.

Hence, we are thinking of downloading files in the native format (.nat) only, as it is more reliable, and then processing them accordingly for BigQuery ingestion. WDYT?
Also, do you have an idea of how to open/read native (.nat) files?
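For illustration, a minimal sketch of one common way to open a SEVIRI native (.nat) file, assuming the satpy package and its seviri_l1b_native reader are installed (the file name below is hypothetical):

# Minimal sketch, not part of this PR: read a SEVIRI .nat file with satpy.
from satpy import Scene

scn = Scene(
    filenames=['MSG4-SEVI-MSG15-0100-NA-20220606121500.nat'],  # hypothetical file
    reader='seviri_l1b_native',
)
scn.load(['IR_108'])            # load the 10.8 µm infrared channel
ir_108 = scn['IR_108']          # xarray.DataArray of calibrated values
print(ir_108.shape, ir_108.attrs.get('units'))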

Re: recently it has failed with various errors - the errors reported by the eumdac library are misleading. The message says the problems are related to Authentication & Authorisation, but the failures are actually caused by a constraint violation: EITHER the limit of max 3 customisations (running + queued) OR the user workspace exceeding 20 GB.
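For context, a minimal sketch of how finished or stuck customisations could be cleared to stay under those two limits, assuming the eumdac Data Tailor interface exposes a customisations collection whose items have status and delete() (the credentials below are placeholders):

# Minimal sketch (assumed eumdac API): free Data Tailor workspace and job slots
# by deleting customisations that are no longer running.
import eumdac

token = eumdac.AccessToken(('<api_key>', '<api_secret>'))  # placeholder credentials
datatailor = eumdac.DataTailor(token)

for customisation in datatailor.customisations:
    if customisation.status in ('DONE', 'FAILED', 'KILLED', 'INACTIVE'):
        customisation.delete()  # frees workspace and a queued/running slot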

EUMETSAT Data Tailor API throws a "You are exceeding your maximum number 3 of queued+running customisations." error.

User's personal workspace is restricted to 20 GB. To resolve workspace size exhaust error, please delete
Contributor

From what I remember, this only applies to the web interface?

Collaborator Author

Yes: we are using the eumdac library, which internally uses the Data Tailor Web Service (DTWS).
See: https://gitlab.eumetsat.int/eumetlab/data-services/eumdac/-/blob/public/eumdac/datatailor.py#L102
Hence we might have to live with the limitations of (1) a maximum of 3 customisations (running + queued) and (2) a user workspace of 20 GB.

We will also explore the standalone Data Tailor and other libraries to check whether they can be integrated into the current pipeline.
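For readers unfamiliar with DTWS, a condensed sketch of the customisation flow that eumdac drives, based on the snippets in this PR; the DataStore/DataTailor calls, collection ID, and Chain arguments are illustrative assumptions, not this PR's exact code:

# Condensed, illustrative sketch of the Data Tailor format-conversion flow.
import shutil
import time

import eumdac
from eumdac.tailor_models import Chain

token = eumdac.AccessToken(('<api_key>', '<api_secret>'))      # placeholder credentials
datastore = eumdac.DataStore(token)
datatailor = eumdac.DataTailor(token)

product = datastore.get_product(collection_id='EO:EUM:DAT:MSG:HRSEVIRI',
                                product_id='<product_id>')     # placeholder product
chain = Chain(product='HRSEVIRI', format='netcdf4')            # conversion target

customisation = datatailor.new_customisation(product, chain)   # counts toward the 3-job limit
while customisation.status not in ('DONE', 'FAILED'):          # poll until the job finishes
    time.sleep(10)

with customisation.stream_output(customisation.outputs[0]) as stream, \
        open('output.nc', 'wb') as fdst:
    shutil.copyfileobj(stream, fdst)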

weather_dl/download_pipeline/partition.py (outdated, resolved)
@mahrsee1997 mahrsee1997 requested a review from uhager June 7, 2022 17:32
@alxmrs alxmrs marked this pull request as ready for review July 11, 2022 17:36
Collaborator

@alxmrs alxmrs left a comment

Hey Rahul. I've gone through another revision and left a few minor notes. However, let's schedule some time for us to meet to discuss a slightly new approach for this change. After looking at the underlying API, I think we may be able to maintain our partition strategy more than I originally thought. Let's meet to brainstorm the topic.

Comment on lines +170 to +178
if client_name == 'eumetsat':
    num_requesters_per_key = client.num_requests_per_key(
        config.dataset,
        bool(config.selection.get('eumetsat_format_conversion_to'))
    )
else:
    num_requesters_per_key = client.num_requests_per_key(
        config.dataset
    )
Collaborator

This shouldn't be necessary. We already delegate calculating the number of requests per key to the download class. Instead of adding an if statement, let's see if we can add this logic without creating a special case for this client. Refactoring the general case is OK.
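One possible shape of that refactor, sketched under the assumption that each client keeps a reference to its Config; the class name and returned limits below are illustrative, not the repository's actual implementation:

# Illustrative sketch: the EUMETSAT client derives its own requests-per-key limit
# from its config, so the partitioner needs no per-client special case.
class EumetsatClient:
    def __init__(self, config) -> None:
        self.config = config

    def num_requests_per_key(self, dataset: str) -> int:
        # Format conversion runs through the Data Tailor, which allows at most
        # 3 running+queued customisations per key.
        if self.config.selection.get('eumetsat_format_conversion_to'):
            return 3
        return 10  # placeholder limit for plain native downloads

# The partitioner then calls every client uniformly:
#     num_requesters_per_key = client.num_requests_per_key(config.dataset)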

Comment on lines +219 to +227
def __init__(self, config: Config, level: int = logging.INFO,
             config_subsection: t.Optional[t.Tuple] = None) -> None:
    super().__init__(config, level)
    self.key = config.kwargs.get('api_key', os.environ.get("EUMETSATAPI_KEY"))
    self.secret = config.kwargs.get('api_secret', os.environ.get("EUMETSATAPI_SECRET"))
    if not self.key and config_subsection:
        self.key = config_subsection[1].get('api_key')
    if not self.secret and config_subsection:
        self.secret = config_subsection[1].get('api_secret')
Collaborator

Why do we need to pass in a config subsection here? Shouldn't this already be done in the partition step? The standard key is the one from the subsection.


def download_custom(self, product: eumdac.product.Product, token: eumdac.AccessToken,
                    chain_config: eumdac.tailor_models.Chain, output: str) -> None:
    """Downloads the prduct after customisation."""
Collaborator

Typo: prduct

        shutil.copyfileobj(stream, fdst, DEFAULT_READ_BUFFER_SIZE)

def retrieve(self, dataset: str, selection: t.Dict, output: str) -> None:
    selection_ = optimize_selection_partition(selection)
Collaborator

If we only allow partition by ID, then I think this is unnecessary?

with open(output, 'wb') as fdst:
    shutil.copyfileobj(fsrc, fdst, DEFAULT_READ_BUFFER_SIZE)

def download_custom(self, product: eumdac.product.Product, token: eumdac.AccessToken,
Collaborator

Let's use the retry decorator util here, too.

    credentials = (self.key, self.secret)
    return eumdac.AccessToken(credentials)

def download_native(self, product: eumdac.product.Product, output: str) -> None:
Collaborator

Let's use the retry decorator util here.
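For illustration, the retry pattern being suggested might look like the sketch below; the repository's actual retry util may differ in name and signature, so this is only an assumed stand-in:

# Illustrative retry decorator with exponential backoff (not the repo's util).
import functools
import logging
import time
import typing as t


def retry_with_backoff(max_attempts: int = 5, base_delay: float = 1.0) -> t.Callable:
    def decorator(func: t.Callable) -> t.Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as err:
                    if attempt == max_attempts:
                        raise
                    delay = base_delay * 2 ** (attempt - 1)
                    logging.warning('Attempt %d failed (%s); retrying in %.1fs.',
                                    attempt, err, delay)
                    time.sleep(delay)
        return wrapper
    return decorator

# Usage on the methods above, e.g.:
#     @retry_with_backoff()
#     def download_native(self, product, output): ...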
