Sentinel-2 downloader does not download images newer than August 28, 2024 #334
Comments
I looked into BigQuery after hearing that the CSV files are not going to be updated anymore. Queries are billed by the amount of data that is processed. There is a free tier that provides 1 TB/month; while this should be enough for many applications, it will not meet the requirements of all users. An example query for 4 tiles over Germany (full archive, no cloud-cover limit) processed about 6 GB of data.
Yeah, I was afraid this was tied to some billing plan. Billing for metadata, though, is strange. What would happen when the quota is exceeded? Would it just stop working, or would users receive a bill? And do you know how much data a query for the whole of Germany would process?
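Regarding the cost question: BigQuery can at least report the bytes a query would scan before actually running it. A minimal sketch using the google-cloud-bigquery Python client, assuming the public Sentinel-2 index table; the project ID and tile naming are hypothetical:

```python
# Dry-run sketch to estimate query cost before running anything billable.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # hypothetical project
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

query = """
SELECT product_id, base_url
FROM `bigquery-public-data.cloud_storage_geo_index.sentinel_2_index`
WHERE mgrs_tile = '32UNU'  -- assumed tile naming in the index table
"""
job = client.query(query, job_config=job_config)  # nothing is billed here
print(f"Query would process {job.total_bytes_processed / 1e9:.2f} GB")
```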
Hi all, I must admit that I did not dive too deep into the whole billing issue. Together with @felixlobert, we then set up a Docker image following this documentation, which can be used to call BigQuery on GCS from your machine. Before you can run it, you need to authenticate with your Google account, which is also explained in the documentation. This is probably not the most handy solution (the initial setup takes some time), the query can surely be optimized/generalized (it currently contains, e.g., a hard-coded AOI), and it might be charged for at some point (so far I did not need to share my credit card details). But for now it seems to be a stable solution that works for us and lets us keep the datacube for Germany up to date. Looking forward to your feedback!
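For reference, the same kind of query can also be issued without the Docker detour, via the Python client. A rough sketch, where the table and column names follow the public index dataset and should be treated as assumptions rather than a tested setup:

```python
# Sketch: query the public Sentinel-2 index directly from Python.
# Requires prior authentication, e.g. `gcloud auth application-default login`.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # hypothetical project

query = """
SELECT granule_id, product_id, mgrs_tile, sensing_time, cloud_cover, base_url
FROM `bigquery-public-data.cloud_storage_geo_index.sentinel_2_index`
WHERE mgrs_tile IN ('32UNU', '32UPU')  -- hard-coded AOI, assumed tile naming
  AND sensing_time >= '2024-08-28'
ORDER BY sensing_time
"""
for row in client.query(query).result():
    print(row.product_id, row.base_url)
```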
Hey @geo-masc, you can convert the AOI file into a WKT string and use that for a spatial predicate in the query; a sketch of the conversion and the matching query follows below.
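A hedged reconstruction of that idea: build the WKT with geopandas, then intersect it with a footprint polygon assembled from the index's bounding-box columns (the index stores only scene bounds, not full geometries). The column names and geometry handling are assumptions:

```python
# Convert the AOI file to a single WKT string...
import geopandas as gpd
from google.cloud import bigquery

aoi = gpd.read_file("aoi.gpkg").to_crs(epsg=4326)  # hypothetical AOI file
wkt = aoi.unary_union.wkt  # one WKT string for the whole (multi)polygon

# ...and use it in the query: assemble a polygon from the assumed
# bounding-box columns and test it against the AOI geography.
query = f"""
SELECT product_id, base_url
FROM `bigquery-public-data.cloud_storage_geo_index.sentinel_2_index`
WHERE ST_INTERSECTS(
  ST_GEOGFROMTEXT('{wkt}'),
  ST_MAKEPOLYGON(ST_MAKELINE([
    ST_GEOGPOINT(west_lon, south_lat), ST_GEOGPOINT(east_lon, south_lat),
    ST_GEOGPOINT(east_lon, north_lat), ST_GEOGPOINT(west_lon, north_lat),
    ST_GEOGPOINT(west_lon, south_lat)])))
"""
for row in bigquery.Client(project="your-project-id").query(query).result():
    print(row.product_id)
```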
I am working on a script to download from CDSE directly. They just announced that Sentinel-2 data will now only come with the newest processing baseline, and the old ones will be deleted (info here). This could save us from filtering for the newest baseline ourselves, and I'm not sure how the data situation will develop on Google Cloud.
One possible workaround would of course be rewriting the force-level1-csd --update functionality to just pull the BigQuery table; a sketch follows below. That way, everything downstream stays the same.
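A rough sketch of that workaround, assuming the public index table and that downstream code expects the old index.csv column layout. Note that dumping the full table scans its entire size, which counts against the free tier:

```python
# Sketch: replace the old CSV download with a BigQuery dump to index.csv.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # hypothetical project
query = """
SELECT granule_id, product_id, datatake_identifier, mgrs_tile, sensing_time,
       total_size, cloud_cover, generation_time,
       north_lat, south_lat, west_lon, east_lon, base_url
FROM `bigquery-public-data.cloud_storage_geo_index.sentinel_2_index`
"""
df = client.query(query).result().to_dataframe()  # needs pandas + db-dtypes
df.to_csv("index.csv", index=False)  # everything downstream stays the same
```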
Thanks @vudongpham for this suggestion. In our case this is not necessary, as the AOI (a list of MGRS tiles, respectively) is defined in L1CSD. We just restricted the query spatially to Germany, so that the CSV does not contain metadata for the whole globe. Sounds great that you are already working on a CDSE extension. Looking forward to it! @ernstste, this is what we are currently doing, but of course with an additional Docker container. It would be great to include this in L1CSD -u directly.
Hi all, thanks for chiming in! @ernstste, it would be really great if you could integrate an approach like the one @geo-masc and @felixlobert developed. Would you need additional dependencies in the base image? I guess it would also be a good idea to write some sort of warning to stdout regarding the possibility of being billed, or even a mandatory "I know what I am doing" getopt option. @vudongpham, this sounds awesome. Do you have plans for how to release this when it is finished? Would you be okay with us integrating that tool into FORCE when the time comes? Cheers,
Quick update: apparently, BigQuery also returns some additional data with the defined query (e.g., "S2A_OPER_MSI_L1C_TL_EPA__20170507T091532_A002107_T32UMV_N02.04"). We therefore adjusted the query so that we now only get the relevant data in the exact format needed for L1CSD. Maybe something to build on, @ernstste; a sketch of the idea follows below.
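A hedged guess at such a filter: restrict the query to regular L1C naming so the legacy OPER/EPA entries drop out. The LIKE patterns are assumptions derived from the example granule above, and the snippet plugs into the client calls shown earlier:

```python
# Sketch: tightened query that excludes legacy "OPER"/EPA reprocessing entries.
query = """
SELECT product_id, sensing_time, base_url
FROM `bigquery-public-data.cloud_storage_geo_index.sentinel_2_index`
WHERE mgrs_tile = '32UMV'                -- assumed tile naming
  AND product_id LIKE 'S2%_MSIL1C_%'     -- keep only standard L1C products
  AND granule_id NOT LIKE 'S2%_OPER_%'   -- drop entries like the one above
"""
```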
Hi all, I created a repo with scripts to search and download from CDSE; have a look and give it a try. A Docker image is available.
I tried to mimic the landsatlinks commands from @ernstste, though it might not be as detailed.
Hi @vudongpham. I just managed to give it a try. That already looks quite promising; the basic functionality seems to be there. Thanks for that! If we want to integrate it into FORCE, we would need to add some functionality, error handling, usage documentation, etc. Let me know if you want to go these steps, then I will set up things on FORCE's end to facilitate the development and communication. Cheers,
Hi @davidfrantz, I am happy to do that! Although at the moment I might not be able to work on this actively, probably until February next year. Just so you know. Cheers,
It should be pretty easy to find a pre-existing CDSE downloader written in Python, if that is what you want, so you don't have to write one yourselves. (The hard part of making a downloader that works well is that the CDSE servers intermittently respond with 404 and various 500 errors.)
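For what it's worth, the standard requests/urllib3 retry machinery covers exactly that failure mode. A minimal sketch with a placeholder download URL:

```python
# Sketch: retry intermittent 404/5xx responses with exponential backoff.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=5,                                      # up to 5 retries
    backoff_factor=2,                             # exponentially growing waits
    status_forcelist=[404, 500, 502, 503, 504],   # the intermittent errors
    allowed_methods=["GET"],
)
session.mount("https://", HTTPAdapter(max_retries=retries))

resp = session.get("https://example.com/product.zip", timeout=120)  # placeholder URL
resp.raise_for_status()  # raises if all retries are exhausted
```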
The problem
The Sentinel-2 download tool force-level1-csd downloads the images from Google Cloud Storage. For this, it first downloads a big CSV table that holds all the metadata. The filtering of the data is then done locally, which has the big advantage of allowing very complex AOI vectors, and circumvents the usual paging of the common APIs (e.g., OData). A sketch of this local-filtering idea follows below.
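A hedged illustration of local filtering: read the metadata CSV and keep only scenes whose footprint bounds intersect the AOI. The column names follow the GCS index.csv layout and are assumptions, as are the file names:

```python
# Sketch: filter the downloaded metadata table locally against an AOI vector.
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

aoi = gpd.read_file("aoi.gpkg").to_crs(epsg=4326).unary_union  # hypothetical AOI

meta = pd.read_csv("index.csv")  # the big metadata table from GCS
footprints = meta.apply(
    lambda r: box(r["WEST_LON"], r["SOUTH_LAT"], r["EAST_LON"], r["NORTH_LAT"]),
    axis=1,
)
selected = meta[[geom.intersects(aoi) for geom in footprints]]
print(f"{len(selected)} of {len(meta)} scenes intersect the AOI")
```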
Unfortunately, this CSV table does not get updated anymore (last update: August 28, 2024). Data is still ingested into GCS, but the way to retrieve the metadata has changed, which renders force-level1-csd partially broken (at least for newer data). I opened an issue on Google's tracker: https://issuetracker.google.com/issues/369223578
Potential solutions
Apparently, the solution is to change the query to use a BigQuery table.
That said, there is an urgent need for
a) an alternative, e.g., developing a new downloader for CDSE, or
b) potentially switching to BigQuery
Option a would be quite some effort, and I believe very complex AOI vectors will be difficult to handle. On the other hand, it would be the "official" way of obtaining the data.
Unfortunately, I am not familiar with BigQuery, nor with how much effort it would be to switch to it. I also don't know if there are other downsides to it...
This issue serves as a discussion on how to proceed and on which option will be the most feasible. I am also open to other solutions.
I am mentioning some people who I have been in touch with on this topic to include you here: @vudongpham @ernstste @geo-masc
Cheers,
David
A note on the CODE-DE Data Cube
PS: for the German Data Cube on CODE-DE, we switched to an ad-hoc solution of scanning the file system for newly available L1 data. That said, the CODE-DE datacube is still up to date!