Expand Variables class to read s3 urls #464

Merged
merged 79 commits into from
Nov 27, 2023
79 commits
9d09ff9
mvp remove intake from Read
rwegener2 Aug 1, 2023
e5458a1
Merge branch 'development' into refactor_intake
rwegener2 Aug 29, 2023
24f6a42
delete is2cat and references
rwegener2 Aug 29, 2023
b13b847
remove extra comments
rwegener2 Aug 30, 2023
0779b80
update doc strings
rwegener2 Aug 30, 2023
1cfbf72
update tests
rwegener2 Aug 30, 2023
de61d87
update documentation for removing intake
rwegener2 Aug 30, 2023
9f06611
update approach paragraph
rwegener2 Aug 30, 2023
d019b9a
remove one more instance of catalog from the docs
rwegener2 Aug 30, 2023
156ea89
clear jupyter history
rwegener2 Aug 30, 2023
b26ca4e
Update icepyx/core/read.py
rwegener2 Sep 1, 2023
ce1ca76
remove intake and related modules
rwegener2 Sep 1, 2023
fd00aeb
Merge branch 'development' into read_arguments
rwegener2 Sep 4, 2023
431af78
mvp with new read parameters
rwegener2 Sep 5, 2023
612662e
clean up remainder of file and remove extraneous comments
rwegener2 Sep 5, 2023
c16a003
maintain backward compatibility and combine arguments
rwegener2 Sep 5, 2023
7648078
update to new error message
rwegener2 Sep 5, 2023
4cfbfdb
update docs
rwegener2 Sep 8, 2023
f7f823b
glob kwargs and list error
rwegener2 Sep 8, 2023
203f3ad
formatting updates
rwegener2 Sep 8, 2023
10d1591
Apply suggestions from code review
rwegener2 Sep 12, 2023
0b23d1e
remove num_files
rwegener2 Sep 12, 2023
6f5bead
fix docs test typo
rwegener2 Sep 12, 2023
035ee5a
trying again to fix the build
rwegener2 Sep 12, 2023
903c351
add feedback to docs page
rwegener2 Sep 12, 2023
d842bde
Merge branch 'development' into read_arguments
rwegener2 Sep 13, 2023
5e06de9
fix typo
rwegener2 Sep 14, 2023
9ca29f1
Merge branch 'development' into read_arguments
rwegener2 Sep 14, 2023
e8e35ad
Merge branch 'development' into read_arguments
rwegener2 Sep 18, 2023
e3566f8
mvp for making a standalone variables class
rwegener2 Sep 18, 2023
1d53341
update QUEST and GenQuery classes for argo integration (#441)
JessicaS11 Sep 25, 2023
44fd8cc
clean comments
rwegener2 Oct 3, 2023
69dce54
split data_source into seperate arguments
rwegener2 Oct 16, 2023
72e1e37
clean dev notes
rwegener2 Oct 16, 2023
83d24fb
update docstrings
rwegener2 Oct 16, 2023
a187328
little fixes
rwegener2 Oct 17, 2023
dce23f9
upgrade Variables to an stand alone import
rwegener2 Oct 17, 2023
3561be8
update example notebooks
rwegener2 Oct 17, 2023
d13ac33
hide get_latest_version
rwegener2 Oct 17, 2023
593b9d1
update api docs
rwegener2 Oct 17, 2023
d03f9fb
temporarily disable OpenAltimetry API tests (#459)
JessicaS11 Oct 18, 2023
ee8b79f
fix spot number calculation (#458)
JessicaS11 Oct 18, 2023
a1a723d
Fix a broken link in IS2_data_access.ipynb (#456)
whyjz Oct 18, 2023
d86cc9e
update Read input arguments (#444)
rwegener2 Oct 18, 2023
aedbcce
enable QUEST kwarg handling (#452)
JessicaS11 Oct 19, 2023
652a815
remove variables from components section
rwegener2 Oct 20, 2023
120694a
fix error dropping components
rwegener2 Oct 20, 2023
b4d59d6
move latest_version to is2ref
rwegener2 Oct 20, 2023
3f4bfa7
current status
rwegener2 Oct 23, 2023
a29e756
mvp updated read class
rwegener2 Oct 23, 2023
3f3cb1f
Merge branch 'development' into indep_vars
rwegener2 Oct 23, 2023
4f8e95a
add error message if no vars.wanted
rwegener2 Oct 26, 2023
73f929e
docs: add rwegener2 as a contributor for bug, code, and 6 more (#460)
allcontributors[bot] Oct 26, 2023
a56a9c8
docs: add jpswinski as a contributor for review (#461)
allcontributors[bot] Oct 26, 2023
d0838c8
update query class to append required vars
rwegener2 Oct 26, 2023
8127dbf
remove local filepaths
rwegener2 Oct 27, 2023
b3341c1
clean extraneous comments
rwegener2 Oct 27, 2023
a184d9c
Merge branch 'development' into indep_vars
rwegener2 Oct 27, 2023
bdcc9bd
docs: add whyjz as a contributor for tutorial (#462)
allcontributors[bot] Oct 27, 2023
8ff1e70
Update icepyx/core/query.py
rwegener2 Oct 31, 2023
3217565
remove redundant lines
rwegener2 Oct 31, 2023
c0e5f4e
Merge branch 'development' into indep_vars
rwegener2 Oct 31, 2023
defc76f
respond to review
rwegener2 Oct 31, 2023
5de9173
Merge branch 'indep_vars' of https://github.com/icesat2py/icepyx into…
rwegener2 Oct 31, 2023
4bf2ba8
add forgotten docstring from previous PR
rwegener2 Oct 31, 2023
11625ec
allow Variables to read s3urls
rwegener2 Oct 31, 2023
fb90b0c
add newest icepyx citations (#455)
JessicaS11 Nov 2, 2023
0f18d56
add warning if user is accessing data outside NSIDC bucket
rwegener2 Nov 2, 2023
d5747fa
Variables as an independent class (#451)
rwegener2 Nov 7, 2023
2e84bbc
resolve merge conflicts
rwegener2 Nov 7, 2023
1e0bc69
make warning clearer
rwegener2 Nov 7, 2023
150d763
move function to trigger travis run
rwegener2 Nov 9, 2023
ddd2540
Merge branch 'development' into s3_vars
JessicaS11 Nov 20, 2023
d44dca0
move extract product and version functions back to end of file
JessicaS11 Nov 20, 2023
0837bd7
remove duplicate imports and some comments
JessicaS11 Nov 20, 2023
c77de88
accidentally removed numpy...
JessicaS11 Nov 20, 2023
454e30b
fix auth requirement for local files
rwegener2 Nov 21, 2023
f84bb32
Merge branch 'development' into s3_vars
JessicaS11 Nov 27, 2023
ef7aa1c
add cloud notes to variables notebook
JessicaS11 Nov 27, 2023
13 changes: 8 additions & 5 deletions doc/source/example_notebooks/IS2_data_variables.ipynb
@@ -15,7 +15,7 @@
"\n",
"A given ICESat-2 product may have over 200 variable + path combinations.\n",
"icepyx includes a custom `Variables` module that is \"aware\" of the ATLAS sensor and how the ICESat-2 data products are stored.\n",
"The module can be accessed independently, and can also be accessed as a component of a `Query` object or `Read` object.\n",
"The module can be accessed independently and can also be accessed as a component of a `Query` object or `Read` object.\n",
"\n",
"This notebook illustrates in detail how the `Variables` module behaves. We use the module independently and also show how powerful it is directly in the icepyx workflow using a `Query` data access example.\n",
"Module usage using `Query` is analogous through an icepyx ICESat-2 `Read` object.\n",
@@ -75,7 +75,7 @@
"There are three ways to create or access an ICESat-2 Variables object in icepyx:\n",
"1. Access via the `.order_vars` property of a Query object\n",
"2. Access via the `.vars` property of a Read object\n",
"3. Create a stand-alone ICESat-2 Variables object using a local file or a product name\n",
"3. Create a stand-alone ICESat-2 Variables object using a local file, cloud file, or a product name\n",
"\n",
"An example of each of these is shown below."
]
@@ -180,8 +180,11 @@
"### 3. Create a stand-alone Variables object\n",
"\n",
"You can also generate an independent Variables object. This can be done using either:\n",
"1. The filepath to a file you'd like a variables list for\n",
"2. The product name (and optionally version) of a an ICESat-2 product"
"1. The filepath to a local or cloud file you'd like a variables list for\n",
"2. The product name (and optionally version) of an ICESat-2 product\n",
"\n",
"*Note: Cloud data access requires a valid Earthdata login; \n",
"you will be prompted to log in if you are not already authenticated.*"
]
},
{
@@ -255,7 +258,7 @@
},
"outputs": [],
"source": [
"v = ipx.Variables(product='ATL03', version='004')"
"v = ipx.Variables(product='ATL03', version='006')"
]
},
{
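The notebook now pins `version='006'`. When `version` is omitted, `Variables` falls back to the newest version reported by `is2ref.latest_version`, which takes `max()` over the zero-padded `version_id` strings. A minimal sketch of that "requested, else latest" choice, assuming versions are zero-padded strings; `choose_version` is a hypothetical helper, not icepyx's `prod_version`:

```python
def choose_version(available, requested=None):
    """Return the requested version, or the newest available one if none is requested.

    Hypothetical helper: `available` is assumed to be a list of zero-padded
    version strings such as ["004", "005", "006"].
    """
    if not available:
        raise ValueError("No versions available")
    if requested is None:
        # Zero-padded strings sort lexicographically in numeric order,
        # so max() returns the most recent version.
        return max(available)
    if requested not in available:
        raise ValueError(f"Version {requested} is not available; choose from {sorted(available)}")
    return requested
```

The zero padding matters: without it, `max(["9", "10"])` would return `"9"`.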
110 changes: 78 additions & 32 deletions icepyx/core/is2ref.py
@@ -5,11 +5,10 @@
import warnings
from xml.etree import ElementTree as ET

import earthaccess

import icepyx

# ICESat-2 specific reference functions
# options to get customization options for ICESat-2 data (though could be used generally)


def _validate_product(product):
@@ -48,9 +47,6 @@ def _validate_product(product):
return product


# DevGoal: See if there's a way to dynamically get this list so it's automatically updated


def _validate_OA_product(product):
"""
Confirm a valid ICESat-2 product was specified
@@ -87,6 +83,7 @@ def about_product(prod):


# DevGoal: use a mock of this output to test later functions, such as displaying options and widgets, etc.
# options to get customization options for ICESat-2 data (though could be used generally)
def _get_custom_options(session, product, version):
"""
Get lists of what customization options are available for the product from NSIDC.
@@ -330,6 +327,7 @@ def gt2spot(gt, sc_orient):

return np.uint8(spot)


def latest_version(product):
"""
Determine the most recent version available for the given product.
@@ -340,38 +338,86 @@ def latest_version(product):
'006'
"""
_about_product = about_product(product)
return max(
[entry["version_id"] for entry in _about_product["feed"]["entry"]]
)
return max([entry["version_id"] for entry in _about_product["feed"]["entry"]])

def extract_product(filepath):

def extract_product(filepath, auth=None):
"""
Read the product type from the metadata of the file. Return the product as a string.
Read the product type from the metadata of the file. Valid for local or s3 files, but must
provide an auth object if reading from s3. Return the product as a string.

Parameters
----------
filepath: string
local or remote location of a file. Could be a local string or an s3 filepath
auth: earthaccess.auth.Auth, default None
An earthaccess authentication object. Optional, but necessary if accessing data in an
s3 bucket.
"""
with h5py.File(filepath, 'r') as f:
try:
product = f.attrs['short_name']
if isinstance(product, bytes):
# For most products the short name is stored in a bytes string
product = product.decode()
elif isinstance(product, np.ndarray):
# ATL14 saves the short_name as an array ['ATL14']
product = product[0]
product = _validate_product(product)
except KeyError:
raise 'Unable to parse the product name from file metadata'
# Generate a file reader object relevant for the file location
if filepath.startswith("s3"):
if not auth:
raise AttributeError(
"Must provide credentials to `auth` if accessing s3 data"
)
# Read the s3 file
s3 = earthaccess.get_s3fs_session(daac="NSIDC", provider=auth)
f = h5py.File(s3.open(filepath, "rb"))
else:
# Otherwise assume a local filepath. Read with h5py.
f = h5py.File(filepath, "r")

# Extract the product information
try:
product = f.attrs["short_name"]
if isinstance(product, bytes):
# For most products the short name is stored in a bytes string
product = product.decode()
elif isinstance(product, np.ndarray):
# ATL14 saves the short_name as an array ['ATL14']
product = product[0]
product = _validate_product(product)
except KeyError:
raise KeyError("Unable to parse the product name from file metadata")
# Close the file reader
f.close()
return product

def extract_version(filepath):

def extract_version(filepath, auth=None):
"""
Read the version from the metadata of the file. Return the version as a string.
Read the version from the metadata of the file. Valid for local or s3 files, but must
provide an auth object if reading from s3. Return the version as a string.

Parameters
----------
filepath: string
local or remote location of a file. Could be a local string or an s3 filepath
auth: earthaccess.auth.Auth, default None
An earthaccess authentication object. Optional, but necessary if accessing data in an
s3 bucket.
"""
with h5py.File(filepath, 'r') as f:
try:
version = f['METADATA']['DatasetIdentification'].attrs['VersionID']
if isinstance(version, np.ndarray):
# ATL14 stores the version as an array ['00x']
version = version[0]
except KeyError:
raise 'Unable to parse the version from file metadata'
# Generate a file reader object relevant for the file location
if filepath.startswith("s3"):
if not auth:
raise AttributeError(
"Must provide credentials to `auth` if accessing s3 data"
)
# Read the s3 file
s3 = earthaccess.get_s3fs_session(daac="NSIDC", provider=auth)
f = h5py.File(s3.open(filepath, "rb"))
else:
# Otherwise assume a local filepath. Read with h5py.
f = h5py.File(filepath, "r")

# Read the version information
try:
version = f["METADATA"]["DatasetIdentification"].attrs["VersionID"]
if isinstance(version, np.ndarray):
# ATL14 stores the version as an array ['00x']
version = version[0]
except KeyError:
raise KeyError("Unable to parse the version from file metadata")
# Close the file reader
f.close()
return version
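Both `extract_product` and `extract_version` normalize the raw HDF5 attribute the same way: most products store it as a bytes string, while ATL14 wraps it in a one-element array. A minimal standalone sketch of that normalization, assuming NumPy is available; `decode_attr` is a hypothetical name, not part of icepyx, and unlike the code above it also decodes bytes found inside an array:

```python
import numpy as np


def decode_attr(value):
    """Normalize an HDF5 attribute value to a plain Python string."""
    if isinstance(value, np.ndarray):
        # ATL14-style storage: take the first (only) element of the array
        value = value[0]
    if isinstance(value, bytes):
        # np.bytes_ subclasses bytes, so array elements are decoded here too
        value = value.decode()
    # Convert np.str_ (or an already-plain str) to a built-in str
    return str(value)
```

Usage: `decode_attr(b"ATL03")` and `decode_attr(np.array(["ATL14"]))` both yield plain strings that can be passed on to `_validate_product`.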
4 changes: 4 additions & 0 deletions icepyx/core/query.py
@@ -350,6 +350,10 @@ class Query(GenQuery, EarthdataAuthMixin):
reference ground tracks are used. Example: "0594"
files : string, default None
A placeholder for future development. Not used for any purposes yet.
auth : earthaccess.auth.Auth, default None
An earthaccess authentication object. Available as an argument so an existing
earthaccess.auth.Auth object can be used for authentication. If not given, a new auth
object will be created whenever authentication is needed.

Returns
-------
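The new `auth` docstring describes a reuse-or-create-on-demand pattern: an existing `earthaccess.auth.Auth` object is used if passed in; otherwise one is created only when authentication is first needed. A minimal sketch of that pattern, assuming a zero-argument `factory` callable that performs the login (for example, a wrapper around `earthaccess.login()`); `LazyAuth` and `factory` are hypothetical names, not icepyx's `EarthdataAuthMixin`:

```python
class LazyAuth:
    """Reuse a supplied auth object, or create one the first time it is needed."""

    def __init__(self, factory, auth=None):
        # `factory` is a hypothetical zero-argument callable returning an auth object
        self._factory = factory
        self._auth = auth

    @property
    def auth(self):
        if self._auth is None:
            # No auth object was passed in: log in now and cache the result
            self._auth = self._factory()
        return self._auth
```

Deferring the login keeps local-only workflows from prompting for Earthdata credentials they never use.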
13 changes: 13 additions & 0 deletions icepyx/core/validate_inputs.py
@@ -104,3 +104,16 @@ def tracks(track):
warnings.warn("Listed Reference Ground Track is not available")

return track_list

def check_s3bucket(path):
"""
Check if the given path is an s3 path. Raise a warning if the data being referenced is not
in the NSIDC bucket
"""
split_path = path.split('/')
if split_path[0] == 's3:' and split_path[2] != 'nsidc-cumulus-prod-protected':
warnings.warn(
's3 data being read from outside the NSIDC data bucket. Icepyx can '
'read this data, but available data lists may not be accurate.', stacklevel=2
)
return path
33 changes: 19 additions & 14 deletions icepyx/core/variables.py
@@ -48,11 +48,10 @@ class Variables(EarthdataAuthMixin):
Dictionary (key:values) of available variable names (keys) and paths (values).
wanted : dictionary, default None
As avail, but for the desired list of variables
session : requests.session object
A session object authenticating the user to download data using their Earthdata login information.
The session object will automatically be passed from the query object if you
have successfully logged in there.

auth : earthaccess.auth.Auth, default None
An earthaccess authentication object. Available as an argument so an existing
earthaccess.auth.Auth object can be used for authentication. If not given, a new auth
object will be created whenever authentication is needed.
"""

def __init__(
@@ -75,27 +74,33 @@ def __init__(

if path and product:
raise TypeError(
'Please provide either a filepath or a product. If a filepath is provided ',
'Please provide either a path or a product. If a path is provided '
'variables will be read from the file. If a product is provided all available '
'variables for that product will be returned.'
)

# initialize authentication properties
EarthdataAuthMixin.__init__(self, auth=auth)

# Set the product and version from either the input args or the file
if path:
self._path = path
self._product = is2ref.extract_product(self._path)
self._version = is2ref.extract_version(self._path)
self._path = val.check_s3bucket(path)
# Set up auth
if self._path.startswith('s3'):
auth = self.auth
else:
auth = None
# Read the product and version from the file
self._product = is2ref.extract_product(self._path, auth=auth)
self._version = is2ref.extract_version(self._path, auth=auth)
elif product:
# Check for valid product string
self._product = is2ref._validate_product(product)
# Check for valid version string
# If version is not specified by the user assume the most recent version
self._version = val.prod_version(is2ref.latest_version(self._product), version)
else:
raise TypeError('Either a filepath or a product need to be given as input arguments.')

# initialize authentication properties
EarthdataAuthMixin.__init__(self, auth=auth)
raise TypeError('Either a path or a product needs to be given as input arguments.')

self._avail = avail
self.wanted = wanted
@@ -138,7 +143,7 @@ def avail(self, options=False, internal=False):
"""

if not hasattr(self, "_avail") or self._avail == None:
if not hasattr(self, 'path'):
if not hasattr(self, 'path') or self.path.startswith('s3'):
self._avail = is2ref._get_custom_options(
self.session, self.product, self.version
)["variables"]
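After this change, `Variables.__init__` accepts exactly one of `path` or `product`, and only s3 paths pass an auth object to `extract_product`/`extract_version`. For s3 paths, `avail` also fetches the variable list from NSIDC's customization options instead of reading the file. A minimal sketch of the input dispatch, under the assumption that any path starting with `"s3"` needs credentials; `resolve_source` is a hypothetical helper, not part of icepyx:

```python
def resolve_source(path=None, product=None):
    """Decide where product information comes from.

    Returns ("file", needs_auth) when a path is given, or ("product", False)
    when a product name is given. Exactly one of the two must be supplied.
    """
    if path and product:
        raise TypeError("Please provide either a path or a product, not both.")
    if path:
        # Only s3 paths need Earthdata credentials; local files are read directly
        return "file", path.startswith("s3")
    if product:
        return "product", False
    raise TypeError("Either a path or a product needs to be given as input arguments.")
```

Checking the mutually exclusive arguments first means a bad call fails before any authentication prompt appears.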