Continue to use nanosecond-precision Timestamps in precision-sensitive areas #7731

Merged on Apr 13, 2023 (12 commits)
2 changes: 1 addition & 1 deletion ci/requirements/all-but-dask.yml
@@ -25,7 +25,7 @@ dependencies:
- numbagg
- numpy<1.24
- packaging
- pandas<2
- pandas
- pint
- pip
- pseudonetcdf
2 changes: 1 addition & 1 deletion ci/requirements/doc.yml
@@ -19,7 +19,7 @@ dependencies:
- numba
- numpy>=1.21,<1.24
- packaging>=21.3
- pandas>=1.4,<2
- pandas>=1.4
- pooch
- pip
- pre-commit
2 changes: 1 addition & 1 deletion ci/requirements/environment-py311.yml
@@ -27,7 +27,7 @@ dependencies:
- numexpr
- numpy
- packaging
- pandas<2
- pandas
- pint
- pip
- pooch
2 changes: 1 addition & 1 deletion ci/requirements/environment-windows-py311.yml
@@ -24,7 +24,7 @@ dependencies:
# - numbagg
- numpy
- packaging
- pandas<2
- pandas
- pint
- pip
- pre-commit
2 changes: 1 addition & 1 deletion ci/requirements/environment-windows.yml
@@ -24,7 +24,7 @@ dependencies:
- numbagg
- numpy<1.24
- packaging
- pandas<2
- pandas
- pint
- pip
- pre-commit
2 changes: 1 addition & 1 deletion ci/requirements/environment.yml
@@ -27,7 +27,7 @@ dependencies:
- numexpr
- numpy<1.24
- packaging
- pandas<2
- pandas
- pint
- pip
- pooch
6 changes: 0 additions & 6 deletions doc/api-hidden.rst
@@ -375,10 +375,8 @@
CFTimeIndex.is_floating
CFTimeIndex.is_integer
CFTimeIndex.is_interval
CFTimeIndex.is_mixed
CFTimeIndex.is_numeric
CFTimeIndex.is_object
CFTimeIndex.is_type_compatible
CFTimeIndex.isin
CFTimeIndex.isna
CFTimeIndex.isnull
@@ -399,7 +397,6 @@
CFTimeIndex.round
CFTimeIndex.searchsorted
CFTimeIndex.set_names
CFTimeIndex.set_value
CFTimeIndex.shift
CFTimeIndex.slice_indexer
CFTimeIndex.slice_locs
@@ -413,7 +410,6 @@
CFTimeIndex.to_flat_index
CFTimeIndex.to_frame
CFTimeIndex.to_list
CFTimeIndex.to_native_types
CFTimeIndex.to_numpy
CFTimeIndex.to_series
CFTimeIndex.tolist
@@ -438,8 +434,6 @@
CFTimeIndex.hasnans
CFTimeIndex.hour
CFTimeIndex.inferred_type
CFTimeIndex.is_all_dates
CFTimeIndex.is_monotonic
CFTimeIndex.is_monotonic_increasing
CFTimeIndex.is_monotonic_decreasing
CFTimeIndex.is_unique
16 changes: 11 additions & 5 deletions doc/user-guide/weather-climate.rst
@@ -57,14 +57,14 @@ CF-compliant coordinate variables

.. _CFTimeIndex:

Non-standard calendars and dates outside the Timestamp-valid range
------------------------------------------------------------------
Non-standard calendars and dates outside the nanosecond-precision range
-----------------------------------------------------------------------

Through the standalone ``cftime`` library and a custom subclass of
:py:class:`pandas.Index`, xarray supports a subset of the indexing
functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
dates from non-standard calendars commonly used in climate science or dates
using a standard calendar, but outside the `Timestamp-valid range`_
using a standard calendar, but outside the `nanosecond-precision range`_
(approximately between years 1678 and 2262).

.. note::
@@ -75,13 +75,19 @@ using a standard calendar, but outside the `Timestamp-valid range`_
any of the following are true:

- The dates are from a non-standard calendar
- Any dates are outside the Timestamp-valid range.
- Any dates are outside the nanosecond-precision range.

Otherwise pandas-compatible dates from a standard calendar will be
represented with the ``np.datetime64[ns]`` data type, enabling the use of a
:py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[ns]``
and their full set of associated features.

As of pandas version 2.0.0, pandas supports non-nanosecond precision datetime
values. For the time being, xarray still automatically casts datetime values
to nanosecond-precision for backwards compatibility with older pandas
versions; however, this is something we would like to relax going forward.
See :issue:`7493` for more discussion.

For example, you can create a DataArray indexed by a time
coordinate with dates from a no-leap calendar and a
:py:class:`~xarray.CFTimeIndex` will automatically be used:
@@ -235,6 +241,6 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:

da.resample(time="81T", closed="right", label="right", offset="3T").mean()

.. _Timestamp-valid range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
.. _nanosecond-precision range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
.. _ISO 8601 standard: https://en.wikipedia.org/wiki/ISO_8601
.. _partial datetime string indexing: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#partial-string-indexing
6 changes: 6 additions & 0 deletions doc/whats-new.rst
@@ -88,6 +88,12 @@ Internal Changes
- Added a config.yml file with messages for the welcome bot when a Github user creates their first ever issue or pull request or has their first PR merged. (:issue:`7685`, :pull:`7685`)
By `Nishtha P <https://github.com/nishthap981>`_.

- Ensure that only nanosecond-precision :py:class:`pd.Timestamp` objects
continue to be used internally under pandas version 2.0.0. This is mainly to
ease the transition to this latest version of pandas. It should be relaxed
when addressing :issue:`7493`. By `Spencer Clark
<https://github.com/spencerkclark>`_ (:issue:`7707`, :pull:`7731`).

.. _whats-new.2023.03.0:

v2023.03.0 (March 22, 2023)
2 changes: 1 addition & 1 deletion setup.cfg
@@ -76,7 +76,7 @@ include_package_data = True
python_requires = >=3.9
install_requires =
numpy >= 1.21 # recommended to use >= 1.22 for full quantile method support
pandas >= 1.4, <2
pandas >= 1.4
packaging >= 21.3

[options.extras_require]
13 changes: 10 additions & 3 deletions xarray/coding/cftime_offsets.py
@@ -57,7 +57,12 @@
format_cftime_datetime,
)
from xarray.core.common import _contains_datetime_like_objects, is_np_datetime_like
from xarray.core.pdcompat import NoDefault, count_not_none, no_default
from xarray.core.pdcompat import (
    NoDefault,
    count_not_none,
    nanosecond_precision_timestamp,
    no_default,
)
from xarray.core.utils import emit_user_level_warning

try:
@@ -1286,8 +1291,10 @@ def date_range_like(source, calendar, use_cftime=None):
if is_np_datetime_like(source.dtype):
# We want to use datetime fields (datetime64 objects don't have them)
source_calendar = "standard"
source_start = pd.Timestamp(source_start)
source_end = pd.Timestamp(source_end)
# TODO: the strict enforcement of nanosecond precision Timestamps can be
# relaxed when addressing GitHub issue #7493.
source_start = nanosecond_precision_timestamp(source_start)
source_end = nanosecond_precision_timestamp(source_end)
else:
if isinstance(source, CFTimeIndex):
source_calendar = source.calendar
2 changes: 1 addition & 1 deletion xarray/coding/cftimeindex.py
@@ -613,7 +613,7 @@ def to_datetimeindex(self, unsafe=False):
------
ValueError
If the CFTimeIndex contains dates that are not possible in the
standard calendar or outside the pandas.Timestamp-valid range.
standard calendar or outside the nanosecond-precision range.

Warns
-----
22 changes: 18 additions & 4 deletions xarray/coding/times.py
@@ -23,6 +23,7 @@
from xarray.core import indexing
from xarray.core.common import contains_cftime_datetimes, is_np_datetime_like
from xarray.core.formatting import first_n_items, format_timestamp, last_item
from xarray.core.pdcompat import nanosecond_precision_timestamp
from xarray.core.pycompat import is_duck_dask_array
from xarray.core.variable import Variable

@@ -224,7 +225,9 @@ def _decode_datetime_with_pandas(
delta, ref_date = _unpack_netcdf_time_units(units)
delta = _netcdf_to_numpy_timeunit(delta)
try:
ref_date = pd.Timestamp(ref_date)
# TODO: the strict enforcement of nanosecond precision Timestamps can be
# relaxed when addressing GitHub issue #7493.
ref_date = nanosecond_precision_timestamp(ref_date)
except ValueError:
# ValueError is raised by pd.Timestamp for non-ISO timestamp
# strings, in which case we fall back to using cftime
@@ -391,7 +394,9 @@ def infer_datetime_units(dates) -> str:
dates = to_datetime_unboxed(dates)
dates = dates[pd.notnull(dates)]
reference_date = dates[0] if len(dates) > 0 else "1970-01-01"
reference_date = pd.Timestamp(reference_date)
# TODO: the strict enforcement of nanosecond precision Timestamps can be
# relaxed when addressing GitHub issue #7493.
reference_date = nanosecond_precision_timestamp(reference_date)
else:
reference_date = dates[0] if len(dates) > 0 else "1970-01-01"
reference_date = format_cftime_datetime(reference_date)
@@ -432,14 +437,16 @@ def cftime_to_nptime(times, raise_on_invalid: bool = True) -> np.ndarray:
If raise_on_invalid is True (default), invalid dates trigger a ValueError.
Otherwise, the invalid element is replaced by np.NaT."""
times = np.asarray(times)
# TODO: the strict enforcement of nanosecond precision datetime values can
# be relaxed when addressing GitHub issue #7493.
new = np.empty(times.shape, dtype="M8[ns]")
for i, t in np.ndenumerate(times):
try:
# Use pandas.Timestamp in place of datetime.datetime, because
# NumPy casts it safely to np.datetime64[ns] for dates outside
# 1678 to 2262 (this is not currently the case for
# datetime.datetime).
dt = pd.Timestamp(
dt = nanosecond_precision_timestamp(
t.year, t.month, t.day, t.hour, t.minute, t.second, t.microsecond
)
except ValueError as e:
@@ -498,6 +505,10 @@ def convert_time_or_go_back(date, date_type):

This is meant to convert end-of-month dates into a new calendar.
"""
# TODO: the strict enforcement of nanosecond precision Timestamps can be
# relaxed when addressing GitHub issue #7493.
if date_type == pd.Timestamp:
date_type = nanosecond_precision_timestamp
try:
return date_type(
date.year,
@@ -641,7 +652,10 @@ def encode_cf_datetime(

delta_units = _netcdf_to_numpy_timeunit(delta)
time_delta = np.timedelta64(1, delta_units).astype("timedelta64[ns]")
ref_date = pd.Timestamp(_ref_date)

# TODO: the strict enforcement of nanosecond precision Timestamps can be
# relaxed when addressing GitHub issue #7493.
ref_date = nanosecond_precision_timestamp(_ref_date)

# If the ref_date Timestamp is timezone-aware, convert to UTC and
# make it timezone-naive (GH 2649).
13 changes: 13 additions & 0 deletions xarray/core/pdcompat.py
@@ -39,6 +39,7 @@
from typing import Literal

import pandas as pd
from packaging.version import Version

from xarray.coding import cftime_offsets

@@ -91,3 +92,15 @@ def _convert_base_to_offset(base, freq, index):
return base * freq.as_timedelta() // freq.n
else:
raise ValueError("Can only resample using a DatetimeIndex or CFTimeIndex.")


def nanosecond_precision_timestamp(*args, **kwargs) -> pd.Timestamp:
    """Return a nanosecond-precision Timestamp object.

    Note this function should no longer be needed after addressing GitHub issue
    #7493.
    """
    if Version(pd.__version__) >= Version("2.0.0"):
        return pd.Timestamp(*args, **kwargs).as_unit("ns")
    else:
        return pd.Timestamp(*args, **kwargs)
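
For readers who want to try the helper outside of xarray, here is a minimal standalone sketch. The function body is copied from the hunk above; the names `ref_date`, `in_range` and `is_representable` are illustrative only and are not part of this PR. It assumes that pandas and `packaging` are installed, and that an out-of-range date surfaces as a `ValueError` subclass under either pandas major version (which is what `cftime_to_nptime` relies on when it catches `ValueError`).

```python
import pandas as pd
from packaging.version import Version


def nanosecond_precision_timestamp(*args, **kwargs) -> pd.Timestamp:
    """Copy of the helper added to xarray/core/pdcompat.py in this PR."""
    if Version(pd.__version__) >= Version("2.0.0"):
        # pandas >= 2 may infer a coarser unit from the input, so cast to "ns".
        return pd.Timestamp(*args, **kwargs).as_unit("ns")
    # Older pandas versions only support nanosecond precision.
    return pd.Timestamp(*args, **kwargs)


# String input, as in _decode_datetime_with_pandas and encode_cf_datetime.
ref_date = nanosecond_precision_timestamp("2000-01-01")

# Component input, as in cftime_to_nptime.
in_range = nanosecond_precision_timestamp(2000, 1, 1, 12, 30, 0, 0)


def is_representable(year: int) -> bool:
    """Hypothetical check: does Jan 1 of `year` fit at nanosecond precision?"""
    try:
        nanosecond_precision_timestamp(year, 1, 1)
    except ValueError:
        # Out-of-range dates are rejected, whether at construction time
        # (older pandas) or during the cast to nanoseconds (pandas >= 2).
        return False
    return True
```

Under this sketch, dates outside the range of roughly 1678 to 2262 keep failing the same way they did before pandas 2.0, which is why the existing cftime fallbacks in `xarray/coding/times.py` are still triggered.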
1 change: 1 addition & 0 deletions xarray/tests/test_cftime_offsets.py
@@ -1373,6 +1373,7 @@ def test_date_range_like_same_calendar():
assert src is out


@pytest.mark.filterwarnings("ignore:Converting non-nanosecond")
def test_date_range_like_errors():
src = date_range("1899-02-03", periods=20, freq="D", use_cftime=False)
src = src[np.arange(20) != 10] # Remove 1 day so the frequency is not inferable.
1 change: 1 addition & 0 deletions xarray/tests/test_concat.py
@@ -297,6 +297,7 @@ def test_concat_multiple_datasets_with_multiple_missing_variables() -> None:
assert_identical(actual, expected)


@pytest.mark.filterwarnings("ignore:Converting non-nanosecond")
def test_concat_type_of_missing_fill() -> None:
datasets = create_typed_datasets(2, seed=123)
expected1 = concat(datasets, dim="day", fill_value=dtypes.NA)
2 changes: 2 additions & 0 deletions xarray/tests/test_conventions.py
@@ -168,6 +168,7 @@ def test_do_not_overwrite_user_coordinates(self) -> None:
with pytest.raises(ValueError, match=r"'coordinates' found in both attrs"):
conventions.encode_dataset_coordinates(orig)

@pytest.mark.filterwarnings("ignore:Converting non-nanosecond")
def test_emit_coordinates_attribute_in_attrs(self) -> None:
orig = Dataset(
{"a": 1, "b": 1},
@@ -185,6 +186,7 @@ def test_emit_coordinates_attribute_in_attrs(self) -> None:
assert enc["b"].attrs.get("coordinates") == "t"
assert "coordinates" not in enc["b"].encoding

@pytest.mark.filterwarnings("ignore:Converting non-nanosecond")
def test_emit_coordinates_attribute_in_encoding(self) -> None:
orig = Dataset(
{"a": 1, "b": 1},