Interoperability with Pandas 2.0 non-nanosecond datetime #7493
Comments
Hi @khider , thanks for raising this. For those of us who haven't tried to use non-nanosecond datetimes before (e.g. me), could you possibly expand a bit more on
specifically, where are errors being thrown from within xarray? And what functions are you referring to as examples? |
we are casting everything back to @spencerkclark knows much more about this, but in any case we're aware of the change and are working on it (see e.g. #7441). (To be fair, though, at the moment it is mostly Spencer who's working on it, and he seems to be pretty preoccupied.) |
Thanks for posting this general issue @khider. This is something that has been on my radar for several months and I'm on board with it being great to support (eventually this will likely help cftime support as well). I might hesitate to say that I'm actively working on it yet 😬. Right now, in the time I have available, I'm mostly trying to make sure that xarray's existing functionality does not break under pandas 2.0. Once things are a little more stable in pandas with regard to this new feature my plan is to take a deeper dive into what it will take to adopt in xarray (some aspects might need to be handled delicately). We can plan on using this issue for more discussion. As @keewis notes, xarray currently will cast any non-nanosecond precision |
Hi all, thank you for looking into this. I was very excited when the array was created from my non-nanosecond datetime index, but I couldn't do many manipulations beyond creation. |
Indeed it would be nice if this "just worked" but it may take some time to sort out (sorry that this example initially got your hopes up!). Here what I mean by "address" is continuing to prevent non-nanosecond-precision datetime values from entering xarray through casting to nanosecond precision and raising an informative error if that is not possible. This of course would be temporary until we work through the kinks of enabling such support. In the big picture it is exciting that pandas is doing this in part due to your grant. |
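To make the "cast to nanosecond precision or raise an informative error" idea concrete, here is a minimal NumPy-only sketch. The helper name `cast_to_nanosecond` and the two bound constants are hypothetical and are not xarray's actual code. The bounds are deliberately conservative day-resolution limits inside the `datetime64[ns]` range, and the sketch assumes input units no finer than nanoseconds.

```python
import numpy as np

# Conservative day-resolution bounds inside the datetime64[ns] range
# (which spans roughly 1677-09-21 to 2262-04-11). Assumed for illustration.
NS_LOWER = np.datetime64("1677-09-22")
NS_UPPER = np.datetime64("2262-04-11")


def cast_to_nanosecond(values):
    """Hypothetical helper: cast datetime64 values to ns precision or raise."""
    values = np.asarray(values)
    if not np.issubdtype(values.dtype, np.datetime64):
        raise TypeError(f"expected datetime64 values, got {values.dtype}")

    # Ignore NaT when checking the range, since NaT is valid at any precision.
    valid = values[~np.isnat(values)]
    if valid.size and (valid.min() < NS_LOWER or valid.max() > NS_UPPER):
        raise ValueError(
            "datetime values fall outside the range representable with "
            "nanosecond precision; cannot cast to datetime64[ns]"
        )
    return values.astype("datetime64[ns]")


converted = cast_to_nanosecond([np.datetime64("2018-01-01")])
print(converted.dtype)
```

The range check is done in the coarser unit of the input on purpose: comparing after a cast to nanoseconds could itself overflow for far-out-of-range dates.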
@khider It would be helpful if either you or someone on your team tried to make it work and opened a PR. That would give us a sense of what's needed and might speed it along. It would be an advanced change, but we'd be happy to provide feedback. Adding expected-fail tests would be particularly helpful! |
@dcherian +1. I'm happy to engage with others if they are motivated to start on this earlier. |
I might need some help with the xarray codebase. I use it quite often but never had to dig into its guts. |
@khider we are more than happy to help with digging into the codebase! A reasonable place to start would be just trying the operation you want to perform, and looking through the code for the functions any errors get thrown from. You are also welcome to join our bi-weekly community meetings (there is one tomorrow morning!) or the office hours we run. |
I can block out time to join today's meeting or an upcoming one if it would be helpful. |
I can attend it too. 8:30am PST, correct? |
Great -- I'll plan on joining. That's correct. It is at 8:30 AM PT (#4001). |
Thanks for joining the meeting today @khider. Some potentially relevant places in the code that come to my mind are:
Though as @shoyer says, searching for Some design questions that come to my mind are (but you don't need an answer to these immediately to start working):
|
Thank you! The second point that you raise is what we are concerned about right now as well. So maybe it would be good to try to resolve it. How do you deal with PMIP simulations in terms of calendar? |
Currently in xarray we make the choice based on the calendar attribute associated with the data on disk (following the CF conventions). If the data has a non-standard calendar (or cannot be represented with nanosecond-precision datetime values) then we use cftime; otherwise we use NumPy. Which kind of calendar do PMIP simulations typically use? For some background -- my initial need in this realm came mainly from idealized climate model simulations (e.g. configured to start on 0001-01-01 with a no-leap calendar), so I do not have a ton of experience with paleoclimate research. I would be happy to learn more about your application, however! |
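The decision rule described above can be sketched in a few lines. This is only an illustration of the logic as stated in the comment, not xarray's implementation: the function name `choose_time_backend`, the `fits_in_nanoseconds` flag, and the treatment of a missing calendar attribute as "standard" (the CF default) are assumptions.

```python
# Calendar names that CF treats as the standard (Gregorian-type) calendars.
STANDARD_CALENDARS = {"standard", "gregorian", "proleptic_gregorian"}


def choose_time_backend(calendar, fits_in_nanoseconds):
    """Hypothetical sketch: pick "numpy" or "cftime" for decoded times.

    calendar: value of the CF ``calendar`` attribute, or None if absent.
    fits_in_nanoseconds: whether every value is representable as datetime64[ns].
    """
    # Assumption: a missing calendar attribute means the CF default, "standard".
    name = (calendar or "standard").lower()
    if name in STANDARD_CALENDARS and fits_in_nanoseconds:
        return "numpy"
    return "cftime"


# An idealized no-leap simulation starting in year 1 ends up on cftime.
print(choose_time_backend("noleap", fits_in_nanoseconds=False))
# Modern data on a standard calendar ends up on NumPy datetimes.
print(choose_time_backend("standard", fits_in_nanoseconds=True))
```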
Hi all, I just ran into a really nasty-to-track-down bug in xarray (version 2023.08.0, apologies if this is fixed since) where non-nanosecond datetimes are creeping in via expand_dims. Look at the difference between expand_dims and assign_coords:
In [33]: xarray.Dataset().expand_dims({'foo': [np.datetime64('2018-01-01')]})
Out[33]:
<xarray.Dataset>
Dimensions: (foo: 1)
Coordinates:
* foo (foo) datetime64[s] 2018-01-01
Data variables:
*empty*
In [34]: xarray.Dataset().assign_coords({'foo': [np.datetime64('2018-01-01')]})
third_party/py/xarray/core/utils.py:1211: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
third_party/py/xarray/core/utils.py:1211: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
Out[34]:
<xarray.Dataset>
Dimensions: (foo: 1)
Coordinates:
* foo (foo) datetime64[ns] 2018-01-01
Data variables:
*empty*
It seems for the time being xarray depends on datetime64[ns] being used everywhere for correct behaviour -- I've seen some very weird data corruption silently happen with datetimes when the wrong datetime64 types are used accidentally due to this bug. So good to be consistent about always enforcing datetime64[ns], for as long as this is the case. |
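One way to sidestep the inconsistency reported above, as a hedged sketch rather than a fix for the underlying bug, is to build the coordinate with an explicit nanosecond dtype before passing it to expand_dims. The variable names below are illustrative only.

```python
import numpy as np
import xarray as xr

# np.datetime64("2018-01-01") carries day precision, not nanosecond precision.
day_value = np.datetime64("2018-01-01")
unit, _ = np.datetime_data(day_value.dtype)
print(unit)

# Convert to nanosecond precision ahead of time, as the warning text suggests.
foo = np.array([day_value], dtype="datetime64[ns]")

# With an explicit datetime64[ns] array, the new coordinate is nanosecond-typed.
ds = xr.Dataset().expand_dims({"foo": foo})
print(ds["foo"].dtype)
```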
Agreed, many thanks for the report @mjwillson—we'll have to track down why this slips through in the case of |
@mjwillson I think I tracked down the cause of the |
…on is not fully supported in Xarray. See pydata/xarray#7493 PiperOrigin-RevId: 625092146
…on is not fully supported in Xarray. See pydata/xarray#7493 PiperOrigin-RevId: 627130182
With the merge of #9618, xarray should be able to work with non-nanosecond datetime/timedelta resolution ("s", "ms", "us"). Please use the latest main for testing and report any problems in dedicated issues with an MCVE. Thanks! |
This is wonderful news @kmuehlbauer - thank you for implementing it! In Pyleoclim we had to freeze our pandas version to 2.1.4 to preserve non-ns dtypes. What was your workaround on the pandas side (where several non-ns issues are still open, apparently)? |
@CommonClimate we also encountered some rough patches, but for the most part things worked as expected. Are there reasons beyond resample failing for certain non-nanosecond times (LinkedEarth/Pyleoclim_util#517, pandas-dev/pandas#57427) that you pinned to 2.1.4? |
Hi @spencerkclark, that is the reason. Our entire pandas-dependent stack works with that version but not the more recent ones, as far as I know. |
Is your feature request related to a problem?
As mentioned in this post on the Pangeo discourse, Pandas 2.0 will fully support non-nanosecond datetime as indices. The motivation for this work was the paleogeosciences, a community that needs to represent time in millions of years. One of the biggest motivators is also to facilitate paleodata-model comparison. Enter xarray!
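For a sense of scale, the arithmetic behind the precision limit can be worked out directly. NumPy stores datetime64 values as signed 64-bit integer counts of the chosen unit, so the reachable time span depends on the unit. The sketch below uses a mean Gregorian year for the conversion; the helper name `span_in_years` is hypothetical.

```python
SECONDS_PER_YEAR = 365.2425 * 86_400  # mean Gregorian year, in seconds
INT64_MAX = 2**63 - 1  # largest tick count a signed 64-bit integer can hold


def span_in_years(ticks_per_second):
    """Approximate years reachable on either side of the 1970 epoch."""
    return INT64_MAX / ticks_per_second / SECONDS_PER_YEAR


span_ns = span_in_years(1e9)  # nanosecond precision: a few hundred years
span_s = span_in_years(1)  # second precision: hundreds of billions of years

print(f"nanoseconds: about {span_ns:.0f} years either side of 1970")
print(f"seconds: about {span_s:.2e} years either side of 1970")
```

Nanosecond precision covers only about three centuries around 1970, whereas second precision comfortably covers time scales of millions of years.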
Below is a snippet of code to create a Pandas Series with a non-nanosecond datetime and export to xarray (this works). However, most of the interesting functionalities of xarray don't seem to support this datetime out of the box:
To test, you will need the Pandas nightly build:
Describe the solution you'd like
Work towards an integration of the new datetimes with xarray, which will support users beyond the paleoclimate community.
Describe alternatives you've considered
No response
Additional context
No response