should we recommend / illustrate / discuss the use of .nc compression, if not in trajan core, at least in an example? #144
Comments
Yes, this is an annoying thing with xarray. I like examples, and maybe there is a good way to do it; there is probably xarray documentation on this. I personally use this: https://github.com/gauteh/plz/blob/15300e4237c7071a670b8b7e8e6b101b01cab9b6/plz/xr.py#L72 Then I can do: da.to_netcdf(encoding=plz.xr.nc_cmp(da))
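For readers who do not want to follow the link, here is a minimal sketch of the general idea: build a per-variable encoding dictionary and hand it to to_netcdf. This is not the code of plz.xr.nc_cmp; the name compression_encoding and its defaults are assumptions made for illustration. The keys zlib, complevel and shuffle are encoding options that xarray accepts for the netCDF4 backend.

```python
def compression_encoding(ds, complevel=4, shuffle=True):
    """Build an encoding dict that enables zlib compression per data variable.

    Hypothetical helper (not part of xarray, trajan or plz). `ds` only needs
    a `data_vars` mapping whose values expose `dtype.kind`, as an
    xarray.Dataset does.
    """
    encoding = {}
    for name, var in ds.data_vars.items():
        # Assumption: skip string/object variables, which may not compress
        # cleanly with the netCDF4 backend.
        if var.dtype.kind in ("O", "U", "S"):
            continue
        encoding[name] = {"zlib": True, "complevel": complevel, "shuffle": shuffle}
    return encoding


# Intended use with a real dataset (requires xarray and netCDF4):
# ds.to_netcdf("compressed.nc", encoding=compression_encoding(ds))
```

A higher complevel trades write time for file size; the value 4 here is only a placeholder, not a recommendation from the thread.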
Nice, yes, this is exactly what I had in mind regarding the way to compress :) I can add an example about this! :) The question is: do we want to have this "just as an example", or as a default in trajan, given that trajan is trajectory-focused, which fits naturally well (I would be surprised if anyone complained about "per trajectory" variable compression)? What do you think?
It is a bit tricky to make general, and it will not be the right choice if Trajan is used to generate model output, at least intermediate output. Maybe if it can be made generic?
What about making a wrapper of [...]? Btw, sometimes (e.g. for simulated datasets) selecting a subset of time could be as relevant as selecting subsets of trajectories. Thus we could have simple options to determine the chunk size per dimension.
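The per-dimension chunking option could look roughly like the sketch below. The helper names chunk_sizes and chunked_encoding, the per_dim argument, and the dimension names "trajectory" and "time" are assumptions for illustration; chunksizes is the encoding key xarray passes to the netCDF4 backend.

```python
def chunk_sizes(dims, shape, per_dim=None):
    """Return the chunk shape for one variable.

    dims:    dimension names, e.g. ("trajectory", "time")
    shape:   matching dimension lengths
    per_dim: optional {dimension name: chunk length}; dimensions that are
             not listed are stored as a single chunk along their full length.
    """
    per_dim = per_dim or {}
    return tuple(
        # Clamp to [1, size] so a requested chunk never exceeds the dimension.
        max(1, min(per_dim.get(dim, size), size))
        for dim, size in zip(dims, shape)
    )


def chunked_encoding(ds, per_dim=None, complevel=4):
    """Hypothetical helper: compression plus chunking for every non-scalar variable."""
    return {
        name: {
            "zlib": True,
            "complevel": complevel,
            "chunksizes": chunk_sizes(var.dims, var.shape, per_dim),
        }
        for name, var in ds.data_vars.items()
        if var.dims  # scalar variables are left with the backend defaults
    }


# One chunk per trajectory, full time axis:
#   chunked_encoding(ds, per_dim={"trajectory": 1})
# Smaller time windows as well, for fast time subsetting:
#   chunked_encoding(ds, per_dim={"trajectory": 1, "time": 1000})
```

Smaller chunks make subsetting cheaper but usually reduce the compression ratio, so the right values depend on how the file is read afterwards.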
I like the idea of [...]. I guess we could have several ways to go forward with this:
What do you think? :)
I think if we make this method, it should forward almost everything to xarray's Dataset.to_netcdf(), so that we also support everything they support. We could have an encoding='default' argument which tries to solve this; otherwise everything is forwarded without modification. We should ideally also support to_zarr.
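A minimal sketch of such a forwarding wrapper, under stated assumptions: the free function to_netcdf below and its default compression settings are hypothetical and are not the API that trajan ended up with. Only the encoding='default' sentinel is interpreted; every other argument goes to the dataset's own to_netcdf untouched.

```python
def to_netcdf(ds, path, encoding="default", **kwargs):
    """Write `ds` through its own to_netcdf method, forwarding all keyword arguments.

    encoding="default" is replaced by zlib compression for every data
    variable (an assumed default); any other value, including None or a
    user-supplied dict, is passed through without modification.
    """
    if isinstance(encoding, str) and encoding == "default":
        encoding = {name: {"zlib": True, "complevel": 4} for name in ds.data_vars}
    return ds.to_netcdf(path, encoding=encoding, **kwargs)


# With a real xarray.Dataset `ds`:
# to_netcdf(ds, "out.nc")                  # compressed with the assumed defaults
# to_netcdf(ds, "out.nc", encoding=None)   # plain xarray behaviour
# to_netcdf(ds, "out.nc", mode="a", group="drifters")  # extra kwargs forwarded
```

Keeping the wrapper this thin means new options added to xarray's to_netcdf are available without any change on the trajan side.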
Actually, sorry, I wrote "closes" in the PR a bit too fast; there is still the question of zarr, and / or whether this could be automated at some point :) Re-opening :)
I think we can experiment with a to_netcdf and a to_zarr method; we should try to stay as close as possible to xarray.
This issue is motivated by the following: .nc is the file obtained by .to_netcdf(), .zip is the result of zipping the .nc file in my file explorer. Clearly, the .nc I had was not effectively compressed at all...
Should this be discussed in some example, and / or should we provide a .to_netcdf() wrapper with "reasonable zipping for our typical use / needs as encountered in trajan", or do you think this is outside the scope of trajan?
I guess, for example, that in our case, which is trajectory-focused, it could be realistic to compress each variable for each trajectory independently, so that we get a good compression factor while accessing any variable for one single trajectory stays fast (i.e. we only need to read and decompress the chunk that contains this variable for the corresponding trajectory).
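The access argument can be illustrated with a toy example that uses only the Python standard library. This is not how netCDF stores data internally; it only mimics the idea of one independently compressed chunk per trajectory, with synthetic data and hypothetical helper names (pack, read_trajectory).

```python
import math
import struct
import zlib

# Synthetic data: 20 trajectories of 500 smooth samples each.
trajectories = [
    [math.sin(0.01 * t + k) for t in range(500)] for k in range(20)
]


def pack(values):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


# One compressed chunk per trajectory, analogous to a chunk length of 1
# along the trajectory dimension.
chunks = [zlib.compress(pack(traj)) for traj in trajectories]


def read_trajectory(index):
    """Decompress only the chunk that holds the requested trajectory."""
    raw = zlib.decompress(chunks[index])
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


raw_bytes = sum(len(pack(traj)) for traj in trajectories)
compressed_bytes = sum(len(chunk) for chunk in chunks)
print("raw bytes:", raw_bytes, "compressed bytes:", compressed_bytes)

# Reading trajectory 3 touches a single chunk; the other 19 stay compressed.
samples = read_trajectory(3)
```

How much is gained depends on the data; real drifter or model output will compress differently from this synthetic signal.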