-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
31 file part3advanced ufunc #91
Conversation
Setted 3 Gateway cells as 'Raw' so that users will activate during the lesson. This should make jupyterbook rendering possible for this notebook
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
@@ -2,9 +2,7 @@ | |||
"cells": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe would be nice to use the same approach I used in the notebook for warnings just to be more consistent
Reply via ReviewNB
@@ -2,9 +2,7 @@ | |||
"cells": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you didn't test the time and have numbers I would suggest to add a statement that where you at least make users aware that in some cases apply Ufunc can be slower that the usual approach.
Reply via ReviewNB
@@ -2,9 +2,7 @@ | |||
"cells": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I would suggest to be more clear about what the UFunc does. Below the description from the documentation.
"Apply a vectorized function for unlabeled arrays on xarray objects.
The function will be mapped over the data variable(s) of the input arguments using
xarray’s standard rules for labeled computation, including alignment, broadcasting,
looping over GroupBy/Dataset variables, and merging of coordinates. "
It's not only a matter of mapping over the chunks
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it would be beneficial to add further explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this notebook, we 'introduce' apply_ufunc, and in next OpenEO part it comes back to it. So I left it in 'light'. Please go ahead for adding explanation if you have one to make users understand better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To explain apply_ufunc
, it is important to note that it takes a function that works with numpy-like arrays and wraps it to support xarray
objects. In case of a reduction, this means that the original function takes axis
numbers instead of dimension names. "vectorized" itself is not that important (though it does make a difference performance-wise), that's what the vectorize
parameter takes care of.
If you're looking for a more comprehensible explanation, https://tutorial.xarray.dev/advanced/apply_ufunc/simple_numpy_apply_ufunc.html is the place to look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this?
* here we usexarray.apply_ufunc
to apply the function written for numpy-like arrays to thexarray.DataArray
object. Note that in this case the function is composed of element-wise functions and thus can be directly applied to the underlyingdask
arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, I would use np.isin
instead of comparing all array values with 8
,9
and 3
separately for the mask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To explain
apply_ufunc
, it is important to note that it takes a function that works with numpy-like arrays and wraps it to supportxarray
objects. In case of a reduction, this means that the original function takesaxis
numbers instead of dimension names. "vectorized" itself is not that important (though it does make a difference performance-wise), that's what thevectorize
parameter takes care of.
for that reason I'm pushing to use a different example but I don't think that we have time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keewis I will add the resource you suggested at the top
@@ -2,9 +2,7 @@ | |||
"cells": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also change a little bit the scope otherwise could be boring like hell see all the time the same stuff.
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tinaok all ok running the notebook in Pangeo-EOSC, however it seems it fails to build in the Jupyter book.
I'd suggest add a if/else statement that detects if the environment has dask.gateaway installed, otherwise run the dask.distributed.
It should work on EOSC, but I only tested on local environment.
It is on purpose that it does not do 'compute' or 'persist' after apply_ufunc. (We can remind about 'lazyness' of Xarray-Dask there too)
@jdries I hope this update is fine for connecting to your section.