Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LBL] nan percentile speed up in some cases? #258

Open
njcuk9999 opened this issue Jul 10, 2024 · 0 comments
Open

[LBL] nan percentile speed up in some cases? #258

njcuk9999 opened this issue Jul 10, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@njcuk9999
Copy link
Owner

njcuk9999 commented Jul 10, 2024

Possible speed up for np.nanpercentile function?

Possibly only speeds up in cases where M>>N in an (NxM vector)

`# helper function for nanpercentile
def _zvalue_from_index(arr, ind):
    """
    work around the limitation of np.choose() by employing np.take()
        arr has to be a 3D array
        ind has to be a 2D array containing values for z-indices to take from arr
    See: [http://stackoverflow.com/a/32091712/4169585](https://can01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fstackoverflow.com%2Fa%2F32091712%2F4169585&data=05%7C02%7Cneil.cook%40umontreal.ca%7Ca11ec920b3ec4deb441508dca0df775a%7Cd27eefec2a474be7981e0f8977fa31d8%7C1%7C0%7C638562128855068414%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=4PMY%2B%2BKqT643Ap1RkkLiXdHHsX%2BwPPXD96EhUmGzPl4%3D&reserved=0)
    This is faster and more memory efficient than using the ogrid based solution with fancy indexing.
    """
    # get number of columns and rows
    _, nC, nR = arr.shape
    # get linear indices and extract elements with np.take()
    idx = nC*nR*ind + np.arange(nC*nR).reshape((nC,nR))
    return np.take(arr, idx)
def nan_percentile(arr, q, axis=0):
    if axis == 0:
        arr = arr[None, :, :].copy()
    elif axis == 1:
        arr = arr[None, :, :].T.copy()
    else:  # fall back
        return np.nanpercentile(arr, q, axis)
    # valid (non NaN) observations along the first axis
    valid_obs = np.sum(np.isfinite(arr), axis=0)
    # replace NaN with maximum
    max_val = np.nanmax(arr)
    nans = np.isnan(arr)
    arr[nans] = max_val
    # sort - former NaNs will move to the end
    arr = np.sort(arr, axis=0)
    # loop over requested quantiles
    if isinstance(q, list):
        qs = []
        qs.extend(q)
    else:
        qs = [q]
    if len(qs) < 2:
        quant_arr = np.zeros(shape=(arr.shape[1], arr.shape[2]))
    else:
        quant_arr = np.zeros(shape=(len(qs), arr.shape[1], arr.shape[2]))
    result = []
    for i in range(len(qs)):
        quant = qs[i]
        # desired position as well as floor and ceiling of it
        k_arr = (valid_obs - 1) * (quant / 100.0)
        f_arr = np.floor(k_arr).astype(np.int32)
        c_arr = np.ceil(k_arr).astype(np.int32)
        fc_equal_k_mask = f_arr == c_arr
        # linear interpolation (like numpy percentile) takes the fractional part of desired position
        floor_val = _zvalue_from_index(arr=arr, ind=f_arr) * (c_arr - k_arr)
        ceil_val = _zvalue_from_index(arr=arr, ind=c_arr) * (k_arr - f_arr)
        quant_arr = floor_val + ceil_val
        quant_arr[fc_equal_k_mask] = _zvalue_from_index(arr=arr, ind=k_arr.astype(np.int32))[fc_equal_k_mask]  # if floor == ceiling take floor value
        result.append(quant_arr)
    result = np.array(result).squeeze()
    result[:, np.logical_and.reduce(nans.squeeze())] = np.nan
    return result
``
@njcuk9999 njcuk9999 converted this from a draft issue Jul 10, 2024
@njcuk9999 njcuk9999 added the enhancement New feature or request label Jul 18, 2024
@njcuk9999 njcuk9999 moved this from In Progress to Todo in APERO-LBL Core/Science Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Todo
Development

No branches or pull requests

1 participant