[LBL] nan percentile speed up in some cases? #258

njcuk9999 · 2024-07-10T15:50:59Z

Possible speed up for np.nanpercentile function?

Possibly only speeds up in cases where M>>N in an (NxM vector)

`# helper function for nanpercentile
def _zvalue_from_index(arr, ind):
    """
    work around the limitation of np.choose() by employing np.take()
        arr has to be a 3D array
        ind has to be a 2D array containing values for z-indices to take from arr
    See: [http://stackoverflow.com/a/32091712/4169585](https://can01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fstackoverflow.com%2Fa%2F32091712%2F4169585&data=05%7C02%7Cneil.cook%40umontreal.ca%7Ca11ec920b3ec4deb441508dca0df775a%7Cd27eefec2a474be7981e0f8977fa31d8%7C1%7C0%7C638562128855068414%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=4PMY%2B%2BKqT643Ap1RkkLiXdHHsX%2BwPPXD96EhUmGzPl4%3D&reserved=0)
    This is faster and more memory efficient than using the ogrid based solution with fancy indexing.
    """
    # get number of columns and rows
    _, nC, nR = arr.shape
    # get linear indices and extract elements with np.take()
    idx = nC*nR*ind + np.arange(nC*nR).reshape((nC,nR))
    return np.take(arr, idx)
def nan_percentile(arr, q, axis=0):
    if axis == 0:
        arr = arr[None, :, :].copy()
    elif axis == 1:
        arr = arr[None, :, :].T.copy()
    else:  # fall back
        return np.nanpercentile(arr, q, axis)
    # valid (non NaN) observations along the first axis
    valid_obs = np.sum(np.isfinite(arr), axis=0)
    # replace NaN with maximum
    max_val = np.nanmax(arr)
    nans = np.isnan(arr)
    arr[nans] = max_val
    # sort - former NaNs will move to the end
    arr = np.sort(arr, axis=0)
    # loop over requested quantiles
    if isinstance(q, list):
        qs = []
        qs.extend(q)
    else:
        qs = [q]
    if len(qs) < 2:
        quant_arr = np.zeros(shape=(arr.shape[1], arr.shape[2]))
    else:
        quant_arr = np.zeros(shape=(len(qs), arr.shape[1], arr.shape[2]))
    result = []
    for i in range(len(qs)):
        quant = qs[i]
        # desired position as well as floor and ceiling of it
        k_arr = (valid_obs - 1) * (quant / 100.0)
        f_arr = np.floor(k_arr).astype(np.int32)
        c_arr = np.ceil(k_arr).astype(np.int32)
        fc_equal_k_mask = f_arr == c_arr
        # linear interpolation (like numpy percentile) takes the fractional part of desired position
        floor_val = _zvalue_from_index(arr=arr, ind=f_arr) * (c_arr - k_arr)
        ceil_val = _zvalue_from_index(arr=arr, ind=c_arr) * (k_arr - f_arr)
        quant_arr = floor_val + ceil_val
        quant_arr[fc_equal_k_mask] = _zvalue_from_index(arr=arr, ind=k_arr.astype(np.int32))[fc_equal_k_mask]  # if floor == ceiling take floor value
        result.append(quant_arr)
    result = np.array(result).squeeze()
    result[:, np.logical_and.reduce(nans.squeeze())] = np.nan
    return result
``

njcuk9999 added this to APERO-LBL Core/Science Jul 10, 2024

njcuk9999 converted this from a draft issue Jul 10, 2024

njcuk9999 added the enhancement New feature or request label Jul 18, 2024

njcuk9999 moved this from In Progress to Todo in APERO-LBL Core/Science Jul 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LBL] nan percentile speed up in some cases? #258

[LBL] nan percentile speed up in some cases? #258

njcuk9999 commented Jul 10, 2024 •

edited

Loading

[LBL] nan percentile speed up in some cases? #258

[LBL] nan percentile speed up in some cases? #258

Comments

njcuk9999 commented Jul 10, 2024 • edited Loading

njcuk9999 commented Jul 10, 2024 •

edited

Loading