[ENH] Handling time series/3D data #1102

jeiglsperger · 2024-10-29T12:48:00Z

Is your feature request related to a problem? Please describe

By default, imblearn only handles 2D data of shape (samples, features). I often work with time series and need to classify them, and class imbalance occurs there as well. However, I cannot use the imblearn package because time series data is 3-dimensional, e.g. (samples, features, sequence_length).
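To illustrate the shape mismatch, here is a minimal sketch of the flatten-and-restore round trip that a 2D-only resampler would require. The toy data and variable names are assumptions for illustration only. This kind of workaround is only obviously safe for samplers that copy whole rows. For samplers that synthesize new samples, flattening mixes the feature and time axes, which may not be meaningful for time series.

```python
import numpy as np

# Hypothetical toy data: 6 series, 3 features, 10 time steps.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3, 10))

n_samples, n_features, seq_len = X.shape

# Flatten each series into one row: (samples, features * sequence_length).
X_2d = X.reshape(n_samples, -1)

# ... a 2D-only resampler would be applied to X_2d here ...

# Restore the 3D layout; -1 lets the sample count differ after resampling.
X_3d = X_2d.reshape(-1, n_features, seq_len)
```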

Describe the solution you'd like

I would like to have the option to pass 3D time series data to the many resampling methods imblearn offers. For now, I have written my own oversampler, which I present in the alternatives section below. The imblearn authors are of course welcome to reuse this code for the described enhancement.

Describe alternatives you've considered

import numpy as np
import pandas as pd


def oversample(x_train, y_train):
    # x_train: array of shape (samples, features, sequence_length)
    # y_train: single-column DataFrame holding the class labels
    labels = y_train.to_numpy().flatten()
    classes, counts = np.unique(labels, return_counts=True)
    majority_class_length = counts.max()

    # For every minority class, draw random samples (with replacement)
    # until the class is as large as the majority class.
    extra_idx = []
    for cls, count in zip(classes, counts):
        if count < majority_class_length:
            class_idx = np.flatnonzero(labels == cls)
            extra_idx.append(np.random.choice(class_idx, size=majority_class_length - count, replace=True))

    # Append the drawn samples and their labels to the training data.
    if extra_idx:
        extra_idx = np.concatenate(extra_idx)
        x_train = np.concatenate((x_train, x_train[extra_idx]), axis=0)
        labels = np.concatenate((labels, labels[extra_idx]), axis=0)

    y_train = pd.DataFrame(labels, columns=['label'])

    return x_train, y_train
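
The same random oversampling can also be sketched purely in terms of indices computed from the labels, which makes it independent of the number of dimensions of the data. This is only an illustration of the idea, not imblearn's API. The helper name `oversample_indices` and the toy arrays are hypothetical.

```python
import numpy as np


def oversample_indices(y, rng=None):
    """Return indices into the data that balance all classes by random duplication."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y).ravel()
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    # For each minority class, draw the missing number of indices with replacement.
    extra = [
        rng.choice(np.flatnonzero(y == cls), size=target - count, replace=True)
        for cls, count in zip(classes, counts)
        if count < target
    ]
    all_idx = np.arange(len(y))
    return np.concatenate([all_idx, *extra]) if extra else all_idx


# Hypothetical toy data: 7 series, 3 features, 10 time steps, 3 imbalanced classes.
X = np.zeros((7, 3, 10))
y = np.array([0, 0, 0, 0, 1, 1, 2])

idx = oversample_indices(y, rng=0)
X_bal, y_bal = X[idx], y[idx]  # indexing along axis 0 works for 2D and 3D alike
```

Because only the first axis is indexed, the same indices could be applied to 2D feature matrices or 3D time series without reshaping.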
