[ENH] Handling time series/3D data #1102

Open
jeiglsperger opened this issue Oct 29, 2024 · 0 comments

Is your feature request related to a problem? Please describe

By default, imblearn only handles 2D data of shape (samples, features). I often work with time series classification, where class imbalance occurs just as it does with tabular data. However, I cannot use imblearn there, because time series data is 3-dimensional, e.g. (samples, features, sequence_length).

Describe the solution you'd like

I would like to be able to pass 3D time series data to the many tools imblearn offers. For now I have written my own oversampler, which I include under the alternatives section. The imblearn authors are of course welcome to reuse this code for the described enhancement.
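To illustrate what such support could look like from the user's side, here is a rough sketch, not a proposal for the actual imblearn API. The helper `resample_3d` and the toy `DuplicateMinoritySampler` are hypothetical names made up for this example; the only assumption about the sampler is that it exposes `fit_resample(X, y)`, the interface imblearn samplers follow.

```python
import numpy as np


def resample_3d(sampler, X, y):
    """Apply a 2D-only sampler to 3D data (hypothetical helper).

    The trailing axes are flattened before resampling and restored afterwards.
    `sampler` is assumed to expose fit_resample(X_2d, y) -> (X_res, y_res).
    """
    n_samples, n_features, seq_len = X.shape
    X_res, y_res = sampler.fit_resample(X.reshape(n_samples, -1), y)
    return np.asarray(X_res).reshape(-1, n_features, seq_len), np.asarray(y_res)


class DuplicateMinoritySampler:
    """Toy stand-in sampler (not part of imblearn) so the sketch runs on its own."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def fit_resample(self, X, y):
        y = np.asarray(y)
        classes, counts = np.unique(y, return_counts=True)
        idx = [np.arange(len(y))]  # keep every original sample
        for cls, count in zip(classes, counts):
            if count < counts.max():
                # Draw duplicates (with replacement) until the class matches the majority
                idx.append(self.rng.choice(np.flatnonzero(y == cls), size=counts.max() - count))
        idx = np.concatenate(idx)
        return X[idx], y[idx]


# Synthetic data: 8 samples, 2 features, 5 time steps (shapes are illustrative only)
X = np.random.default_rng(1).normal(size=(8, 2, 5))
y = np.array([0, 0, 0, 0, 0, 1, 1, 2])
X_res, y_res = resample_3d(DuplicateMinoritySampler(), X, y)
```

Flattening is lossless for samplers that only duplicate or drop whole samples. Samplers that interpolate between samples would treat every time step as an independent feature, which may or may not be appropriate for a given time series.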

Describe alternatives you've considered

import numpy as np
import pandas as pd


def oversample(x_train, y_train):
    # x_train: array of shape (samples, features, sequence_length)
    # y_train: single-column DataFrame holding the class labels
    labels = y_train.to_numpy().flatten()
    classes, counts = np.unique(labels, return_counts=True)
    majority_class_length = counts.max()

    oversampled_x_data, oversampled_y_data = [], []
    for slope_number, slope_data_length in zip(classes, counts):
        n_missing = majority_class_length - slope_data_length
        if n_missing == 0:
            continue
        # Randomly duplicate samples of this class until it matches the majority class
        class_idx = np.flatnonzero(labels == slope_number)
        drawn_idx = np.random.choice(class_idx, size=n_missing, replace=True)
        oversampled_x_data.append(x_train[drawn_idx])
        oversampled_y_data.append(labels[drawn_idx])

    # Append the drawn samples to the original training data
    x_train = np.concatenate([x_train, *oversampled_x_data], axis=0)
    y_train = pd.DataFrame(np.concatenate([labels, *oversampled_y_data]), columns=['label'])

    return x_train, y_train
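
A quick way to check that an oversampler like the one above really balances the classes is to compare the label counts before and after. This is only a small sketch; `class_counts` and `is_balanced` are hypothetical helper names, and the label arrays are made-up examples.

```python
import numpy as np


def class_counts(y):
    """Return {label: count} for labels given as an array, list, or single-column DataFrame."""
    labels, counts = np.unique(np.asarray(y).ravel(), return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))


def is_balanced(y):
    """True when every class has the same number of samples."""
    return len(set(class_counts(y).values())) == 1


# Made-up labels before and after random oversampling
y_before = np.array([0, 0, 0, 1, 2, 2])
y_after = np.array([0, 0, 0, 1, 2, 2, 1, 1, 2])
```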