ENH: append/concatenate pipelines #793

sitnarf · 2021-02-12T11:05:55Z

Describe the bug

Nested pipelines using make_pipeline raise the exception.

Steps/Code to Reproduce

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier

pipeline = make_pipeline(
    make_pipeline(make_pipeline(SMOTE())),
    make_pipeline(RandomForestClassifier()),
)

Expected Results

No exception is raised.

Actual Results

Exception raised: TypeError: All intermediate steps of the chain should be estimators that implement fit and transform or fit_resample. 'Pipeline(steps=[('pipeline', Pipeline(steps=[('smote', SMOTE())]))])' implements both)

Versions

System:
python: 3.8.6 (default, Sep 25 2020, 09:36:53) [GCC 10.2.0]
executable: /home/sitnarf/.virtualenvs/backend-g-pApWmb/bin/python
machine: Linux-5.8.0-43-generic-x86_64-with-glibc2.32

Python dependencies:
pip: 20.1.1
setuptools: 44.0.0
sklearn: 0.23.2
numpy: 1.19.5
scipy: 1.6.0
Cython: None
pandas: 1.2.1
matplotlib: 3.3.4
joblib: 1.0.0
threadpoolctl: 2.1.0

The text was updated successfully, but these errors were encountered:

chkoar · 2021-02-12T11:17:49Z

I would not considered it as a bug. Probably is a feature request.

sitnarf · 2021-02-12T11:22:54Z

Hi @chkoar, in original sklearn, at least this works (and not with imblearn):

from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

pipeline = make_pipeline(
    make_pipeline(SimpleImputer()),
    make_pipeline(RandomForestClassifier()),
)

glemaitre · 2021-02-12T11:32:14Z

duplicate to #787

The reason is that there is an ambiguity whether to call fit_resample or transform here, so this is not a bug.
Scikit-learn does not ambiguity because resampling does not exist.

Having a single sampler within a pipeline is useless. I assume that you have the same needs than in #787 where you would like to combine to resamplers in a pipeline and provide this pipeline in another pipeline.

If your usecase is different then we can decide to rename this issue and see if we can come up with a solution. Otherwise we will close it in favor of #787

glemaitre · 2021-02-12T11:34:51Z

As mentioned by @chkoar, the current workaround could be to use a FunctionSampler: #787 (comment)

sitnarf · 2021-02-12T11:50:53Z

@glemaitre My use case is actually different. I want to flexibly combine pipeline blocks, e.g. append classifiers to a common preprocessing pipeline. I came with this workaround (minimal example):

def get_preprocessing_pipeline():
    return [
        SimpleImputer(),
        RandomOverSampler(),
    ]

pipeline = make_pipeline(
    *get_preprocessing_pipeline(),
    RandomForestClassifier(),
)

However, it is not ideal, as you always have to think where to add the final make_pipeline in a complex codebase with many levels.

chkoar · 2021-02-12T11:58:43Z

@sitnarf The above example works with the current pipeline object

from imblearn.pipeline import make_pipeline
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

X, y = load_iris(return_X_y=1)
pipeline = make_pipeline(
    SimpleImputer(), RandomOverSampler(), RandomForestClassifier(),
)
pipeline.fit(X, y)

chkoar · 2021-02-12T12:00:22Z

Hi @chkoar, in original sklearn, at least this works (and not with imblearn)

I understand. I said that it is not a bug since it is a handled exception with a proper message.

glemaitre · 2021-02-12T12:51:16Z

So at the end your use case is to be able to append/concatenate pipeline and not really to nest them. You are using the nesting because scikit-learn does not provide such way to create pipeline either. I think that it would be best to request this feature upstream. Indeed, by inherit this feature then.

FYI: scikit-learn/scikit-learn#16301

sitnarf · 2021-02-12T14:23:59Z

@glemaitre One question, why is it not possible to run both transform and resample using the "nested" pipeline?

chkoar · 2021-02-12T14:32:06Z

@glemaitre One question, why is it not possible to run both transform and resample using the "nested" pipeline?

It is possible. Actually, we chose to go in that direction in order to avoid fishy situations when having nested pipelines or feature unions and a sampler exists in one branch but it does not in another.

Since the validation of pipelines is executed in init we could detect by checking recursively all nested branches. We have discussed it in another issue if I recall correctly.

chkoar · 2021-02-12T14:33:08Z

Actually I said the same here

sitnarf · 2021-02-12T14:39:01Z

@glemaitre My use case is actually different. I want to flexibly combine pipeline blocks, e.g. append classifiers to a common preprocessing pipeline. I came with this workaround (minimal example):
def get_preprocessing_pipeline():
    return [
        SimpleImputer(),
        RandomOverSampler(),
    ]

pipeline = make_pipeline(
    *get_preprocessing_pipeline(),
    RandomForestClassifier(),
)
However, it is not ideal, as you always have to think where to add the final make_pipeline in a complex codebase with many levels.

Another issue with the mentioned workaround is a clumsy way for combining make_pipeline and Pipeline syntax:

def get_preprocessing_pipeline():
    return [
        SimpleImputer(),
        RandomOverSampler(),
    ]

pipeline = Pipeline([
    // ??????
   *get_preprocessing_pipeline(),
   ('classifier', RandomForestClassifier()),
])

glemaitre · 2021-02-12T14:39:03Z

Since the validation of pipelines is executed in init we could detect by checking recursively all nested branches. We have discussed it in another issue if I recall correctly.

Validation in __init__ would go against the scikit-learn API itself. We delay it in fit for compatibility with grid-search.
Pipeline is the only estimator that does that and it is an annoying one :) Going in that direction does not seem wise.

chkoar · 2021-02-12T14:45:04Z

Validation in init would go against the scikit-learn API itself.

In genereal yes. But for pipelines no.

Pipeline is the only estimator that does.

Correct. That's why I said that. In any case we could detect the problem in fit. The intention is the same.

glemaitre · 2021-02-12T14:48:54Z

Another issue with the mentioned workaround is a clumsy way for combining make_pipeline and Pipeline syntax:

To be honest, I still have issues to understand what you try to achieve. This would still be possible to combine pipeline using steps, isn't it?

In [8]: clf1 = make_pipeline(SVC())
In [10]: clf2 = make_pipeline(RandomForestClassifier())
In [13]: preprocessor = make_pipeline(StandardScaler(), SimpleImputer())
In [14]: Pipeline(steps=preprocessor.steps + clf2.steps)
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('simpleimputer', SimpleImputer()),
                ('randomforestclassifier', RandomForestClassifier())])

sitnarf · 2021-02-12T17:14:14Z

@glemaitre Yeah, that could be a nice workaround! I am closing now, since pipeline nesting is currently not well supported even in upstream.

glemaitre changed the title ~~Nested pipelines raise TypeError [BUG]~~ ENH: append/concatenate pipelines Feb 12, 2021

sitnarf closed this as completed Feb 12, 2021

GiuseppeMagazzu mentioned this issue Jan 19, 2022

[ENH] Wrapper to combine any Over Sampler and Under Sampler #787

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: append/concatenate pipelines #793

ENH: append/concatenate pipelines #793

sitnarf commented Feb 12, 2021 •

edited

Loading

chkoar commented Feb 12, 2021

sitnarf commented Feb 12, 2021 •

edited

Loading

glemaitre commented Feb 12, 2021

glemaitre commented Feb 12, 2021

sitnarf commented Feb 12, 2021

chkoar commented Feb 12, 2021

chkoar commented Feb 12, 2021

glemaitre commented Feb 12, 2021

sitnarf commented Feb 12, 2021

chkoar commented Feb 12, 2021

chkoar commented Feb 12, 2021

sitnarf commented Feb 12, 2021 •

edited

Loading

glemaitre commented Feb 12, 2021

chkoar commented Feb 12, 2021

glemaitre commented Feb 12, 2021 •

edited

Loading

sitnarf commented Feb 12, 2021

ENH: append/concatenate pipelines #793

ENH: append/concatenate pipelines #793

Comments

sitnarf commented Feb 12, 2021 • edited Loading

Describe the bug

Steps/Code to Reproduce

Expected Results

Actual Results

Versions

chkoar commented Feb 12, 2021

sitnarf commented Feb 12, 2021 • edited Loading

glemaitre commented Feb 12, 2021

glemaitre commented Feb 12, 2021

sitnarf commented Feb 12, 2021

chkoar commented Feb 12, 2021

chkoar commented Feb 12, 2021

glemaitre commented Feb 12, 2021

sitnarf commented Feb 12, 2021

chkoar commented Feb 12, 2021

chkoar commented Feb 12, 2021

sitnarf commented Feb 12, 2021 • edited Loading

glemaitre commented Feb 12, 2021

chkoar commented Feb 12, 2021

glemaitre commented Feb 12, 2021 • edited Loading

sitnarf commented Feb 12, 2021

sitnarf commented Feb 12, 2021 •

edited

Loading

sitnarf commented Feb 12, 2021 •

edited

Loading

sitnarf commented Feb 12, 2021 •

edited

Loading

glemaitre commented Feb 12, 2021 •

edited

Loading