-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: append/concatenate pipelines #793
Comments
I would not considered it as a bug. Probably is a feature request. |
Hi @chkoar, in original sklearn, at least this works (and not with imblearn):
|
duplicate to #787 The reason is that there is an ambiguity whether to call Having a single sampler within a pipeline is useless. I assume that you have the same needs than in #787 where you would like to combine to resamplers in a pipeline and provide this pipeline in another pipeline. If your usecase is different then we can decide to rename this issue and see if we can come up with a solution. Otherwise we will close it in favor of #787 |
As mentioned by @chkoar, the current workaround could be to use a FunctionSampler: #787 (comment) |
@glemaitre My use case is actually different. I want to flexibly combine pipeline blocks, e.g. append classifiers to a common preprocessing pipeline. I came with this workaround (minimal example):
However, it is not ideal, as you always have to think where to add the final |
@sitnarf The above example works with the current pipeline object from imblearn.pipeline import make_pipeline
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
X, y = load_iris(return_X_y=1)
pipeline = make_pipeline(
SimpleImputer(), RandomOverSampler(), RandomForestClassifier(),
)
pipeline.fit(X, y) |
I understand. I said that it is not a bug since it is a handled exception with a proper message. |
So at the end your use case is to be able to append/concatenate pipeline and not really to nest them. You are using the nesting because scikit-learn does not provide such way to create pipeline either. I think that it would be best to request this feature upstream. Indeed, by inherit this feature then. |
@glemaitre One question, why is it not possible to run both transform and resample using the "nested" pipeline? |
It is possible. Actually, we chose to go in that direction in order to avoid fishy situations when having nested pipelines or feature unions and a sampler exists in one branch but it does not in another. Since the validation of pipelines is executed in init we could detect by checking recursively all nested branches. We have discussed it in another issue if I recall correctly. |
Actually I said the same here |
Another issue with the mentioned workaround is a clumsy way for combining def get_preprocessing_pipeline():
return [
SimpleImputer(),
RandomOverSampler(),
]
pipeline = Pipeline([
// ??????
*get_preprocessing_pipeline(),
('classifier', RandomForestClassifier()),
]) |
Validation in |
In genereal yes. But for pipelines no.
Correct. That's why I said that. In any case we could detect the problem in |
To be honest, I still have issues to understand what you try to achieve. This would still be possible to combine pipeline using In [8]: clf1 = make_pipeline(SVC())
In [10]: clf2 = make_pipeline(RandomForestClassifier())
In [13]: preprocessor = make_pipeline(StandardScaler(), SimpleImputer())
In [14]: Pipeline(steps=preprocessor.steps + clf2.steps)
Pipeline(steps=[('standardscaler', StandardScaler()),
('simpleimputer', SimpleImputer()),
('randomforestclassifier', RandomForestClassifier())]) |
@glemaitre Yeah, that could be a nice workaround! I am closing now, since pipeline nesting is currently not well supported even in upstream. |
Describe the bug
Nested pipelines using make_pipeline raise the exception.
Steps/Code to Reproduce
Expected Results
No exception is raised.
Actual Results
Exception raised:
TypeError: All intermediate steps of the chain should be estimators that implement fit and transform or fit_resample. 'Pipeline(steps=[('pipeline', Pipeline(steps=[('smote', SMOTE())]))])' implements both)
Versions
System:
python: 3.8.6 (default, Sep 25 2020, 09:36:53) [GCC 10.2.0]
executable: /home/sitnarf/.virtualenvs/backend-g-pApWmb/bin/python
machine: Linux-5.8.0-43-generic-x86_64-with-glibc2.32
Python dependencies:
pip: 20.1.1
setuptools: 44.0.0
sklearn: 0.23.2
numpy: 1.19.5
scipy: 1.6.0
Cython: None
pandas: 1.2.1
matplotlib: 3.3.4
joblib: 1.0.0
threadpoolctl: 2.1.0
The text was updated successfully, but these errors were encountered: