MAINT Introduce use of set_output to output dataframes #683
This is a very natural way to introduce this. +1.
This is still a draft, so I did not merge. But feel free to undraft and merge.
I think we should use |
The global setting raises an We can still set the output to be dataframe when creating the instances in the rest of the notebook, and use new instances with default input for the pipeline. |
I will make a second check to be sure that we don't have other places where we should be using this type of output.
# %%
data_train_scaled = pd.DataFrame(data_train_scaled, columns=data_train.columns)
scaler = StandardScaler().set_output(transform="pandas")
data_train_scaled = scaler.fit_transform(data_train)
After the analysis, I would also add a link to the documentation: https://scikit-learn.org/stable/auto_examples/miscellaneous/plot_set_output.html.
I would probably mention that we can set the output of a Pipeline using the sklearn.set_config function, without going into details and instead delegating to the scikit-learn example.
This comment may be relevant for Issue #675.
I added a suggestion above to add the link right away without waiting for a PR dedicated to address #675.
In the notebook |
Co-authored-by: Guillaume Lemaitre <[email protected]>
+1 for |
Otherwise LGTM.
LGTM again. Just a few more comments. Feel free to merge once the suggested changes are integrated (assuming you agree with those).
Co-authored-by: Olivier Grisel <[email protected]>
Pandas output with the set_output API has been available since scikit-learn v1.2. This PR introduces this nice feature to the MOOC.