This issue refers to the follow-up Google Colab code for the "Note: Metric Comparison Improvement" at the end of the chapter "Scikit-learn: Creating Machine Learning Models".
In the Colab code, both RandomizedSearchCV and GridSearchCV were applied directly to the training set without an explicit validation set.
The note states: "The most important part is they all use the same data splits created using train_test_split() and np.random.seed(42)".
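For context, a minimal sketch of why a fixed seed gives every model the same split: when `random_state` is left at its default of `None`, `train_test_split()` draws from NumPy's global random state, so calling `np.random.seed(42)` right before each split reproduces it exactly. The synthetic data from `make_classification` and the helper name `make_split` are illustrative assumptions, not the notebook's code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the course's dataset)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

def make_split():
    """Hypothetical helper: reseed the global NumPy RNG, then split 80/20."""
    np.random.seed(42)
    # random_state=None, so train_test_split uses NumPy's global random state
    return train_test_split(X, y, test_size=0.2)

X_train_a, X_test_a, y_train_a, y_test_a = make_split()
X_train_b, X_test_b, y_train_b, y_test_b = make_split()

# Both calls produce identical train/test partitions
print(np.array_equal(X_test_a, X_test_b))
```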
I initially assumed the note was addressing an inconsistency in the previous lessons: a validation set was created for RandomizedSearchCV, whereas GridSearchCV used a plain 80/20 train_test_split instead.
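For comparison, a sketch of the explicit validation-set workflow: hyperparameters are compared on the validation set only, and the test set is scored a single time at the end. The 700/150/150 split sizes, the synthetic data, and the `n_estimators` candidates are assumptions chosen for illustration, not the exact code from the earlier lessons.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the course's dataset)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

np.random.seed(42)
# Hold out 150 samples as the test set first...
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=150)
# ...then split the remaining 850 into 700 training and 150 validation samples
X_train, X_valid, y_train, y_valid = train_test_split(X_rest, y_rest, test_size=150)

# Compare candidate settings using the validation set only
valid_scores = {}
for n in [10, 50, 100]:
    model = RandomForestClassifier(n_estimators=n, random_state=42)
    model.fit(X_train, y_train)
    valid_scores[n] = model.score(X_valid, y_valid)

best_n = max(valid_scores, key=valid_scores.get)

# Score the test set exactly once, with the chosen setting
final_model = RandomForestClassifier(n_estimators=best_n, random_state=42)
final_model.fit(X_train, y_train)
test_score = final_model.score(X_test, y_test)
print(best_n, valid_scores[best_n], test_score)
```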
That assumption turned out to be wrong. In fact, in the Colab code both RandomizedSearchCV and GridSearchCV were applied directly to the training set without an explicit validation set.
Couldn't this approach lead to overfitting? Any tuning process based on test set performance indirectly leaks information about the test set into the model selection process.
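For reference, a sketch of the pattern being discussed, assuming the search is fit on the training split with cross-validation (the dataset and parameter grid are assumptions). With `cv=5`, `GridSearchCV` (and likewise `RandomizedSearchCV`) splits whatever is passed to `fit()` into 5 folds and uses each held-out fold as a validation set, so only the training data takes part in tuning; the test set is scored once afterwards.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption: not the course's dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

np.random.seed(42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Hypothetical grid: 2 x 2 = 4 candidate settings
param_grid = {"n_estimators": [10, 50], "max_depth": [None, 5]}

gs = GridSearchCV(RandomForestClassifier(random_state=42), param_grid=param_grid, cv=5)
# Fit on training data only; each of the 5 folds serves in turn as a validation set
gs.fit(X_train, y_train)

# The refit best estimator is evaluated on the untouched test set once
test_acc = gs.score(X_test, y_test)
print(gs.best_params_, gs.best_score_, test_acc)
```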
Skipping the validation set is also inconsistent with the previous lessons, where the validation set was explained.
Could you please clarify?
Thanks,
Simone.