Metric comparison - Colab code - Validation set missing #111

Open
simoneroviaro opened this issue Nov 3, 2024 · 0 comments

This issue refers to the follow-up Google Colab code of the "Note: Metric Comparison Improvement" section, at the end of the chapter "Scikit-learn: Creating Machine Learning Models".

In the Colab code, both RandomizedSearchCV and GridSearchCV were applied directly to the training set without an explicit validation set.

Quote "The most important part is they all use the same data splits created using train_test_split() and np.random.seed(42)".

I initially assumed this referred to the fact that, in the previous lessons, a validation set was created for RandomizedSearchCV, which was not consistent with GridSearchCV, where an 80/20 train_test_split was used instead.

This turned out not to be the case in the Colab code. In fact, both RandomizedSearchCV and GridSearchCV were applied directly to the training set without an explicit validation set (see the sketch below).
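
To make it concrete, this is roughly the pattern I am referring to. The dataset, model, and parameter grid below are placeholders of my own choosing, not the exact ones from the notebook:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

np.random.seed(42)
X, y = load_breast_cancer(return_X_y=True)

# Single 80/20 split -- no explicit validation set is created.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

grid = {"n_estimators": [10, 100], "max_depth": [None, 10]}

# GridSearchCV is fitted directly on the training set;
# RandomizedSearchCV is used the same way in the notebook.
gs_clf = GridSearchCV(RandomForestClassifier(), param_grid=grid, cv=5)
gs_clf.fit(X_train, y_train)

# The held-out test set is only used afterwards for the metric comparison.
print(gs_clf.best_params_, gs_clf.score(X_test, y_test))
```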

  • Couldn't this approach lead to overfitting? Any tuning process based on test set performance indirectly leaks information about the test set into the model selection process.

This is not consistent with the content of the previous lessons, where the validation set was explained (roughly along the lines sketched below).
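
For contrast, this is how I understood the explicit validation-set workflow from the earlier lessons; the 70/15/15 split sizes, dataset, and manual tuning loop are my own paraphrase, not the exact course code:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

np.random.seed(42)
X, y = load_breast_cancer(return_X_y=True)

# 70/15/15 train/validation/test split.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3)
X_valid, X_test, y_valid, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5)

best_score, best_n = -np.inf, None
for n_estimators in (10, 50, 100):
    clf = RandomForestClassifier(n_estimators=n_estimators)
    clf.fit(X_train, y_train)
    score = clf.score(X_valid, y_valid)  # tune on the validation set only
    if score > best_score:
        best_score, best_n = score, n_estimators

# The test set is touched only once, with the chosen hyperparameters.
final_clf = RandomForestClassifier(n_estimators=best_n).fit(X_train, y_train)
print(best_n, final_clf.score(X_test, y_test))
```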

Could you please clarify?
Thanks,
Simone.
