Course:
"Complete Machine Learning & Data Science Bootcamp 2023"
Section 12, video 195, "Preprocessing Our Data", in the exercise "Make Predictions on Test Data"
Issue: A ValueError is thrown when making predictions on the test data, as demonstrated below.
# Manually adjust to have auctioneerID_is_missing column
df_test["auctioneerID_is_missing"] = False
df_test.head()
# Make predictions on the test data
test_preds = ideal_model.predict(df_test)
A ValueError occurs:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[75], line 2
1 # Make predictions on the test data
----> 2 test_preds = ideal_model.predict(df_test)
File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:981, in ForestRegressor.predict(self, X)
979 check_is_fitted(self)
980 # Check data
--> 981 X = self._validate_X_predict(X)
983 # Assign chunk of trees to jobs
984 n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)
File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:602, in BaseForest._validate_X_predict(self, X)
599 """
600 Validate X whenever one tries to predict, apply, predict_proba."""
601 check_is_fitted(self)
--> 602 X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr", reset=False)
603 if issparse(X) and (X.indices.dtype != np.intc or X.indptr.dtype != np.intc):
604 raise ValueError("No support for np.int64 index based sparse matrices")
File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/base.py:548, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
483 def _validate_data(
484 self,
485 X="no_validation",
(...)
489 **check_params,
490 ):
491 """Validate input data and set or check the `n_features_in_` attribute.
492
493 Parameters
(...)
546 validated.
547 """
--> 548 self._check_feature_names(X, reset=reset)
550 if y is None and self._get_tags()["requires_y"]:
551 raise ValueError(
552 f"This {self.__class__.__name__} estimator "
553 "requires y to be passed, but the target y is None."
554 )
File ~/Documents/code/udemy/udemy_ml_ds_ztm/.venv/lib/python3.9/site-packages/sklearn/base.py:481, in BaseEstimator._check_feature_names(self, X, reset)
476 if not missing_names and not unexpected_names:
477 message += (
478 "Feature names must be in the same order as they were in fit.\n"
479 )
--> 481 raise ValueError(message)
ValueError: The feature names should match those that were passed during fit.
Feature names must be in the same order as they were in fit.
Tests:
From the error alone, one could assume it was caused by adding the auctioneerID_is_missing column. After some research and troubleshooting, I ran the following tests to determine whether the training and test data had the same columns, in the same order.
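A minimal sketch of this kind of check, using small made-up DataFrames in place of the course data. `X_train` is an assumed name for the training features, and the column values are purely illustrative:

```python
import pandas as pd

# Hypothetical stand-ins for the course data: a few columns are enough
# to reproduce the situation from the exercise.
X_train = pd.DataFrame({
    "SalesID": [1, 2],
    "auctioneerID_is_missing": [False, True],
    "YearMade": [2004, 1996],
})
df_test = pd.DataFrame({"SalesID": [3], "YearMade": [2001]})

# As in the exercise, the missing column is added manually, so it lands last.
df_test["auctioneerID_is_missing"] = False

# Test 1: do both frames contain the same set of columns (order ignored)?
same_columns = set(X_train.columns) == set(df_test.columns)

# Test 2: are the columns also in the same order?
same_order = list(X_train.columns) == list(df_test.columns)

print("Same columns:", same_columns)
print("Same order:", same_order)
```

With these toy frames the first check passes and the second fails, which matches the second line of the error message. On a fitted scikit-learn estimator (version 1.0 or later, fitted on a DataFrame), the expected order should also be available as `ideal_model.feature_names_in_`.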
Solution:
To fix the column order, I had to reindex the test data based on the columns of the training data.
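A minimal sketch of that reindexing step, again with made-up DataFrames. `X_train` is an assumed name for the training features and is not taken from the original post:

```python
import pandas as pd

# Hypothetical stand-ins for the training features and the test set.
X_train = pd.DataFrame({
    "SalesID": [1, 2],
    "auctioneerID_is_missing": [False, True],
    "YearMade": [2004, 1996],
})
df_test = pd.DataFrame({
    "SalesID": [3],
    "YearMade": [2001],
    "auctioneerID_is_missing": [False],  # same columns, different order
})

# Reorder the test columns to match the order used during training.
# Both frames share the same column names, so no values are changed.
df_test = df_test.reindex(columns=X_train.columns)

# The column order now matches, which is what predict() checks for.
columns_match = list(df_test.columns) == list(X_train.columns)
print(columns_match)
```

Plain column selection, `df_test[X_train.columns]`, should give the same result when every training column is present in the test data. The difference is that `reindex` fills any column missing from the test data with NaN, while column selection raises a `KeyError`.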
The fix worked, as demonstrated by the next lines in the exercise, which resulted in: