Home
Welcome to the list_price_estimation wiki!
Project Overview:
Welcome to the Manufacturer List Price Estimation and Prediction Model project. This project aims to estimate and predict manufacturer list prices for various products using machine learning models. We collect and analyze data from multiple retail sources to fill in gaps in our database and anticipate future price changes.
- Current Objective: Estimate missing manufacturer list prices using existing data.
- Future Objective: Predict list price changes based on e-commerce data trends over time.
We scrape data from various retail sites, including:
- CVS
- Walgreens
- Walmart
- Amazon
- Costco
- Sam's Club
- Pick N Save
For each product, the following metrics are collected:
- Retail price
- Promoted price
- Product size
- Manufacturer
- Category
- Retail site
Missing values are handled as follows:
- Promo_Price: Filled with Retail_Price if missing.
- Category: Filled with the most frequent category.
- Count: Filled with 0 if missing.
- Multiplier: Filled with 0 if missing.
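The fill rules above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the helper name `fill_missing` and the sample rows are assumptions, and only the column names come from the list above.

```python
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the fill rules listed above to a copy of df (hypothetical helper)."""
    out = df.copy()
    # Promo_Price falls back to Retail_Price when missing.
    out["Promo_Price"] = out["Promo_Price"].fillna(out["Retail_Price"])
    # Category falls back to the most frequent (modal) category.
    out["Category"] = out["Category"].fillna(out["Category"].mode().iloc[0])
    # Count and Multiplier fall back to 0.
    out["Count"] = out["Count"].fillna(0)
    out["Multiplier"] = out["Multiplier"].fillna(0)
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Retail_Price": [10.0, 8.0, 12.0],
    "Promo_Price": [9.0, None, 11.0],
    "Category": ["Vitamins", None, "Vitamins"],
    "Count": [30, None, 60],
    "Multiplier": [1, 2, None],
})
filled = fill_missing(sample)
```

Working on a copy keeps the raw scraped data intact, so the original missing-value pattern can still be inspected later.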
We created interaction features to improve model performance:
- Retail_Price_Count: Retail_Price multiplied by Count
- Promo_Price_Count: Promo_Price multiplied by Count
- Retail_Price_Multiplier: Retail_Price multiplied by Multiplier
- Promo_Price_Multiplier: Promo_Price multiplied by Multiplier
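The four interaction features are plain column products, so they could be built along these lines. The helper name `add_interaction_features` and the sample values are assumptions for illustration; the column names follow the list above.

```python
import pandas as pd


def add_interaction_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the four price-by-quantity product columns to a copy of df."""
    out = df.copy()
    for price in ("Retail_Price", "Promo_Price"):
        for factor in ("Count", "Multiplier"):
            # For example, Retail_Price_Count = Retail_Price * Count.
            out[f"{price}_{factor}"] = out[price] * out[factor]
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Retail_Price": [10.0, 8.0],
    "Promo_Price": [9.0, 8.0],
    "Count": [30, 0],
    "Multiplier": [1, 2],
})
features = add_interaction_features(sample)
```

Because missing Count and Multiplier values are filled with 0, the corresponding interaction features are also 0 for those rows.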
Categorical features such as Manufacturer and Category are encoded using OneHotEncoder.
We evaluated three models:
- Linear Regression
- Random Forest Regressor
- XGBoost Regressor
We performed hyperparameter tuning using Grid Search to find the best parameters for our models.
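As a rough illustration of grid search, the sketch below tunes a Random Forest with scikit-learn's `GridSearchCV`. The parameter grid, the synthetic data, and the 3-fold cross-validation are all assumptions; the grid actually used in the project is not recorded on this page.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(1, 20, size=(60, 3))
y = 1.5 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 0.5, size=60)

# Hypothetical grid; every combination is fitted and cross-validated.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_mse = -search.best_score_  # flip the sign back to a plain MSE
```

The same pattern applies to the other tuned models by swapping the estimator and the grid.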
We used Mean Squared Error (MSE) to evaluate model performance.
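For reference, MSE is the average of the squared differences between actual and predicted prices. The sketch below computes it by hand and with scikit-learn's `mean_squared_error`; the price values are made up.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted list prices.
y_true = np.array([10.0, 8.0, 12.0])
y_pred = np.array([9.0, 8.0, 14.0])

# Errors are 1, 0 and -2, so the squared errors are 1, 0 and 4.
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)
```

Because MSE is expressed in squared price units, its square root (RMSE) is often easier to interpret: an MSE of 3.31 corresponds to an RMSE of about 1.82 in the original price unit.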
Linear Regression:
- Mean Squared Error: 8.56
Random Forest:
- Best Parameters: (list best parameters from grid search)
- Mean Squared Error: 3.31 (after adding interaction features)
XGBoost:
- Best Parameters: (list best parameters from grid search)
- Mean Squared Error: 4.24
The Random Forest model outperformed both the Linear Regression and XGBoost models with the lowest Mean Squared Error. Below is a summary of the MSE for each model:
- Linear Regression: 8.56
- XGBoost: 4.24
- Random Forest: 3.31
We applied the trained models to predict list prices for new data. The predictions are saved in predictions_rf.csv and predictions_xgb.csv.
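The prediction step might look roughly like the sketch below, shown for the Random Forest output file. The helper `save_predictions`, the column names `List_Price` and `Predicted_List_Price`, the two-column feature subset, and the data are all assumptions for illustration.

```python
import os
import tempfile

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["Retail_Price", "Promo_Price"]  # hypothetical feature subset


def save_predictions(model, new_data: pd.DataFrame, path: str) -> pd.DataFrame:
    """Predict list prices for new_data and write them to a CSV file."""
    out = new_data.copy()
    # Predicted_List_Price is a hypothetical column name.
    out["Predicted_List_Price"] = model.predict(new_data[FEATURES])
    out.to_csv(path, index=False)
    return out


# Made-up training rows; List_Price is a hypothetical target column name.
train = pd.DataFrame({
    "Retail_Price": [10.0, 8.0, 12.0, 15.0],
    "Promo_Price": [9.0, 8.0, 11.0, 13.5],
    "List_Price": [7.0, 5.5, 8.5, 10.0],
})
model = RandomForestRegressor(n_estimators=20, random_state=0)
model.fit(train[FEATURES], train["List_Price"])

new_data = pd.DataFrame({"Retail_Price": [11.0, 9.0], "Promo_Price": [10.0, 9.0]})

# Written to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "predictions_rf.csv")
    saved = save_predictions(model, new_data, path)
    reloaded = pd.read_csv(path)
```

Keeping the input columns next to the prediction makes each output row traceable to the product record it was computed from.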
Next Steps:
- Enhanced Feature Engineering: Continue exploring additional features and transformations.
- Model Ensemble: Combine multiple models to improve accuracy.
- Time Series Analysis: Develop models to predict list price changes over time.
- Deployment: Deploy the models to a production environment and monitor performance.
Contributions are welcome! Please fork the repository and submit pull requests.