Home

Welcome to the list_price_estimation wiki!

Manufacturer List Price Estimation and Prediction Model

Introduction

Welcome to the Manufacturer List Price Estimation and Prediction Model project. This project aims to estimate and predict manufacturer list prices for various products using machine learning models. We collect and analyze data from multiple retail sources to fill in gaps in our database and anticipate future price changes.

Project Overview

Objectives

Current Objective: Estimate missing manufacturer list prices using existing data.
Future Objective: Predict list price changes based on e-commerce data trends over time.

Data Sources

We scrape data from various retail sites, including:

CVS
Walgreens
Walmart
Amazon
Costco
Sam's Club
Pick N Save

Collected Metrics

For each product, the following metrics are collected:

Retail price
Promoted price
Product size
Manufacturer
Category
Retail site

Data Preprocessing

Handling Missing Data

Promo_Price: Filled with Retail_Price if missing.
Category: Filled with the most frequent category.
Count: Filled with 0 if missing.
Multiplier: Filled with 0 if missing.
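
The fill rules above can be sketched with pandas. This is a minimal illustration, not the project's actual preprocessing code: the column names follow the ones used on this page, while the function name `fill_missing` and the sample rows are hypothetical.

```python
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four fill rules to a copy of the frame (hypothetical helper)."""
    out = df.copy()
    # Promo_Price falls back to Retail_Price when no promoted price was recorded.
    out["Promo_Price"] = out["Promo_Price"].fillna(out["Retail_Price"])
    # Category falls back to the most frequent observed category.
    most_frequent = out["Category"].mode(dropna=True)
    if not most_frequent.empty:
        out["Category"] = out["Category"].fillna(most_frequent.iloc[0])
    # Count and Multiplier fall back to 0.
    out["Count"] = out["Count"].fillna(0)
    out["Multiplier"] = out["Multiplier"].fillna(0)
    return out


# Made-up rows, used only to exercise each rule once.
sample = pd.DataFrame({
    "Retail_Price": [10.0, 8.0, 5.0],
    "Promo_Price": [9.0, None, None],
    "Category": ["Vitamins", None, "Vitamins"],
    "Count": [30, None, 60],
    "Multiplier": [1, 2, None],
})
cleaned = fill_missing(sample)
print(cleaned)
```

Working on a copy keeps the raw scraped frame intact, which makes it easier to check how many values each rule actually filled.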

Feature Engineering

We created interaction features to improve model performance:

Retail_Price_Count: Retail_Price multiplied by Count
Promo_Price_Count: Promo_Price multiplied by Count
Retail_Price_Multiplier: Retail_Price multiplied by Multiplier
Promo_Price_Multiplier: Promo_Price multiplied by Multiplier
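
As a rough sketch, the four interaction features could be built as element-wise products of the existing columns. The helper name `add_interaction_features` and the sample values are assumptions made for illustration; only the column names come from this page.

```python
import pandas as pd


def add_interaction_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add price-by-quantity product columns to a copy of the frame (hypothetical helper)."""
    out = df.copy()
    for price in ("Retail_Price", "Promo_Price"):
        for factor in ("Count", "Multiplier"):
            # Produces e.g. Retail_Price_Count = Retail_Price * Count.
            out[f"{price}_{factor}"] = out[price] * out[factor]
    return out


# Made-up rows for illustration.
sample = pd.DataFrame({
    "Retail_Price": [10.0, 8.0],
    "Promo_Price": [9.0, 8.0],
    "Count": [30, 0],
    "Multiplier": [1, 2],
})
features = add_interaction_features(sample)
print(features)
```

Note that a row whose Count or Multiplier was filled with 0 gets a value of 0 for the corresponding interaction features.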

Categorical Encoding

Categorical features such as Manufacturer and Category are encoded using OneHotEncoder.

Model Development

Models Used

Linear Regression
Random Forest Regressor
XGBoost Regressor

Hyperparameter Tuning

We performed hyperparameter tuning using Grid Search to find the best parameters for our models.
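
A grid search over a Random Forest might look like the sketch below, using scikit-learn's GridSearchCV. The parameter grid, the fold count, and the synthetic data are all assumptions made for illustration; the grid actually searched in this project is not listed on this page.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: 60 rows, 3 numeric features, a noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 20.0, size=(60, 3))
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.5, size=60)

# Hypothetical grid, kept small so the example runs quickly.
param_grid = {"n_estimators": [10, 25], "max_depth": [3, None]}

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=param_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_cv_mse = -search.best_score_  # flip the sign back to a plain MSE
print(best_params, round(best_cv_mse, 3))
```

The cross-validated MSE reported by the search is computed on held-out folds of the training data, so it is not directly comparable to an MSE measured on a separate test set.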

Evaluation Metrics

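We evaluated model performance with Mean Squared Error (MSE), the average squared difference between predicted and actual list prices. Lower values indicate a better fit.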
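
For reference, a small worked example of the metric using scikit-learn's `mean_squared_error`. The prices are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted list prices.
y_true = np.array([10.0, 12.0, 8.0])
y_pred = np.array([11.0, 10.0, 8.0])

# Errors are -1, 2, 0; squared errors are 1, 4, 0; their mean is 5/3.
mse = mean_squared_error(y_true, y_pred)
manual_mse = float(np.mean((y_true - y_pred) ** 2))

# The square root (RMSE) is in the same units as the prices themselves.
rmse = float(np.sqrt(mse))
print(round(mse, 4), round(rmse, 4))
```

Because MSE is expressed in squared price units, taking the square root can make results easier to interpret. For example, an MSE of 3.31 corresponds to an RMSE of about 1.82 in the original price units.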

Results

Linear Regression

Mean Squared Error: 8.56

Random Forest Regressor

Best Parameters: (list best parameters from grid search)
Mean Squared Error: 3.31 (after adding interaction features)

XGBoost Regressor

Best Parameters: (list best parameters from grid search)
Mean Squared Error: 4.24

Comparison of Models

The Random Forest model outperformed both the Linear Regression and XGBoost models with the lowest Mean Squared Error. Below is a summary of the MSE for each model:

Linear Regression: 8.56
XGBoost: 4.24
Random Forest: 3.31

Predictions

We applied the trained models to predict list prices for new data. The predictions are saved in predictions_rf.csv (Random Forest) and predictions_xgb.csv (XGBoost).
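
The prediction step could be sketched as follows. Everything here except the output file name is an assumption: the helper `save_predictions`, the `Predicted_List_Price` column, the two-feature model, and the synthetic training data are hypothetical stand-ins for the project's real pipeline.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def save_predictions(model, features: pd.DataFrame, path: str) -> pd.DataFrame:
    """Predict list prices for `features` and write them to a CSV at `path`."""
    result = features.copy()
    # "Predicted_List_Price" is a hypothetical column name.
    result["Predicted_List_Price"] = model.predict(features)
    result.to_csv(path, index=False)
    return result


# Tiny synthetic training set standing in for the real data.
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "Retail_Price": rng.uniform(1.0, 20.0, size=40),
    "Promo_Price": rng.uniform(1.0, 20.0, size=40),
})
target = 0.8 * train["Retail_Price"]

model = RandomForestRegressor(n_estimators=10, random_state=0).fit(train, target)

# New rows must have the same columns, in the same order, as the training data.
new_data = pd.DataFrame({"Retail_Price": [10.0, 5.0], "Promo_Price": [9.0, 5.0]})
out_path = os.path.join(tempfile.mkdtemp(), "predictions_rf.csv")
saved = save_predictions(model, new_data, out_path)
print(saved)
```

Writing the input features alongside each prediction makes the CSV easier to audit later.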

Future Work

Enhanced Feature Engineering: Continue exploring additional features and transformations.
Model Ensemble: Combine multiple models to improve accuracy.
Time Series Analysis: Develop models to predict list price changes over time.
Deployment: Deploy the models to a production environment and monitor performance.

Contributing

Contributions are welcome! Please fork the repository and submit pull requests.
