Manufacturer List Price Estimation and Prediction Model

Introduction

Welcome to the documentation for the Manufacturer List Price Estimation and Prediction Model. This project aims to estimate manufacturer list prices for products not in our database and predict list price changes based on e-commerce data.

Project Overview

Objectives

Current Objective: Estimate missing manufacturer list prices using a trained model on existing data.
Future Objective: Predict list price changes based on historical e-commerce data.

Data Sources

We scrape data from various retail sites, including:

CVS
Walgreens
Walmart
Amazon
Costco
Sam's Club
Pick N Save

Metrics Collected

For each product, we collect:

Retail Price
Promoted Price
Product Size
Manufacturer
Category
Retail Site

Model Development

Key Features

Identifying important features is crucial for the model's accuracy. The features used in our model include:

Retail Price
Promoted Price
Product Size
Manufacturer
Category
Retail Site

Data Preprocessing

To ensure the model performs well, we preprocess the data by:

Handling missing values
Standardizing or normalizing features
Encoding categorical variables

Model Selection

We considered several models, including:

Regression Models: Linear Regression, Ridge Regression, Lasso Regression
Tree-based Models: Decision Trees, Random Forest, Gradient Boosting
Ensemble Methods: Combining multiple models for better performance

Implementation

Code Snippets

Data Loading

import pandas as pd

def load_data(file_path):
    return pd.read_csv(file_path)

Data Preprocessing

def preprocess_data(data):
    # Handling missing 'Count' values
    data['Count'] = data['Count'].apply(lambda x: 0 if pd.isnull(x) or x % 1 != 0 else x)
    return data

Model Training and Prediction

from sklearn.ensemble import RandomForestRegressor
import joblib

def train_model(train_data):
    features = ['Retail_Price', 'Promo_Price', 'Count', 'Manufacturer', 'Category']
    X = pd.get_dummies(train_data[features])
    y = train_data['List_Price']
    
    model = RandomForestRegressor(random_state=42)
    model.fit(X, y)
    
    joblib.dump(model, 'random_forest_model.joblib')
    return model

def predict_list_price(model, predict_data):
    features = ['Retail_Price', 'Promo_Price', 'Count', 'Manufacturer', 'Category']
    X = pd.get_dummies(predict_data[features])
    
    predictions = model.predict(X)
    predict_data['Predicted_List_Price'] = predictions
    predict_data.to_csv('predicted_data.csv', index=False)

Model Accuracy

We achieved the following accuracy metrics on our training data:

Mean Absolute Error (MAE): 0.92
Mean Squared Error (MSE): 3.30
Root Mean Squared Error (RMSE): 1.82
R-squared (R2): 0.96

These metrics indicate a highly accurate model, with an R-squared value of 0.96 meaning that 96% of the variance in list prices is explained by the model. The MAE and RMSE values are low, suggesting that the model's predictions are close to the actual values.

Understanding the Metrics

MAE: On average, our predictions are off by 0.92 units from the actual list prices.
MSE: This metric gives more weight to larger errors, which helps identify any significant discrepancies in the predictions.
RMSE: Similar to MSE, but in the same units as the target variable, providing an average error magnitude.
R2: Indicates that 96% of the variability in list prices is accounted for by our model, demonstrating its robustness and reliability.

Future Work

Data Collection: Continuously collect more data to improve the model.
Feature Engineering: Explore additional features to enhance model performance.
Model Tuning: Optimize hyperparameters for better accuracy.
Prediction of Price Changes: Develop a model to predict list price changes based on historical trends and external factors.

Conclusion

This project demonstrates a successful approach to estimating and predicting manufacturer list prices using machine learning. The high accuracy metrics achieved underscore the effectiveness of the models and preprocessing techniques employed.

For detailed code and further explanations, please refer to the source code repository.

Links to Relevant Documentation

Pandas Documentation
Scikit-learn Documentation
Joblib Documentation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Manufacturer List Price Estimation and Prediction Model

Introduction

Project Overview

Objectives

Data Sources

Metrics Collected

Model Development

Key Features

Data Preprocessing

Model Selection

Implementation

Code Snippets

Data Loading

Data Preprocessing

Model Training and Prediction

Model Accuracy

Understanding the Metrics

Future Work

Conclusion

Links to Relevant Documentation

Files

README.md

Latest commit

History

README.md

File metadata and controls

Manufacturer List Price Estimation and Prediction Model

Introduction

Project Overview

Objectives

Data Sources

Metrics Collected

Model Development

Key Features

Data Preprocessing

Model Selection

Implementation

Code Snippets

Data Loading

Data Preprocessing

Model Training and Prediction

Model Accuracy

Understanding the Metrics

Future Work

Conclusion

Links to Relevant Documentation