Skip to content

rb-balaji/list_price_estimation

Repository files navigation

Manufacturer List Price Estimation and Prediction Model

Introduction

Welcome to the documentation for the Manufacturer List Price Estimation and Prediction Model. This project aims to estimate manufacturer list prices for products not in our database and predict list price changes based on e-commerce data.

Project Overview

Objectives

  • Current Objective: Estimate missing manufacturer list prices using a trained model on existing data.
  • Future Objective: Predict list price changes based on historical e-commerce data.

Data Sources

We scrape data from various retail sites, including:

  • CVS
  • Walgreens
  • Walmart
  • Amazon
  • Costco
  • Sam's Club
  • Pick N Save

Metrics Collected

For each product, we collect:

  • Retail Price
  • Promoted Price
  • Product Size
  • Manufacturer
  • Category
  • Retail Site

Model Development

Key Features

Identifying important features is crucial for the model's accuracy. The features used in our model include:

  • Retail Price
  • Promoted Price
  • Product Size
  • Manufacturer
  • Category
  • Retail Site

Data Preprocessing

To ensure the model performs well, we preprocess the data by:

  • Handling missing values
  • Standardizing or normalizing features
  • Encoding categorical variables

Model Selection

We considered several models, including:

  • Regression Models: Linear Regression, Ridge Regression, Lasso Regression
  • Tree-based Models: Decision Trees, Random Forest, Gradient Boosting
  • Ensemble Methods: Combining multiple models for better performance

Implementation

Code Snippets

Data Loading

import pandas as pd

def load_data(file_path):
    return pd.read_csv(file_path)

Data Preprocessing

def preprocess_data(data):
    # Handling missing 'Count' values
    data['Count'] = data['Count'].apply(lambda x: 0 if pd.isnull(x) or x % 1 != 0 else x)
    return data

Model Training and Prediction

from sklearn.ensemble import RandomForestRegressor
import joblib

def train_model(train_data):
    features = ['Retail_Price', 'Promo_Price', 'Count', 'Manufacturer', 'Category']
    X = pd.get_dummies(train_data[features])
    y = train_data['List_Price']
    
    model = RandomForestRegressor(random_state=42)
    model.fit(X, y)
    
    joblib.dump(model, 'random_forest_model.joblib')
    return model

def predict_list_price(model, predict_data):
    features = ['Retail_Price', 'Promo_Price', 'Count', 'Manufacturer', 'Category']
    X = pd.get_dummies(predict_data[features])
    
    predictions = model.predict(X)
    predict_data['Predicted_List_Price'] = predictions
    predict_data.to_csv('predicted_data.csv', index=False)

Model Accuracy

We achieved the following accuracy metrics on our training data:

  • Mean Absolute Error (MAE): 0.92
  • Mean Squared Error (MSE): 3.30
  • Root Mean Squared Error (RMSE): 1.82
  • R-squared (R2): 0.96

These metrics indicate a highly accurate model, with an R-squared value of 0.96 meaning that 96% of the variance in list prices is explained by the model. The MAE and RMSE values are low, suggesting that the model's predictions are close to the actual values.

Understanding the Metrics

  • MAE: On average, our predictions are off by 0.92 units from the actual list prices.
  • MSE: This metric gives more weight to larger errors, which helps identify any significant discrepancies in the predictions.
  • RMSE: Similar to MSE, but in the same units as the target variable, providing an average error magnitude.
  • R2: Indicates that 96% of the variability in list prices is accounted for by our model, demonstrating its robustness and reliability.

Future Work

  • Data Collection: Continuously collect more data to improve the model.
  • Feature Engineering: Explore additional features to enhance model performance.
  • Model Tuning: Optimize hyperparameters for better accuracy.
  • Prediction of Price Changes: Develop a model to predict list price changes based on historical trends and external factors.

Conclusion

This project demonstrates a successful approach to estimating and predicting manufacturer list prices using machine learning. The high accuracy metrics achieved underscore the effectiveness of the models and preprocessing techniques employed.

For detailed code and further explanations, please refer to the source code repository.

Links to Relevant Documentation

About

list_price_estimation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published