hyewonjng/Metis-Ecommerce

More than a five-star rating: E-commerce Customer Review Analysis

Abstract

Understanding customers helps a business respond to what they need and can help increase profits. The goal of this project was to classify customer reviews as positive or negative and to analyze what customers complain about. I used the Amazon Reviews dataset found on Kaggle. To predict whether a review was written positively or negatively, I used bidirectional LSTM and GRU models and achieved an accuracy of 0.82. Customers’ complaints were categorized into 32 topics using an LDA model.

Design

For the sentiment analysis, I predicted whether customer text reviews were positive or negative using deep learning techniques (i.e., RNNs). When the deep learning models were compared with my Random Forest baseline, the GRU model outperformed both the baseline and the LSTM model.
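A minimal sketch of the kind of recurrent classifier described above, written with Keras. The vocabulary size, embedding width, number of GRU units, dropout rate, and sequence length are all illustrative assumptions, and so is wrapping the GRU in a bidirectional layer; none of these values come from the project itself.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# All sizes below are assumed for illustration, not taken from the project.
VOCAB_SIZE = 20_000   # number of distinct token ids
EMBED_DIM = 64        # embedding width
GRU_UNITS = 32        # hidden units per direction
SEQ_LEN = 100         # padded review length in tokens


def build_bigru_classifier(vocab_size=VOCAB_SIZE, embed_dim=EMBED_DIM, units=GRU_UNITS):
    """Build a binary sentiment classifier: embedding -> bidirectional GRU -> sigmoid."""
    model = tf.keras.Sequential([
        layers.Embedding(vocab_size, embed_dim),
        layers.Bidirectional(layers.GRU(units)),
        layers.Dropout(0.5),                    # one common guard against overfitting
        layers.Dense(1, activation="sigmoid"),  # probability that a review is positive
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


model = build_bigru_classifier()

# Random token ids stand in for tokenized, padded reviews.
dummy_batch = np.random.randint(0, VOCAB_SIZE, size=(4, SEQ_LEN))
probs = model.predict(dummy_batch, verbose=0)
```

Swapping `layers.GRU` for `layers.LSTM` gives the corresponding LSTM variant, which makes the two architectures easy to compare under the same training setup.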

Next, after text preprocessing, an LDA model was used to analyze and summarize what customers complained about. TF-IDF with bigrams and trigrams gave better model performance than CountVectorizer with unigrams. The number of topics was chosen based on the coherence score.

Data

The Amazon Reviews dataset contains 3.6M documents in total. Because of the memory limits of my computer, I selected a random subset of 71,998 documents for the sentiment analysis and 5,998 documents for topic modeling.
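One way to draw a fixed-size random subset without loading every review into memory is reservoir sampling. The sketch below is an illustration under that assumption, not the sampling method actually used in the project; the function name `reservoir_sample` and the generated review strings are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return k items chosen uniformly at random from a stream of unknown length."""
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# A generator stands in for reading reviews one at a time from disk.
reviews = (f"review {i}" for i in range(100_000))
sample = reservoir_sample(reviews, k=1_000, seed=42)
print(len(sample))
```

Only the `k` selected items are held in memory at any time, so the same function works on the full dataset as long as the reviews can be read as a stream.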

Tools

  • TensorFlow & Keras
  • scikit-learn
  • spaCy
  • Gensim
  • NLTK
  • pyLDAvis
  • Matplotlib
  • WordCloud

Future studies

  • More hyperparameter tuning (for example, of the optimizer and activation functions) is needed to reduce overfitting and improve the accuracy score.

  • Because of the large size of the text data, I analyzed only part of the negative reviews, and the complaints turned out to cover only books and movies. Using the entire dataset may yield different results and interpretations, so the hyperparameters should be tuned and the analysis rerun on the full data to see complaints about other product types.

  • In the LDA topic modeling analysis, the dataset did not indicate which products customers purchased. Although it was useful to see at a glance what customers disliked or how they felt about their purchases, analyzing review texts per product would give a clearer picture of the negative reviews for each product.
