Authors: Femi Kamau, Monicah Iwagit, Teofilo Gafna, Wendy Mwiti
This project is part of the Data Science (DSF-FT) course at Moringa School. The full project description can be found here.
- Overview
- Business Problem
- Data Understanding - The dataset was sourced from Kaggle
- Data Cleaning: Validity, Completeness, Consistency, Uniformity
- Exploratory Data Analysis
- Modeling:
  - Preprocessing techniques in NLP
  - Building models
  - Model validation
- Deployment
With current technology, almost everyone has access to the internet, and there are few restrictions on what people can post. As a result, readers may take news they find online at face value even when it is not legitimate, and consuming such information can affect them in one way or another. To address this, the project uses NLP-based text classification to determine whether a posted article is real or fake.
- Pandas
- Seaborn
- Scikit-Learn
- NLTK
- Streamlit
From the dataset, the project uses the text column as the independent variable (the input features) and the category column as the dependent (target) variable.
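The preprocessing, model-building, and validation steps listed in the contents can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project's actual notebook: the README does not say which classifier or cleaning steps were used, so TF-IDF features with logistic regression are a stand-in, `clean_text`, `build_pipeline`, and `train_and_validate` are hypothetical helper names, and scikit-learn's built-in English stop-word list is used in place of NLTK's so the sketch runs without downloading NLTK corpora. Only the `text` and `category` column names come from the dataset description.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def clean_text(text: str) -> str:
    """Basic NLP cleanup: lowercase, drop URLs and non-letters, collapse spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_pipeline() -> Pipeline:
    """TF-IDF features (English stop words removed) feeding a linear classifier.

    Logistic regression is an assumed stand-in; the README does not name the model.
    """
    return Pipeline([
        ("tfidf", TfidfVectorizer(preprocessor=clean_text, stop_words="english")),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


def train_and_validate(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Fit on a stratified training split and evaluate on the held-out split."""
    X = df["text"]      # independent variable
    y = df["category"]  # dependent (target) variable
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    model = build_pipeline().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Per-class precision, recall and F1 on the validation split
    print(classification_report(y_test, y_pred, zero_division=0))
    return model, accuracy_score(y_test, y_pred)
```

Typical usage would be `model, acc = train_and_validate(pd.read_csv(path))`, where `path` points at the Kaggle CSV (its file name is not given here). Wrapping the vectorizer and classifier in one `Pipeline` means a single pickled object can later take raw article text and return a category, which keeps the deployment side simple.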
With this project, a user can check whether an article is likely to be legitimate, which can help people judge news and events more critically.
The project was deployed using Streamlit. The link to the deployed app can be found here.
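A Streamlit front end such as demo/ml-app.py might look roughly like the sketch below. It is an assumption-based illustration, not the deployed code: it assumes models/model.pkl holds a pickled scikit-learn-style object with a `predict` method that accepts raw text, and that the app is launched from the repository root (for example `streamlit run demo/ml-app.py`). The helper names `load_model` and `predict_label` are hypothetical.

```python
import pickle
from pathlib import Path

# Assumed location of the trained pipeline, relative to the repository root.
MODEL_PATH = Path("models") / "model.pkl"


def load_model(path=MODEL_PATH):
    """Unpickle the trained classifier (only load pickle files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_label(model, text):
    """Return the predicted category for a single article."""
    return model.predict([text])[0]


def main():
    # Imported here so the helpers above can be used without Streamlit installed.
    import streamlit as st

    st.title("Fake News Detector")
    article = st.text_area("Paste the article text")
    if st.button("Classify") and article.strip():
        label = predict_label(load_model(), article)
        st.write(f"Predicted category: {label}")


if __name__ == "__main__":
    main()
```

Keeping the prediction logic in plain functions separates it from the UI, so it can be tested without starting a Streamlit server.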
├── README.md
|
├── .gitignore
|
├── index.ipynb
|
├── models
| └── model.pkl
|
├── demo
| ├── requirements.txt
| └── ml-app.py
|
└── data_preprocessing
    ├── cities.txt
    ├── countries.txt
    ├── months.txt
    ├── names.txt
    ├── states.txt
    └── week.txt
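The README does not say how the word lists in data_preprocessing are used. One plausible use, assumed here purely for illustration, is to treat them as extra stop words so that city, country, state, person, month, and weekday names do not become classifier features. The sketch below assumes each file holds one entry per line; `parse_word_list`, `load_word_lists`, and `remove_listed_words` are hypothetical names.

```python
import re
from pathlib import Path

# File names taken from the data_preprocessing/ folder in the tree above.
WORD_LIST_FILES = ["cities.txt", "countries.txt", "months.txt",
                   "names.txt", "states.txt", "week.txt"]


def parse_word_list(lines):
    """Turn raw lines (assumed: one entry per line) into a set of lowercase tokens."""
    words = set()
    for line in lines:
        words.update(line.lower().split())
    return words


def load_word_lists(folder="data_preprocessing"):
    """Union of all word lists found in `folder`; missing files are skipped."""
    words = set()
    for name in WORD_LIST_FILES:
        path = Path(folder) / name
        if path.exists():
            words |= parse_word_list(path.read_text(encoding="utf-8").splitlines())
    return words


def remove_listed_words(text, words):
    """Lowercase, tokenize on letters, and drop tokens found in the word lists."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if t not in words)
```

Splitting multi-word entries such as "New York" into single tokens is a simplification: it would also remove common words like "new" from every article, so matching whole phrases may be preferable in practice.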