Summarization of English-Yoruba Tweets with Code-Switches

The goal of this experiment is to summarize tweets with English and Yoruba code switches.

Keywords: Code-Switching, Tweet Summarization, Language Identification, Code-switching Detection, Translation, Natural Language Processing.

The Data

The twitter_data csv file has three columns:

Tweets: Tweets with code-switches.
Eng_source: English source of the tweets.
Summary: Human-annotated summary of the tweets.
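
As a minimal sketch of how the file described above could be read, the snippet below loads the CSV with Python's standard csv module and checks that the three columns are present. The function name load_tweets and the path you pass to it are assumptions made for illustration; the project's own loading code in utils may differ.

```python
import csv
from pathlib import Path

# Column names as listed in this README.
EXPECTED_COLUMNS = ("Tweets", "Eng_source", "Summary")


def load_tweets(csv_path):
    """Read the CSV and return a list of row dicts (hypothetical helper).

    Raises ValueError if any of the expected columns is missing.
    """
    # UTF-8 keeps Yoruba diacritics intact when reading the tweets.
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in EXPECTED_COLUMNS if name not in header]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        return [dict(row) for row in reader]
```

Calling load_tweets with the location of the twitter_data csv file in your clone returns one dict per tweet, keyed by Tweets, Eng_source and Summary.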

Requirements

The modules and libraries used in this project are listed in the requirements.txt file. Install them with the command below.

pip install -r requirements.txt

Structure

Data: contains the data file used for this project.
utils: contains the essential functions used for the project.
data_analysis.ipynb: A Python notebook that uses the functions in utils to analyse the data used in this project and report information about it.
data_collection.ipynb: A Python notebook that shows how to collect tweets from Twitter using the Twitter API and the tweepy Python library.
quick_start.ipynb: A Python notebook that shows a successful run of the project following the quickstart guideline.
main.ipynb and main.py: A Python notebook and script that use the functions in utils to summarize tweets with English-Yoruba code switches and show the results.
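
For readers who want a feel for the collection step that data_collection.ipynb covers, here is a rough sketch using tweepy's Client for the Twitter API v2 recent-search endpoint. It is not taken from the notebook: the helper names build_query and fetch_recent_tweets, the example keywords, and the choice of the v2 endpoint are all assumptions, and the endpoint is only available if your API access level includes it.

```python
def build_query(keywords, exclude_retweets=True):
    """Join keywords into a Twitter API v2 search query (hypothetical helper)."""
    # Multi-word keywords are quoted so they match as exact phrases.
    terms = " OR ".join(f'"{k}"' if " " in k else k for k in keywords)
    query = f"({terms})"
    if exclude_retweets:
        query += " -is:retweet"
    return query


def fetch_recent_tweets(bearer_token, keywords, max_results=10):
    """Return the text of recent tweets matching any keyword (hypothetical helper)."""
    # Imported here so build_query can be used without tweepy installed.
    import tweepy

    client = tweepy.Client(bearer_token=bearer_token)
    response = client.search_recent_tweets(
        query=build_query(keywords), max_results=max_results
    )
    # response.data is None when no tweets match the query.
    return [tweet.text for tweet in (response.data or [])]
```

build_query can be checked offline, while fetch_recent_tweets needs a valid bearer token from your own developer account.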

Quickstart Guideline

Clone the repository

git clone https://github.com/gloryodeyemi/COMP_8730_Project.git

Change the directory to the cloned repository folder

%cd .../COMP_8730_Project

Install the needed packages

pip install -r requirements.txt

Run the script

python main.py

Baseline

The Huggingface AutoTrain feature was used to train and evaluate the baseline approach on our dataset. The evaluation metric scores and testing interface can be found here.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Glory Odeyemi is currently pursuing a Master's degree in Computer Science, with a specialization in Artificial Intelligence, at the University of Windsor, Windsor, ON, Canada. You can connect with her on LinkedIn.

References

Tweepy
