Dataset Needed for Unified Nigerian Language Detection #4
Labels: dataset needed, good first issue, help wanted
We are building a unified language detection model for Padie that classifies input text as one of the supported Nigerian languages: English, Pidgin, Yoruba, Igbo, or Hausa. To improve the model’s accuracy and the diversity of its training data, we need high-quality datasets for each of these languages.
Your contribution will directly help Padie understand and process Nigerian languages better! 🌟
Dataset Structure
📂 The datasets are organised in the datasets/language_detection/ directory, with a separate JSON file for each language:
datasets/language_detection/english.json
datasets/language_detection/pidgin.json
datasets/language_detection/yoruba.json
datasets/language_detection/igbo.json
datasets/language_detection/hausa.json
Each file should contain an array of JSON objects in the same format, each pairing a text sample with its language label; for instance, see datasets/language_detection/pidgin.json.
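A minimal sketch of that shape, assuming each object has a `text` field holding the sample and a `label` field naming its language (the two fields the contribution guidelines refer to). The lowercase label value `"pidgin"` and the sample sentences are illustrative assumptions, not copied from the repository.

```python
import json

# Illustrative records for pidgin.json. The field names "text" and "label"
# and the lowercase label value are assumptions, not confirmed by the repo.
records = [
    {"text": "How you dey?", "label": "pidgin"},
    {"text": "Wetin dey happen for here?", "label": "pidgin"},
]

# Serialize as a JSON array of objects. ensure_ascii=False keeps non-ASCII
# characters readable, which matters for Yoruba, Igbo, and Hausa diacritics.
serialized = json.dumps(records, ensure_ascii=False, indent=2)
print(serialized)

# Round-trip check: the serialized text parses back to the same list.
assert json.loads(serialized) == records
```

Writing with `ensure_ascii=False` (and saving the file as UTF-8) keeps tone marks and special letters such as ẹ, ị, or ɗ as-is instead of turning them into `\u` escapes, which makes pull requests easier to review.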
How You Can Contribute
✨ Ways to Help:
Provide Data for Any Language:
Make sure the label correctly matches the language of the text.
Ensure Dataset Quality:
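One way to catch mistakes before opening a pull request is a quick local check that every file parses and that each label matches the file it sits in. The sketch below is hypothetical (`validate_file` is not part of the Padie codebase) and assumes the `text`/`label` fields and that the expected label equals the file name without `.json`.

```python
import json
from pathlib import Path

# Hypothetical checker, not part of the Padie codebase. It assumes each file is
# a JSON array of {"text": ..., "label": ...} objects and that the expected
# label equals the file name without ".json" (e.g. "yoruba" for yoruba.json).
SUPPORTED = {"english", "pidgin", "yoruba", "igbo", "hausa"}


def validate_file(path: Path) -> list[str]:
    """Return human-readable problems found in one language file."""
    expected = path.stem
    problems: list[str] = []
    if expected not in SUPPORTED:
        problems.append(f"{path.name}: not one of the supported language files")
    try:
        entries = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as err:
        return problems + [f"{path.name}: invalid JSON ({err})"]
    if not isinstance(entries, list):
        return problems + [f"{path.name}: top level must be a JSON array"]
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"{path.name}[{i}]: entry is not a JSON object")
            continue
        text = entry.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"{path.name}[{i}]: missing or empty text")
        # The label must name the same language as the file it lives in.
        label = entry.get("label")
        if label != expected:
            problems.append(f"{path.name}[{i}]: label {label!r}, expected {expected!r}")
    return problems


if __name__ == "__main__":
    # Check every language file in the dataset directory and report issues.
    for file in sorted(Path("datasets/language_detection").glob("*.json")):
        for problem in validate_file(file):
            print(problem)
```

Run from the repository root, it prints nothing when every file passes; each printed line points at a file and entry index to fix.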
Submission Guidelines:
Add your entries to the matching language file in datasets/language_detection/.
Call to Action
🚀 Contribute Today!
Submit your changes as a pull request or share your dataset here if you're unable to format it. Let’s collaborate to build the best language detection model for Padie! Thank you.
Let’s make Padie exceptional! 🌟