This repository contains the code for our submission to the Constraint Hindi Hostility Detection shared task. It has 4 parts, which need to be performed sequentially as follows:
- Preprocessing: Preprocessing the tweets and extracting emojis, hashtags, etc. It is present in the preprocessing directory.
- Pretraining: Continued pretraining of IndicBERT on the provided dataset. It is present in the pretraining directory.
- Finetuning: Finetuning the transformer model on the downstream classification task. It is present in the finetuning directory.
- Result Generation: Generating the csv files for final submission. It is present in the results_gen directory.
Further information about each part can be found in the respective directories.
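
As a rough illustration of what the preprocessing step does, the sketch below pulls hashtags and emojis out of a tweet and returns the remaining text. It is not the code from the preprocessing directory: the function name `extract_features`, the regular expressions, and the output keys are all assumptions made for this example, and the emoji ranges cover only common blocks.

```python
import re

# Hypothetical patterns for illustration; the actual scripts may differ.
# Match '#' followed by any run of non-space characters, so that Devanagari
# hashtags (including vowel signs) are captured in full.
HASHTAG_RE = re.compile(r"#([^\s#]+)")
# Assumed emoji ranges: common pictograph blocks plus misc symbols/dingbats.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def extract_features(tweet):
    """Split a tweet into cleaned text, hashtags, and emojis."""
    hashtags = HASHTAG_RE.findall(tweet)
    emojis = EMOJI_RE.findall(tweet)
    # Remove the extracted tokens, then collapse leftover whitespace.
    text = HASHTAG_RE.sub(" ", tweet)
    text = EMOJI_RE.sub(" ", text)
    text = " ".join(text.split())
    return {"text": text, "hashtags": hashtags, "emojis": emojis}


if __name__ == "__main__":
    print(extract_features("यह #covid19 खबर 😡 है #भारत"))
```

Whether hashtags should be removed from the text or kept in place is a design choice; this sketch removes them only to keep the example short.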
Other folders include: