This Jupyter notebook showcases a Conversation Summarization Model using the Google Pegasus model from the Hugging Face Transformers library. The model is fine-tuned on the SAMSum dataset, which contains dialogues and their corresponding summaries. The goal of the model is to generate concise summaries of chat conversations.
To run this notebook, you need the following dependencies:
- Python 3.x
- Hugging Face Transformers
- Datasets library
- Matplotlib
- Pandas
- NLTK
- Tqdm
You can install the required packages using pip:
pip install transformers datasets matplotlib pandas nltk tqdm
- Install the required dependencies as mentioned above.
- Clone this repository to your local machine:
git clone https://github.com/your-username/your-repo.git
cd your-repo
- Open the Jupyter notebook Conversation_Summarization_Model_Google_Pegasus.ipynb in your Jupyter environment.
- Make sure you have access to a GPU if you want to leverage GPU acceleration during model training and inference.
- Run the notebook cells in sequential order. The notebook will guide you through the following steps:
  - Loading the SAMSum dataset.
  - Setting up the Google Pegasus model and tokenizer.
  - Preprocessing the data and converting it into appropriate input encodings.
  - Fine-tuning the model using the Trainer class from the Transformers library.
  - Evaluating the model on the validation dataset and tracking the training progress.
  - Generating summaries for sample dialogues and displaying the results.
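
The preprocessing step listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: it assumes the SAMSum column names `dialogue` and `summary`, and a tokenizer that accepts the `text_target` argument (available in recent Transformers releases). The function name and the token limits are hypothetical choices.

```python
def convert_examples_to_features(batch, tokenizer,
                                 max_input_tokens=1024,
                                 max_target_tokens=128):
    """Tokenize a batch of dialogues and summaries for seq2seq training.

    The token limits are illustrative assumptions, not values taken
    from the notebook.
    """
    # Encode the dialogues as model inputs, truncating long conversations.
    input_encodings = tokenizer(
        batch["dialogue"], max_length=max_input_tokens, truncation=True
    )
    # Encode the reference summaries as decoder targets.
    target_encodings = tokenizer(
        text_target=batch["summary"],
        max_length=max_target_tokens,
        truncation=True,
    )
    # The Trainer expects the target token ids under the "labels" key.
    return {
        "input_ids": input_encodings["input_ids"],
        "attention_mask": input_encodings["attention_mask"],
        "labels": target_encodings["input_ids"],
    }
```

With a dataset object from the Datasets library, a function like this would typically be applied with `dataset.map(convert_examples_to_features, batched=True, fn_kwargs={"tokenizer": tokenizer})`.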
The SAMSum dataset contains chat conversations and their corresponding summaries, making it suitable for training a conversation summarization model. We used the Google Pegasus model, a state-of-the-art transformer-based model for sequence-to-sequence tasks, and fine-tuned it on the SAMSum dataset.
During training, we computed validation loss using the Trainer's evaluation strategy. Additionally, we used the ROUGE metric to compare the generated summaries against the ground-truth summaries on the test dataset.
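
To make the ROUGE comparison concrete, here is a simplified sketch of ROUGE-1 F1, which measures unigram overlap between a generated summary and its reference. It is not the implementation used in the notebook: the function name `rouge1_f1` is hypothetical, and it uses plain whitespace tokenization without stemming, so its scores may differ from those reported by a full ROUGE library.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 between a candidate and a reference summary."""
    # Count lowercase, whitespace-separated tokens in each text.
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each token counts at most as often as it
    # appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

For example, comparing the candidate "the cat sat" with the reference "the cat sat on the mat" gives a precision of 1.0 and a recall of 0.5, so the F1 score is about 0.67.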
After training, the model should be capable of generating informative and concise summaries for chat conversations. The notebook will demonstrate the model's performance on sample dialogues and their generated summaries.
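
Summary generation for a single dialogue might look like the sketch below. It assumes `model` and `tokenizer` are an already loaded, fine-tuned Pegasus model and its tokenizer from the Transformers library. The helper name `summarize` and the decoding settings (beam count, length penalty, length limits) are illustrative assumptions, not the notebook's actual values.

```python
def summarize(dialogue, model, tokenizer,
              max_input_tokens=1024, max_summary_tokens=128,
              num_beams=8, length_penalty=0.8):
    """Generate a summary for one dialogue with beam search.

    All default values are illustrative assumptions.
    """
    # Tokenize the dialogue and move the tensors to the model's device.
    inputs = tokenizer(
        dialogue,
        max_length=max_input_tokens,
        truncation=True,
        return_tensors="pt",
    ).to(model.device)

    # Beam search decoding, capped at max_summary_tokens.
    summary_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=num_beams,
        length_penalty=length_penalty,
        max_length=max_summary_tokens,
    )

    # Convert the first generated sequence back to text.
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

Passing the model and tokenizer as arguments keeps the helper independent of how they were loaded, so the same function works on CPU or GPU.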
I would like to thank Hugging Face for providing the Transformers library, which made it easy to access and fine-tune state-of-the-art language models like Google Pegasus.