Summarization Exercise

This repository contains code to be used in the L90 assignment on summarization.

Dataset

We provide a subset of the CNN/Daily Mail dataset for training and evaluation. Each example is a JSON object with the following fields:

{
    "article" : "The article to be summarized.",
    "summary" : "The desired summary.",
    "greedy_n_best_indices" : "Binary yes/no decision for each sentence, indicating whether it should be included in the greedily chosen best extractive summary. Sentences are split on periods. Only included in train.greedy_sent.json."
}
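
As a minimal sketch of how a record in this format could be read, the snippet below parses one example and pairs each sentence with its label. It assumes that greedy_n_best_indices is stored as a list of 0/1 values with one entry per period-delimited sentence; the actual encoding in the provided files may differ, so inspect a record before relying on it. The sample record and the helper name split_sentences are made up for illustration.

```python
import json

# Hypothetical record following the documented fields; not taken from the dataset.
raw = """
{
    "article": "The cat sat on the mat. It was warm. The dog barked.",
    "summary": "A cat sat on a warm mat.",
    "greedy_n_best_indices": [1, 1, 0]
}
"""


def split_sentences(text):
    """Split on periods and drop empty fragments (assumed to mirror the rule above)."""
    return [s.strip() for s in text.split(".") if s.strip()]


record = json.loads(raw)
sentences = split_sentences(record["article"])
labels = record["greedy_n_best_indices"]  # assumption: a list of 0/1 flags

# Keep only the sentences flagged for the greedy extractive summary.
selected = [s for s, keep in zip(sentences, labels) if keep]
print(selected)
```

If the number of labels does not match the number of sentences produced this way, the splitting rule or the label encoding differs from what is assumed here.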

Scripts

We provide several scripts to help you get started. To train an extractive summarizer and generate predictions, run the following command (you may want to implement the code in models/extractive_summarizer.py first):

python run_extractive_summarizer.py --eval_data dataset_to_predict_for.json > prediction_file.json
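
Before implementing a trained model, a lead-k baseline (take the first k sentences) can serve as a quick sanity check. The sketch below is not the interface of models/extractive_summarizer.py; the function name lead_k_summary and the default k are assumptions made for illustration, with sentences split on periods as in the dataset description.

```python
def lead_k_summary(article, k=3):
    """Return the first k period-delimited sentences of the article as a summary."""
    sentences = [s.strip() for s in article.split(".") if s.strip()]
    if not sentences:
        return ""
    # Re-join the selected sentences and restore the final period.
    return ". ".join(sentences[:k]) + "."


print(lead_k_summary("First point. Second point. Third point. Fourth point.", k=2))
```

How such a summary has to be written to the prediction file depends on what eval.py expects, which is not specified here.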

To evaluate your predictions, run the following:

python eval.py --eval_data dataset_to_predict_for.json --pred_data prediction_file.json
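
Since the installation step pulls in rouge_metric for evaluation, the reported scores are presumably ROUGE variants. To give a rough sense of what such a score measures, here is a simplified unigram-overlap (ROUGE-1) F1 in plain Python. It is an illustrative sketch, not the computation performed by eval.py: it lowercases and splits on whitespace, applies no stemming, and the function name rouge1_f1 is hypothetical.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("the cat", "the cat sat on"))
```

Numbers from this sketch will not match a full ROUGE implementation exactly, so use it only to build intuition.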

Installation

Before running the scripts, install the libraries used for evaluation and progress bars:

pip install tqdm
pip install rouge_metric

mathatter997/NLP_2
