GitHub - Maria-Liakata-NLP-Group/WassOS

Introduction

We developed WassOS, an unsupervised opinion summarization model based on a VAE and the Wasserstein barycenter. To capture the main meaning of different kinds of documents, we disentangle the document distributions into separate semantic and syntactic spaces. We introduce these distributions into the Wasserstein space and construct the summary distribution using the Wasserstein barycenter. This strategy reduces the mutual interference of semantic and syntactic information and identifies a representative summary distribution from multiple noisy documents. We also developed THVAE, an unsupervised timeline summarization model based on a hierarchical VAE.
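For readers unfamiliar with the Wasserstein barycenter, the definition and one well-known closed form are sketched below. This assumes each document distribution is a diagonal Gaussian $\nu_i = \mathcal{N}(\mu_i, \operatorname{diag}(\sigma_i^2))$, as is common for VAE posteriors; the README does not state the exact form of the distributions or the weights $\lambda_i$, so both are assumptions made for illustration.

```latex
\begin{aligned}
\bar{\nu} &= \arg\min_{\nu} \sum_{i=1}^{N} \lambda_i \, W_2^2(\nu, \nu_i),
  \qquad \lambda_i \ge 0,\quad \sum_{i=1}^{N} \lambda_i = 1, \\
W_2^2(\nu_i, \nu_j) &= \lVert \mu_i - \mu_j \rVert_2^2 + \lVert \sigma_i - \sigma_j \rVert_2^2, \\
\bar{\nu} &= \mathcal{N}\!\Big( \sum_{i=1}^{N} \lambda_i \mu_i,\
  \operatorname{diag}\Big( \big( \textstyle\sum_{i=1}^{N} \lambda_i \sigma_i \big)^{2} \Big) \Big).
\end{aligned}
```

Under this assumption the barycenter averages the means and the standard deviations (not the variances), which keeps the summary distribution inside the same Gaussian family as the documents.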

Installation

Our code is based on the Copycat framework. Please follow the Copycat instructions to build the conda environment.

Data

We experimented on three datasets with different types of content (social media posts and reviews) to allow a thorough evaluation across domains. The social media posts are from Twitter and Reddit. The reviews are from Amazon. We also experimented on TalkLife datasets.

Input Data Format

An example of the expected input format is provided in artifacts. The expected format of the input is:

group_id	review_text	category	review_tag
B000WJ3I1M	I have this cupholder mounted ...	reviews_electronics_5	PRP VBP DT NN VBD IN DT ...

We parse each document into a part-of-speech tag sequence (the review_tag column) with ZPar.
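The input format above can be loaded with a few lines of Python. This is a minimal sketch, not the repository's own loader: it assumes the file is tab-separated with the header shown above, and the names `read_documents`, `group_by_id`, and `SAMPLE` are hypothetical. The two sample review texts and their tags are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Column names taken from the header row shown in the README.
EXPECTED_COLUMNS = ["group_id", "review_text", "category", "review_tag"]


def read_documents(handle):
    """Read tab-separated rows and split review_tag into a list of tags."""
    # QUOTE_NONE keeps quotation marks inside review text as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    documents = []
    for row in reader:
        row["review_tag"] = row["review_tag"].split()
        documents.append(row)
    return documents


def group_by_id(documents):
    """Collect the documents that share a group_id (one group per summary)."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc["group_id"]].append(doc)
    return dict(groups)


# Made-up rows in the assumed format; only the group_id and category
# values are copied from the README example.
SAMPLE = (
    "group_id\treview_text\tcategory\treview_tag\n"
    "B000WJ3I1M\tThis cupholder works well\treviews_electronics_5\tDT NN VBZ RB\n"
    "B000WJ3I1M\tIt broke quickly\treviews_electronics_5\tPRP VBD RB\n"
)

documents = read_documents(io.StringIO(SAMPLE))
groups = group_by_id(documents)
print(len(groups["B000WJ3I1M"]), documents[0]["review_tag"])
```

To read a real file, pass an open file object instead of the `io.StringIO` wrapper, for example `open(path, newline="", encoding="utf-8")`.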

Strategy

Key phrases

The method for extracting key phrases is in the file 'read_timeline'.

We developed two strategies for this model, 'T_center' and 'O_center'. The first strategy, 'T_center', uses two Wasserstein barycenters, one from the semantic space and one from the syntactic space, to construct the summary distribution. It performs better on social media posts. The second strategy, 'O_center', uses only one Wasserstein barycenter, from the semantic space, and constructs the syntactic distribution for each document. It performs better on the review datasets. You can select a strategy in model_hp.py based on your data.
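The difference between the two strategies can be illustrated with a small NumPy sketch. It is one possible reading of the description above, not the repository's implementation: it assumes diagonal Gaussian latent distributions (for which the Wasserstein-2 barycenter averages means and standard deviations), uniform weights by default, and that 'O_center' leaves the per-document syntactic distributions unchanged. The function names `gaussian_barycenter` and `summary_distribution` are hypothetical, and the actual option name in model_hp.py is not shown here.

```python
import numpy as np


def gaussian_barycenter(means, stds, weights=None):
    """Wasserstein-2 barycenter of diagonal Gaussians (assumed form)."""
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    n_docs = means.shape[0]
    if weights is None:
        weights = np.full(n_docs, 1.0 / n_docs)  # assumption: uniform weights
    else:
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()
    # For diagonal Gaussians the barycenter averages means and stds.
    return weights @ means, weights @ stds


def summary_distribution(strategy, sem_means, sem_stds, syn_means, syn_stds):
    """Return (semantic, syntactic) parameters for the chosen strategy."""
    semantic = gaussian_barycenter(sem_means, sem_stds)
    if strategy == "T_center":
        # Two barycenters: one per latent space.
        syntactic = gaussian_barycenter(syn_means, syn_stds)
    elif strategy == "O_center":
        # One barycenter only; syntactic parameters stay per document
        # (an assumption about what "for each document" means).
        syntactic = (np.asarray(syn_means, dtype=float),
                     np.asarray(syn_stds, dtype=float))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return semantic, syntactic


# Two made-up documents with 2-dimensional latent spaces.
sem_means = [[0.0, 0.0], [2.0, 4.0]]
sem_stds = [[1.0, 1.0], [3.0, 1.0]]
syn_means = [[1.0, 1.0], [3.0, 3.0]]
syn_stds = [[0.5, 0.5], [1.5, 1.5]]

t_sem, t_syn = summary_distribution("T_center", sem_means, sem_stds, syn_means, syn_stds)
o_sem, o_syn = summary_distribution("O_center", sem_means, sem_stds, syn_means, syn_stds)
```

With 'T_center' the syntactic part collapses to a single distribution, while with 'O_center' it keeps one row per input document.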
