The official docs can be found here. We used PyTorch for this project, even for the documentation builds.
Development | Status
---|---
Baseline | finished
Alternative Label Encoding | not started
Pipeline vs Joint prediction | not started
Architecture Impact | in progress
Pretrained Embeddings | in progress
Error Analysis | finished
For this targeted sentiment analysis, we used a training dataset in Norwegian with corresponding word embeddings.
We will be working with the recently released NoReC_fine, a dataset for fine-grained sentiment analysis in Norwegian. The texts in the dataset have been annotated with respect to polar expressions, targets, and holders of opinion, but here we focus only on identifying targets and their polarity. The underlying texts are taken from a corpus of professionally authored reviews from multiple news sources, covering a wide variety of domains, including literature, games, music, products, movies, and more.
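Target identification with polarity is commonly cast as sequence labeling, where each token gets a label combining its span position (B/I/O) with the target's polarity. The helper and label names below are illustrative, not the exact scheme used in NoReC_fine:

```python
def encode_targets(tokens, targets):
    """Assign a BIO+polarity label to each token.

    targets: list of (start, end, polarity) spans, end exclusive.
    This is a sketch of one common labeling scheme, not the
    project's actual encoding.
    """
    labels = ["O"] * len(tokens)
    for start, end, polarity in targets:
        labels[start] = f"B-targ-{polarity}"
        for i in range(start + 1, end):
            labels[i] = f"I-targ-{polarity}"
    return labels

tokens = ["Skjermen", "er", "fantastisk"]  # "The screen is fantastic"
labels = encode_targets(tokens, [(0, 1, "Positive")])
print(labels)  # ['B-targ-Positive', 'O', 'O']
```

The "Alternative Label Encoding" item in the table above refers to experimenting with variations of exactly this kind of scheme.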
The word embeddings used are taken from the NLPL datasets, trained on the Norwegian-Bokmaal CoNLL17 corpus, with a vocabulary size of 1,182,371.
Download this repository:
$ git clone https://github.uio.no/arthurd/wnnlp
The dataset is part of the repository; however, you will need to supply the word embeddings yourself.
You can either download the Norwegian-Bokmaal CoNLL17 embeddings (a.k.a. the 58.zip file) from the NLPL website, or use the copy provided on the SAGA server.
Make sure that you decode this file with encoding='latin1'.
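NLPL archives typically ship the embeddings as a word2vec-format text file (a header line with the vocabulary size and dimensionality, then one word and its vector per line). A minimal sketch of parsing such a file with the latin1 encoding mentioned above (the function name and file layout are assumptions, not this repo's actual loader):

```python
import io

def load_word2vec_txt(source, encoding="latin1"):
    """Parse a word2vec-format text file.

    First line: '<vocab_size> <dim>'; each following line:
    '<word> <v1> ... <vdim>'. Accepts a path or an open file.
    """
    f = open(source, encoding=encoding) if isinstance(source, str) else source
    with f:
        _vocab_size, dim = map(int, f.readline().split())
        vectors = {}
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim

# Tiny in-memory example in the same format:
sample = io.StringIO("2 3\nhus 0.1 0.2 0.3\nbaat 0.4 0.5 0.6\n")
vectors, dim = load_word2vec_txt(sample)
print(dim, vectors["hus"])  # 3 [0.1, 0.2, 0.3]
```

In practice a library such as gensim can do this for you; the point here is only that the file must be decoded as latin1 rather than the default UTF-8.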
$ python baseline.py --NUM_LAYERS        number of hidden layers for BiLSTM
                     --HIDDEN_DIM        dimensionality of LSTM layers
                     --BATCH_SIZE        number of examples to include in a batch
                     --DROPOUT           dropout to be applied after embedding layer
                     --EMBEDDING_DIM     dimensionality of embeddings
                     --EMBEDDINGS        location of pretrained embeddings
                     --TRAIN_EMBEDDINGS  whether to train or leave fixed
                     --LEARNING_RATE     learning rate for ADAM optimizer
                     --EPOCHS            number of epochs to train model
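For reference, the flags above could be declared with argparse along these lines; the defaults shown are placeholders, not the project's actual values:

```python
import argparse

# Sketch of baseline.py's command-line interface; defaults are illustrative.
parser = argparse.ArgumentParser(description="BiLSTM baseline for targeted sentiment")
parser.add_argument("--NUM_LAYERS", type=int, default=2,
                    help="number of hidden layers for BiLSTM")
parser.add_argument("--HIDDEN_DIM", type=int, default=100,
                    help="dimensionality of LSTM layers")
parser.add_argument("--BATCH_SIZE", type=int, default=32,
                    help="number of examples to include in a batch")
parser.add_argument("--DROPOUT", type=float, default=0.1,
                    help="dropout to be applied after embedding layer")
parser.add_argument("--EMBEDDING_DIM", type=int, default=100,
                    help="dimensionality of embeddings")
parser.add_argument("--EMBEDDINGS", type=str, default=None,
                    help="location of pretrained embeddings")
parser.add_argument("--TRAIN_EMBEDDINGS", action="store_true",
                    help="train the embeddings instead of leaving them fixed")
parser.add_argument("--LEARNING_RATE", type=float, default=0.01,
                    help="learning rate for ADAM optimizer")
parser.add_argument("--EPOCHS", type=int, default=50,
                    help="number of epochs to train model")

args = parser.parse_args(["--EPOCHS", "10", "--TRAIN_EMBEDDINGS"])
print(args.EPOCHS, args.TRAIN_EMBEDDINGS)  # 10 True
```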
The grid search is currently available for the BiLSTM and BiGRU models.
You can set their parameters (and hyperparameters) through the gridsearch.ini
configuration file. This file is divided into multiple sections, corresponding to the different parameters, and
you will find more information there.
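Such a configuration file can be read with Python's built-in configparser; the section and option names below are illustrative, not the exact schema of gridsearch.ini:

```python
import configparser
import itertools

# Illustrative gridsearch.ini-style content; the real file's sections may differ.
sample = """\
[model]
name = BiLSTM

[grid]
HIDDEN_DIM = 50, 100, 200
DROPOUT = 0.1, 0.3
"""

config = configparser.ConfigParser()
config.read_string(sample)

# Expand each comma-separated option into its candidate values,
# then take the Cartesian product to enumerate every run.
grid = {key: [v.strip() for v in val.split(",")]
        for key, val in config["grid"].items()}
runs = [dict(zip(grid, combo)) for combo in itertools.product(*grid.values())]
print(len(runs))  # 6 combinations: 3 hidden sizes x 2 dropout values
```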
To run the gridsearch algorithm, simply modify the above parameters and run:
$ python gridsearch.py --conf PATH_TO_CONFIGURATION_FILE
To test and evaluate a saved model, use the eval.py script as follows:
$ python eval.py --model PATH_TO_SAVED_MODEL
                 --data  PATH_TO_EVAL_DATA