The official docs can be found here. We used PyTorch for this project, even for the documentation builds.
Development | Status
---|---
Baseline | finished
Alternative Label Encoding | not started
Pipeline vs Joint prediction | not started
Architecture Impact | in progress
Pretrained Embeddings | in progress
Error Analysis | finished
For this targeted sentiment analysis, we used a training dataset in Norwegian with corresponding word embeddings.
We will be working with the recently released NoReC_fine, a dataset for fine-grained sentiment analysis in Norwegian. The texts in the dataset have been annotated with respect to polar expressions, targets, and holders of opinion, but here we focus only on identifying targets and their polarity. The underlying texts are taken from a corpus of professionally authored reviews from multiple news sources, covering a wide variety of domains, including literature, games, music, products, movies, and more.
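Target identification with polarity is commonly cast as sequence labeling, where each token gets a label combining its span position (B/I/O) with the target's polarity. The helper and label names below are illustrative, not the exact scheme used in NoReC_fine:

```python
def encode_targets(tokens, targets):
    """Assign a BIO+polarity label to each token.

    targets: list of (start, end, polarity) spans, end exclusive.
    This is a sketch of one common labeling scheme, not the
    project's actual encoding.
    """
    labels = ["O"] * len(tokens)
    for start, end, polarity in targets:
        labels[start] = f"B-targ-{polarity}"
        for i in range(start + 1, end):
            labels[i] = f"I-targ-{polarity}"
    return labels

tokens = ["Skjermen", "er", "fantastisk"]  # "The screen is fantastic"
labels = encode_targets(tokens, [(0, 1, "Positive")])
print(labels)  # ['B-targ-Positive', 'O', 'O']
```

The "Alternative Label Encoding" item in the table above refers to experimenting with variations of exactly this kind of scheme.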
The word embeddings used are taken from the NLPL datasets, trained on the Norwegian-Bokmaal CoNLL17 corpus, with a vocabulary size of 1,182,371.
Download this repository:
$ git clone https://github.uio.no/arthurd/wnnlp
The dataset is part of the repository; however, you will need to supply the word embeddings yourself.
You can either download the Norwegian-Bokmaal CoNLL17 embeddings (a.k.a. the 58.zip file) from the NLPL website, or use the copy provided on the SAGA server.
Make sure that you decode this file with encoding='latin1'.
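NLPL archives typically ship the embeddings as a word2vec-format text file (a header line with the vocabulary size and dimensionality, then one word and its vector per line). A minimal sketch of parsing such a file with the latin1 encoding mentioned above (the function name and file layout are assumptions, not this repo's actual loader):

```python
import io

def load_word2vec_txt(source, encoding="latin1"):
    """Parse a word2vec-format text file.

    First line: '<vocab_size> <dim>'; each following line:
    '<word> <v1> ... <vdim>'. Accepts a path or an open file.
    """
    f = open(source, encoding=encoding) if isinstance(source, str) else source
    with f:
        _vocab_size, dim = map(int, f.readline().split())
        vectors = {}
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim

# Tiny in-memory example in the same format:
sample = io.StringIO("2 3\nhus 0.1 0.2 0.3\nbaat 0.4 0.5 0.6\n")
vectors, dim = load_word2vec_txt(sample)
print(dim, vectors["hus"])  # 3 [0.1, 0.2, 0.3]
```

In practice a library such as gensim can do this for you; the point here is only that the file must be decoded as latin1 rather than the default UTF-8.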
$ python baseline.py --NUM_LAYERS        number of hidden layers for BiLSTM
                     --HIDDEN_DIM        dimensionality of LSTM layers
                     --BATCH_SIZE        number of examples to include in a batch
                     --DROPOUT           dropout to be applied after embedding layer
                     --EMBEDDING_DIM     dimensionality of embeddings
                     --EMBEDDINGS        location of pretrained embeddings
                     --TRAIN_EMBEDDINGS  whether to train or leave fixed
                     --LEARNING_RATE     learning rate for ADAM optimizer
                     --EPOCHS            number of epochs to train model
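For reference, the flags above could be declared with argparse along these lines; the defaults shown are placeholders, not the project's actual values:

```python
import argparse

# Sketch of baseline.py's command-line interface; defaults are illustrative.
parser = argparse.ArgumentParser(description="BiLSTM baseline for targeted sentiment")
parser.add_argument("--NUM_LAYERS", type=int, default=2,
                    help="number of hidden layers for BiLSTM")
parser.add_argument("--HIDDEN_DIM", type=int, default=100,
                    help="dimensionality of LSTM layers")
parser.add_argument("--BATCH_SIZE", type=int, default=32,
                    help="number of examples to include in a batch")
parser.add_argument("--DROPOUT", type=float, default=0.1,
                    help="dropout to be applied after embedding layer")
parser.add_argument("--EMBEDDING_DIM", type=int, default=100,
                    help="dimensionality of embeddings")
parser.add_argument("--EMBEDDINGS", type=str, default=None,
                    help="location of pretrained embeddings")
parser.add_argument("--TRAIN_EMBEDDINGS", action="store_true",
                    help="train the embeddings instead of leaving them fixed")
parser.add_argument("--LEARNING_RATE", type=float, default=0.01,
                    help="learning rate for ADAM optimizer")
parser.add_argument("--EPOCHS", type=int, default=50,
                    help="number of epochs to train model")

args = parser.parse_args(["--EPOCHS", "10", "--TRAIN_EMBEDDINGS"])
print(args.EPOCHS, args.TRAIN_EMBEDDINGS)  # 10 True
```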
The grid search is currently available for the BiLSTM and BiGRU models.
You can set their parameters (and hyperparameters) through the gridsearch.ini
configuration file. This file is divided into multiple sections, corresponding to the different parameters, and
you will find more information there.
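Such a configuration file can be read with Python's built-in configparser; the section and option names below are illustrative, not the exact schema of gridsearch.ini:

```python
import configparser
import itertools

# Illustrative gridsearch.ini-style content; the real file's sections may differ.
sample = """\
[model]
name = BiLSTM

[grid]
HIDDEN_DIM = 50, 100, 200
DROPOUT = 0.1, 0.3
"""

config = configparser.ConfigParser()
config.read_string(sample)

# Expand each comma-separated option into its candidate values,
# then take the Cartesian product to enumerate every run.
grid = {key: [v.strip() for v in val.split(",")]
        for key, val in config["grid"].items()}
runs = [dict(zip(grid, combo)) for combo in itertools.product(*grid.values())]
print(len(runs))  # 6 combinations: 3 hidden sizes x 2 dropout values
```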
To run the gridsearch algorithm, simply modify the above parameters and run:
$ python gridsearch.py --conf PATH_TO_CONFIGURATION_FILE
To test and evaluate a saved model, use the eval.py script as follows:
$ python eval.py --model PATH_TO_SAVED_MODEL
                 --data  PATH_TO_EVAL_DATA