Official codebase for the paper "Error Diversity Matters: An Error-Resistant Ensemble Method for Unsupervised Dependency Parsing."
The repository provides tools to evaluate unsupervised dependency parsing models, aggregate predictions using ensemble techniques, and optimize model selection considering error diversity.
- Evaluation (`eval.py`): Computes performance metrics such as Corpus F1 and Corpus UAS (Unlabeled Attachment Score).
- Ensemble Aggregation (`ensemble.py`): Aggregates predictions from multiple models using accuracy-based or unweighted methods.
- Model Selection (`model_selection.py`): Selects optimal model subsets for the ensemble using strategies based on performance and error diversity metrics.
conda create -n ED4UDP python=3.9
conda activate ED4UDP
while read requirement; do pip install $requirement; done < requirements.txt
python eval.py --ref path/to/gold_file --pred path/to/prediction_file
Outputs CorpusF1 and CorpusUAS metrics.
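
As a rough illustration of what a corpus-level UAS measures, here is a minimal sketch. It is not the code in `eval.py`: the function name `corpus_uas` and the input format (each sentence as a list of head indices, one per token, with 0 for the root) are assumptions made for this example only.

```python
from typing import Sequence


def corpus_uas(gold: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]) -> float:
    """Share of tokens, pooled over all sentences, whose predicted head equals the gold head."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_heads, pred_heads in zip(gold, pred):
        if len(gold_heads) != len(pred_heads):
            raise ValueError("gold and predicted sentences differ in length")
        # Count tokens whose predicted head index matches the gold head index.
        correct += sum(g == p for g, p in zip(gold_heads, pred_heads))
        total += len(gold_heads)
    # Pool counts over the whole corpus instead of averaging per-sentence scores.
    return correct / total if total else 0.0


# Hypothetical toy data: two sentences, heads given as token indices (0 = root).
gold = [[2, 0, 2], [0, 1]]
pred = [[2, 0, 1], [0, 1]]
print(corpus_uas(gold, pred))  # 4 of 5 heads match
```

Pooling counts over the corpus means longer sentences contribute more tokens to the score than shorter ones.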
See `ensemble.py` for example usage of the `ensemble()` function. Here is a breakdown of its arguments, their purposes, and how they affect its behavior:
- `references` (list of lists):
  - A list of prediction files, where each file is represented as a list of dependency trees (attachments).
  - Each tree in a list corresponds to the predictions for a particular sentence or data instance.
  - This is the main input to the ensemble function, which aggregates predictions across all the trees.
- `agg` (str, default: `acc`):
  - Specifies the aggregation method to use:
    - `'acc'`: Uses accuracy-based aggregation.
    - `'f1'`: Uses F1-based aggregation.
- `beta` (float, default: `1`):
  - A parameter for F1-based aggregation (ignored if `agg == 'acc'`).
  - Adjusts the relative importance of precision and recall in the F1 score.
- `weights` (list of floats, default: `None`):
  - Determines the influence of each model's predictions during aggregation.
  - If `None`, all models are given equal weight.
  - The length of `weights` must match the number of models in `references`.
- `parallel` (bool, default: `True`):
  - Controls whether the ensemble computation is performed in parallel using multiprocessing.
- `return_times` (bool, default: `False`):
  - Determines whether the function returns the computation time for each instance:
    - `True`: Returns a tuple `(aggregated_predictions, times)`, where `times` is a list of computation times for each instance.
    - `False`: Returns only the aggregated predictions.
  - Cannot be `True` if `parallel=True`, as individual processing times are not tracked in parallel mode.
- `progress_bar` (bool, default: `True`):
  - Controls whether a progress bar is displayed during computation.
First, configure the parameters in `model_selection.py`. Then run:
python model_selection.py