
v0.7.0 QA Model Confidence

Released by @julian-risch on 22 Feb 17:44

QA Confidence Scores

In response to several requests from the community, we now provide more meaningful confidence scores for the predictions of extractive QA models. #690 #705 @julian-risch @Timoeller @lalitpagaria
To this end, predicted answers now have a new attribute called confidence, which lies in the range [0,1] and can be calibrated to match the probability that a prediction is an exact match. The intuition behind the scores is the following: after calibration, if the average confidence of 100 predictions is 70%, then on average 70 of those predictions will be correct.
The calibration is implemented with a technique called temperature scaling. It can be run on a dev set by calling the eval() method of the Evaluator class with the parameter calibrate_conf_scores set to True. This parameter is False by default because the feature is still experimental and we are continuing to work on it. The score attribute of predicted answers and their ranking are untouched, so the default behavior does not change.
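Conceptually, temperature scaling divides the model's logits by a single scalar temperature fitted on a dev set before applying the softmax; this reshapes the confidence values without changing the ranking of answers. A minimal NumPy sketch of the idea (illustrative only, not FARM's internal implementation; the function name is made up for this example):

```python
import numpy as np

def temperature_scale(logits: np.ndarray, temperature: float) -> np.ndarray:
    # Divide logits by a temperature fitted on a dev set, then apply softmax.
    # temperature > 1 softens the distribution (lower confidences),
    # temperature < 1 sharpens it; the argmax stays the same either way.
    scaled = logits / temperature
    scaled -= scaled.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

# Same logits, with and without scaling:
logits = np.array([2.0, 0.5, -1.0])
print(temperature_scale(logits, temperature=1.0))  # raw softmax
print(temperature_scale(logits, temperature=2.5))  # softer, calibrated-style scores
```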
An example shows how to calibrate and use the confidence scores.
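For orientation, here is a rough sketch of how the calibration step might be wired into an evaluation run. It assumes a FARM-style QA setup where model, dev_data_loader, tasks, and device are already prepared; only the calibrate_conf_scores parameter of Evaluator.eval() is taken from this release, the rest is an assumption about the surrounding pipeline:

```python
from farm.eval import Evaluator

# Assumption: model, dev_data_loader, tasks, and device come from a standard
# FARM QA pipeline set up elsewhere.
evaluator = Evaluator(data_loader=dev_data_loader, tasks=tasks, device=device)

# Fit the temperature on the dev set so that the confidence attribute of
# predicted answers is calibrated against the exact-match probability.
results = evaluator.eval(model, calibrate_conf_scores=True)

# After calibration, predicted answers carry both the unchanged `score`
# (still used for ranking) and the new `confidence` in [0, 1].
```

Because the ranking is still driven by score, enabling calibration should not change which answer is returned first, only how its confidence is reported.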

Misc

Refactor text pair handling and add text pair regression #713 @Timoeller
Refactor Textsimilarity processor #711 @Timoeller
Refactor Regression and inference processors #702 @Timoeller
Fix NER probabilities #700 @brandenchan
Calculate SQuAD evaluation metrics overall and separately for text answers and no-answers #698 @julian-risch
Re-enable test_dpr_modules on Windows as well #697 @ftesser
Use Path instead of String in ONNXAdaptiveModel #694 @skiran252
Big thanks to all contributors!