v0.5.0
Add Dense Passage Retriever (DPR) incl. Training & Inference (#513, #601, #606)
We are happy to introduce a completely new task type to FARM: text similarity with two separate transformer encoders.
Why?
We observe a big shift in Information Retrieval from sparse methods (e.g. BM25) towards dense methods that encode queries and documents as vectors and use vector similarity to retrieve the most similar documents for a given query. This is helpful not only for document search but also for open-domain Question Answering. Dense methods already outperform sparse methods in many domains and are especially powerful when the matching between query and passage cannot happen via keywords but instead relies on semantics, synonyms, or context.
What?
One of the most promising methods at the moment is "Dense Passage Retrieval" by Karpukhin et al. (https://arxiv.org/abs/2004.04906). In a nutshell, DPR uses one transformer to encode the query and a second transformer to encode the passage. The two encoders project the different texts into the same vector space and are trained jointly on a similarity measure using in-batch negatives.
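To make the training objective concrete, here is a minimal PyTorch sketch of the in-batch-negatives loss (all names are illustrative, not FARM's API): for a batch of N question/passage pairs, an N x N matrix of dot-product scores is trained with cross-entropy so that each question's own passage, which sits on the diagonal, gets the highest score.

```python
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(question_embs: torch.Tensor,
                            passage_embs: torch.Tensor) -> torch.Tensor:
    """DPR-style loss: question_embs and passage_embs are (N, dim) tensors
    where row i of each tensor belongs to the same positive pair.
    Every other passage in the batch serves as a negative for question i."""
    # (N, N) similarity matrix: entry [i, j] = sim(question i, passage j)
    scores = question_embs @ passage_embs.T
    # The positive passage for question i is passage i,
    # so the target labels are simply the diagonal indices.
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)

# Toy usage with random embeddings standing in for the two encoders' outputs
q = torch.randn(8, 768)  # question encoder output
p = torch.randn(8, 768)  # passage encoder output
loss = in_batch_negatives_loss(q, p)
```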
How?
We introduce a new class BiAdaptiveModel that combines two language models with a prediction head. In the case of DPR, these are one question encoder model and one passage encoder model.
See the new example script dpr_encoder.py for training / fine-tuning a DPR model; a condensed sketch of the setup follows below.
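As a quick orientation, here is how the pieces fit together. The argument names (language_model_class, prediction_heads, lm1_output_types, ...) are reproduced from memory and may not match the release exactly, so please treat dpr_encoder.py as the authoritative reference:

```python
import torch

from farm.modeling.biadaptive_model import BiAdaptiveModel
from farm.modeling.language_model import LanguageModel
from farm.modeling.prediction_head import TextSimilarityHead

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two separate encoders: one for questions, one for passages
question_encoder = LanguageModel.load("facebook/dpr-question_encoder-single-nq-base",
                                      language_model_class="DPRQuestionEncoder")
passage_encoder = LanguageModel.load("facebook/dpr-ctx_encoder-single-nq-base",
                                     language_model_class="DPRContextEncoder")

# One prediction head scoring question/passage similarity
similarity_head = TextSimilarityHead(similarity_function="dot_product")

model = BiAdaptiveModel(
    language_model1=question_encoder,
    language_model2=passage_encoder,
    prediction_heads=[similarity_head],
    embeds_dropout_prob=0.1,
    lm1_output_types=["per_sequence"],
    lm2_output_types=["per_sequence"],
    device=device,
)
```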
DPR is also tightly integrated into Haystack, where you can use it as a Retriever for open-domain Question Answering (see the sketch below).
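In Haystack, that integration looks roughly like the following. Import paths and parameter names vary between Haystack versions, so treat all of them here as assumptions and check the Haystack documentation for your release:

```python
from haystack.document_store.memory import InMemoryDocumentStore
from haystack.retriever.dense import DensePassageRetriever

document_store = InMemoryDocumentStore()
# ... write your documents into the store first ...

retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
)
document_store.update_embeddings(retriever)  # embed all stored passages once
top_docs = retriever.retrieve(query="What is Dense Passage Retrieval?", top_k=5)
```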
Refactor conversion from / to Transformers #576
We simplified the conversion between FARM and Transformers models. You can now run:

```python
from farm.conversion.transformers import Converter

# Transformers -> FARM
model = Converter.convert_from_transformers("deepset/roberta-base-squad2", device="cpu")

# FARM -> Transformers
transformer_models = Converter.convert_to_transformers(your_adaptive_model)
```
Note: In case your FARM AdaptiveModel has multiple prediction heads (e.g. 1x NER, 1x Text Classification), the conversion will return a list of transformer models, each carrying one of the heads.
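For example, with a hypothetical two-head model (variable names here are illustrative, not from the release):

```python
# your_multihead_model: an AdaptiveModel with two prediction heads (NER + text classification);
# the result is a list in the same order as the prediction heads
ner_model, classification_model = Converter.convert_to_transformers(your_multihead_model)
```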
Upgrade to Transformers 3.3.1 #579
Transformers 3.3.1 comes with a few interesting new features, incl. support for Retrieval-Augmented Generation (RAG), which can be used to generate answers rather than extract them. In contrast to GPT-3, the generation is conditioned on a set of retrieved documents and is therefore better suited for the many industry QA applications that rely on a domain corpus.
Thanks to @lalitpagaria, we'll soon support RAG in Haystack as well (see deepset-ai/haystack#484). In the meantime, you can try it directly in Transformers, as sketched below.
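A minimal sketch based on the Hugging Face documentation; the use_dummy_dataset flag avoids downloading the full Wikipedia index and is only meant for a quick smoke test:

```python
from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
# Dummy index for a quick local test; use the full index for real retrieval
retriever = RagRetriever.from_pretrained("facebook/rag-token-nq",
                                         index_name="exact",
                                         use_dummy_dataset=True)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

inputs = tokenizer.prepare_seq2seq_batch("who wrote the origin of species?",
                                         return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```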
Details
Question Answering
- Improve Speed: Vectorize Question Answering Prediction Head #603
- Fix removal of yes no answers #540
- Fix QA bug that rejected spans at beginning of passage #564
- Add warning about Natural Questions inference #565
- Remove loss index from QA PH #589
Other
- Catch empty datasets in Inferencer #605
- Add option to set evaluation batch size #607
- Infer model type from config #600
- Fix random behavior when loading ELECTRA models #599
- Fix import for Python3.6 #581
- Fix conversion of BertForMaskedLM to Transformers #555
- Load correct config for DistilBert model #562
- Add passages per second calculation to benchmarks #560
- Fix batching in ONNX forward pass #559
- Add ONNX conversion & Inference #557
Big thanks to all contributors!
@ftesser @lalitpagaria @himanshurawlani @Timoeller @tanaysoni @brandenchan @bogdankostic @kolk @tholor