Merge branch 'dev-0.4.1'
nreimers committed Jan 4, 2021
2 parents 10ed2a8 + 195784b commit de558ab
Showing 81 changed files with 1,031 additions and 766 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
.idea
.vscode
*.pyc
*.gz
*.tsv
Expand Down
1 change: 1 addition & 0 deletions docs/package_reference/cross_encoder.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ For an introduction to Cross-Encoders, see [Cross-Encoders](../usage/cross-encod
CrossEncoders have their own evaluation classes, which are in `sentence_transformers.cross_encoder.evaluation`.

```eval_rst
.. autoclass:: sentence_transformers.cross_encoder.evaluation.CEBinaryAccuracyEvaluator
.. autoclass:: sentence_transformers.cross_encoder.evaluation.CEBinaryClassificationEvaluator
.. autoclass:: sentence_transformers.cross_encoder.evaluation.CECorrelationEvaluator
.. autoclass:: sentence_transformers.cross_encoder.evaluation.CESoftmaxAccuracyEvaluator
Expand Down
5 changes: 0 additions & 5 deletions docs/package_reference/datasets.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,6 @@
`sentence_transformers.datasets` contains classes to organize your training input examples.


## SentencesDataset
`SentencesDataset` is the main class to store training classes for training. For details, see [training overview](../training/overview.md).
```eval_rst
.. autoclass:: sentence_transformers.datasets.SentencesDataset
```

## ParallelSentencesDataset
`ParallelSentencesDataset` is used for multilingual training. For details, see [multilingual training](../../examples/training/multilingual/README.md).
Expand Down
1 change: 1 addition & 0 deletions docs/package_reference/models.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@

## Further Classes
```eval_rst
.. autoclass:: sentence_transformers.models.Asym
.. autoclass:: sentence_transformers.models.BoW
.. autoclass:: sentence_transformers.models.CNN
.. autoclass:: sentence_transformers.models.LSTM
Expand Down
64 changes: 50 additions & 14 deletions docs/pretrained_cross-encoders.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,41 +7,77 @@ This page lists available **pretrained Cross-Encoders**. Cross-Encoders require

## STSbenchmark
The following models can be used like this:
```
```python
from sentence_transformers import CrossEncoder
model = CrossEncoder('model_name')
scores = model.predict([('Sent A1', 'Sent B1'), ('Sent A2', 'Sent B2')])
```

They return a score 0...1 indicating the semantic similarity of the given sentence pair.
- **sentence-transformers/ce-distilroberta-base-stsb** - STSbenchmark test performance: 87.92
- **sentence-transformers/ce-roberta-base-stsb** - STSbenchmark test performance: 90.17
- **sentence-transformers/ce-roberta-large-stsb** - STSbenchmark test performance: 91.47
- **cross-encoder/stsb-TinyBERT-L-4** - STSbenchmark test performance: 85.50
- **cross-encoder/stsb-distilroberta-base** - STSbenchmark test performance: 87.92
- **cross-encoder/stsb-roberta-base** - STSbenchmark test performance: 90.17
- **cross-encoder/stsb-roberta-large** - STSbenchmark test performance: 91.47

## Quora Duplicate Questions
These models have been trained on the [Quora duplicate questions dataset](https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs). They can be used like the STSb models and give a score 0...1 indicating the probability that two questions are duplicate questions.

- **sentence-transformers/ce-distilroberta-base-quora** - Average Precision dev set: 87.48
- **sentence-transformers/ce-roberta-base-quora** - Average Precision dev set: 87.80
- **sentence-transformers/ce-roberta-large-quora** - Average Precision dev set: 87.91
- **cross-encoder/quora-distilroberta-base** - Average Precision dev set: 87.48
- **cross-encoder/quora-roberta-base** - Average Precision dev set: 87.80
- **cross-encoder/quora-roberta-large** - Average Precision dev set: 87.91

Note: These models don't work for question similarity. The questions *How to learn Java* and *How to learn Python* will get a low score, as these questions are not duplicates. For question similarity, the respective bi-encoder trained on the Quora dataset yields much more meaningful results.

## Information Retrieval

The following models are trained for Information Retrieval: given a query (such as keywords or a question) and a paragraph, can the query be answered by the paragraph? The models have been trained on MS Marco, a large dataset with real user queries from the Bing search engine.

The models can be used like this:
```
```python
from sentence_transformers import CrossEncoder
model = CrossEncoder('model_name', max_length=512)
scores = model.predict([('Query', 'Paragraph1'), ('Query', 'Paragraph2')])
scores = model.predict([('Query1', 'Paragraph1'), ('Query2', 'Paragraph2')])

#For Example
scores = model.predict([('How many people live in Berlin?', 'Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.'),
('What is the size of New York?', 'New York City is famous for the Metropolitan Museum of Art.')])
```

This returns a score 0...1 indicating if the paragraph is relevant for a given query.

- **sentence-transformers/ce-ms-marco-TinyBERT-L-2** - MRR@10 on MS Marco Dev Set: 30.15
- **sentence-transformers/ce-ms-marco-TinyBERT-L-4** - MRR@10 on MS Marco Dev Set: 34.50
- **sentence-transformers/ce-ms-marco-TinyBERT-L-6** - MRR@10 on MS Marco Dev Set: 36.13
- **sentence-transformers/ce-ms-marco-electra-base** - MRR@10 on MS Marco Dev Set: 36.41

For details on the usage, see [Applications - Information Retrieval](../examples/applications/information-retrieval/README.md)
For details on the usage, see [Applications - Information Retrieval](../examples/applications/information-retrieval/README.md)


### MS MARCO
[MS MARCO Passage Retrieval](https://github.com/microsoft/MSMARCO-Passage-Ranking) is a large dataset with real user queries from Bing search engine with annotated relevant text passages.
- **cross-encoder/ms-marco-TinyBERT-L-2** - MRR@10 on MS Marco Dev Set: 30.15
- **cross-encoder/ms-marco-TinyBERT-L-4** - MRR@10 on MS Marco Dev Set: 34.50
- **cross-encoder/ms-marco-TinyBERT-L-6** - MRR@10 on MS Marco Dev Set: 36.13
- **cross-encoder/ms-marco-electra-base** - MRR@10 on MS Marco Dev Set: 36.41
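
The MRR@10 figures above can be read with the following minimal sketch in mind. It assumes the usual definition of mean reciprocal rank (the reciprocal rank of the first relevant passage within the top 10, averaged over queries) and that the reported numbers are scaled to percent. The helper name `mrr_at_k` and the toy rankings are hypothetical and not part of the library.

```python
from typing import Sequence


def mrr_at_k(rankings: Sequence[Sequence[bool]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant hit within the top k results.

    Each inner sequence is one query's ranked result list, where True marks
    a passage annotated as relevant. (Hypothetical helper for illustration.)
    """
    total = 0.0
    for ranked_relevance in rankings:
        for rank, is_relevant in enumerate(ranked_relevance[:k], start=1):
            if is_relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


# Toy data: three queries with made-up relevance judgements
example_rankings = [
    [True, False, False],   # relevant passage at rank 1
    [False, False, True],   # relevant passage at rank 3
    [False] * 12,           # no relevant passage in the top 10
]
mrr = mrr_at_k(example_rankings)
print(f"MRR@10: {100 * mrr:.2f}")
```

A query whose first relevant passage is ranked below position 10 contributes zero, which is why the cutoff matters when comparing models.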

### SQuAD (QNLI)

QNLI is based on the [SQuAD dataset](https://rajpurkar.github.io/SQuAD-explorer/) and was introduced by the [GLUE Benchmark](https://arxiv.org/abs/1804.07461). Given a passage from Wikipedia, annotators created questions that are answerable by that passage.

- **cross-encoder/qnli-distilroberta-base** - Accuracy on QNLI dev set: 90.96
- **cross-encoder/qnli-electra-base** - Accuracy on QNLI dev set: 93.21



## NLI
Given two sentences, do they contradict each other, does one entail the other, or are they neutral? The following models were trained on the [SNLI](https://nlp.stanford.edu/projects/snli/) and [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) datasets.
- **cross-encoder/nli-distilroberta-base** - Accuracy on MNLI mismatched set: 83.98
- **cross-encoder/nli-roberta-base** - Accuracy on MNLI mismatched set: 87.47
- **cross-encoder/nli-deberta-base** - Accuracy on MNLI mismatched set: 88.08

```python
from sentence_transformers import CrossEncoder
model = CrossEncoder('model_name')
scores = model.predict([('A man is eating pizza', 'A man eats something'), ('A black race car starts up in front of a crowd of people.', 'A man is driving down a lonely road.')])

#Convert scores to labels
label_mapping = ['contradiction', 'entailment', 'neutral']
labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)]
```
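
The label conversion can be tried without downloading a model. The sketch below replaces the output of `model.predict` with hand-written scores (hypothetical values, one row per sentence pair and one column per class) and uses a plain-Python `argmax` helper, which is also an assumption made for illustration. Only the `label_mapping` order is taken from the example above.

```python
# Hypothetical scores for two sentence pairs; real values would come from
# model.predict(...), with one column per class.
scores = [
    [-2.1, 4.3, 0.2],   # highest value in column 1
    [3.0, -1.5, 0.9],   # highest value in column 0
]

# Class order as used in the NLI example above
label_mapping = ['contradiction', 'entailment', 'neutral']


def argmax(row):
    """Index of the largest value in a row (stand-in for numpy's argmax)."""
    return max(range(len(row)), key=lambda i: row[i])


labels = [label_mapping[argmax(row)] for row in scores]
print(labels)
```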

20 changes: 7 additions & 13 deletions docs/training/overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,19 +57,16 @@ For all available building blocks see [» Models Package Reference](../package_r
To represent our training data, we use the `InputExample` class to store training examples. As parameters, it accepts texts, which is a list of strings representing our pairs (or triplets). Further, we can also pass a label (either float or int). The following shows a simple example, where we pass text pairs to `InputExample` together with a label indicating the semantic similarity.

```python
from sentence_transformers import SentenceTransformer, SentencesDataset, InputExample
from sentence_transformers import SentenceTransformer, InputExample
from torch.utils.data import DataLoader

model = SentenceTransformer('distilbert-base-nli-mean-tokens')
train_examples = [InputExample(texts=['My first sentence', 'My second sentence'], label=0.8),
InputExample(texts=['Another pair', 'Unrelated sentence'], label=0.3)]
train_dataset = SentencesDataset(train_examples, model)
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=16)
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
```

To prepare the examples for training, we provide a custom `SentencesDataset`, which is a [custom PyTorch dataset](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html). It accepts as parameters the list with `InputExamples` and the `SentenceTransformer` model.

We can wrap `SentencesDataset` with the standard PyTorch `DataLoader`, which produces for example batches and allows us to shuffle the data for training.
We wrap our `train_examples` with the standard PyTorch `DataLoader`, which shuffles our data and produces batches of certain sizes.
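
To make the shuffling and batching step concrete, here is a minimal plain-Python sketch of what it does conceptually. It is not the PyTorch `DataLoader` implementation; the helper `shuffled_batches`, the fixed seed, and the toy tuples standing in for `InputExample` objects are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffled_batches(examples: Sequence[T], batch_size: int, seed: int = 0) -> List[List[T]]:
    """Shuffle a copy of the examples and cut it into consecutive batches.

    Hypothetical stand-in for DataLoader(..., shuffle=True, batch_size=...);
    the last batch may be smaller than batch_size.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Toy stand-ins for training examples: (sentence A, sentence B, label)
pairs = [(f"sentence {i}a", f"sentence {i}b", round(i / 10, 1)) for i in range(5)]
batches = shuffled_batches(pairs, batch_size=2)

for batch in batches:
    print(len(batch), batch)
```

With five examples and a batch size of 2, this yields two full batches and one batch with a single example.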



Expand All @@ -92,7 +89,7 @@ For each sentence pair, we pass sentence A and sentence B through our network wh

A minimal example with `CosineSimilarityLoss` is the following:
```python
from sentence_transformers import SentenceTransformer, SentencesDataset, InputExample, losses
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

#Define the model. Either from scratch or by loading a pre-trained model
Expand All @@ -103,8 +100,7 @@ train_examples = [InputExample(texts=['My first sentence', 'My second sentence']
InputExample(texts=['Another pair', 'Unrelated sentence'], label=0.3)]

#Define your train dataset, the dataloader and the train loss
train_dataset = SentencesDataset(train_examples, model)
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=16)
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

#Tune the model
Expand Down Expand Up @@ -142,7 +138,7 @@ model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_st


### Continue Training on Other Data
[training_stsbenchmark_continue_training.py](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training_transformers/training_stsbenchmark_continue_training.py) shows an example where training on a fine-tuned model is continued. In that example, we use a sentence transformer model that was first fine-tuned on the NLI dataset and then continue training on the training data from the STS benchmark.
[training_stsbenchmark_continue_training.py](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/sts/training_stsbenchmark_continue_training.py) shows an example where training on a fine-tuned model is continued. In that example, we use a sentence transformer model that was first fine-tuned on the NLI dataset and then continue training on the training data from the STS benchmark.

First, we load a pre-trained model from the server:
```python
Expand All @@ -152,9 +148,7 @@ model = SentenceTransformer('bert-base-nli-mean-tokens')

The next steps are as before. We specify training and dev data:
```python
sts_reader = STSBenchmarkDataReader('datasets/stsbenchmark', normalize_scores=True)
train_data = SentencesDataset(sts_reader.get_examples('sts-train.csv'), model)
train_dataloader = DataLoader(train_data, shuffle=True, batch_size=train_batch_size)
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=train_batch_size)
train_loss = losses.CosineSimilarityLoss(model=model)

evaluator = EmbeddingSimilarityEvaluator.from_input_examples(sts_reader.get_examples('sts-dev.csv'))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@


# Load pre-trained Sentence Transformer Model (based on DistilBERT). It will be downloaded automatically
model = SentenceTransformer('paraphrase-distilroberta-base-v1')
model = SentenceTransformer('average_word_embeddings_glove.6B.300d')

# Embed a list of sentences
sentences = ['This framework generates embeddings for each input sentence',
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@

# To refine the results, we use a CrossEncoder. A CrossEncoder gets both inputs (input_question, retrieved_question)
# and outputs a score 0...1 indicating the similarity.
cross_encoder_model = CrossEncoder('sentence-transformers/ce-roberta-base-stsb')
cross_encoder_model = CrossEncoder('cross-encoder/roberta-base-stsb')

# Dataset we want to use
url = "http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv"
Expand Down
2 changes: 1 addition & 1 deletion examples/applications/cross-encoder/cross-encoder_usage.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
import numpy as np

# Pre-trained cross encoder
model = CrossEncoder('sentence-transformers/ce-distilroberta-base-stsb')
model = CrossEncoder('cross-encoder/distilroberta-base-stsb')

# We want to compute the similarity between the query sentence
query = 'A man is eating pasta.'
Expand Down
8 changes: 4 additions & 4 deletions examples/applications/information-retrieval/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -78,10 +78,10 @@ In the following table, we provide various pre-trained Cross-Encoders together w

| Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec (BertTokenizerFast) | Docs / Sec |
| ------------- |:-------------| -----| --- | --- |
| sentence-transformers/ce-ms-marco-TinyBERT-L-2 | 67.43 | 30.15 | 9000 | 780
| sentence-transformers/ce-ms-marco-TinyBERT-L-4 | 68.09 | 34.50 | 2900 | 760
| sentence-transformers/ce-ms-marco-TinyBERT-L-6 | 69.57 | 36.13 | 680 | 660
| sentence-transformers/ce-ms-marco-electra-base | 71.99 | 36.41 | 340 | 340
| cross-encoder/ms-marco-TinyBERT-L-2 | 67.43 | 30.15 | 9000 | 780
| cross-encoder/ms-marco-TinyBERT-L-4 | 68.09 | 34.50 | 2900 | 760
| cross-encoder/ms-marco-TinyBERT-L-6 | 69.57 | 36.13 | 680 | 660
| cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 | 340
| *Other models* | | | |
| nboost/pt-tinybert-msmarco | 63.63 | 28.80 | 2900 | 760
| nboost/pt-bert-base-uncased-msmarco | 70.94 | 34.75 | 340 | 340|
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
The CrossEncoder takes the search query and scores how relevant each passage is for the given query. The five passages with the highest score are then returned.
As CrossEncoder, we use sentence-transformers/ce-ms-marco-TinyBERT-L-2, a BERT model with only 2 layers trained on the MS MARCO dataset. This is an extremely quick model able to score up to 9000 passages per second (on a V100 GPU). You can also use a larger model, which gives better results but is also slower.
As CrossEncoder, we use cross-encoder/ms-marco-TinyBERT-L-2, a BERT model with only 2 layers trained on the MS MARCO dataset. This is an extremely quick model able to score up to 9000 passages per second (on a V100 GPU). You can also use a larger model, which gives better results but is also slower.
Note: As we score the [query, passage]-pair for every new query, this search method
becomes inefficient at some point if the document gets too large.
Expand Down Expand Up @@ -61,7 +61,7 @@


## Load our cross-encoder. Use fast tokenizer to speed up the tokenization
model = CrossEncoder('sentence-transformers/ce-ms-marco-TinyBERT-L-2')
model = CrossEncoder('cross-encoder/ms-marco-TinyBERT-L-2')

## Some queries we want to search for in the document
queries = ["How large is Europe?",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
For semantic search, we use SentenceTransformer('msmarco-distilroberta-base-v2') and retrieve
100 passages that potentially answer the input query.
Next, we use a more powerful CrossEncoder (cross_encoder = CrossEncoder('sentence-transformers/ce-ms-marco-TinyBERT-L-6')) that
Next, we use a more powerful CrossEncoder (cross_encoder = CrossEncoder('cross-encoder/ms-marco-TinyBERT-L-6')) that
scores the query and all retrieved passages for their relevancy. The cross-encoder is necessary to filter out certain noise
that might be retrieved from the semantic search step.
"""
Expand All @@ -22,7 +22,7 @@
top_k = 100 #Number of passages we want to retrieve with the bi-encoder

#The bi-encoder will retrieve 100 documents. We use a cross-encoder, to re-rank the results list to improve the quality
cross_encoder = CrossEncoder('sentence-transformers/ce-ms-marco-TinyBERT-L-6')
cross_encoder = CrossEncoder('cross-encoder/ms-marco-TinyBERT-L-6')

# As dataset, we use Simple English Wikipedia. Compared to the full English wikipedia, it has only
# about 170k articles. We split these articles into paragraphs and encode them with the bi-encoder
Expand Down
2 changes: 1 addition & 1 deletion examples/evaluation/evaluation_stsbenchmark.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
python evaluation_stsbenchmark.py model_name
"""
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, SentencesDataset, LoggingHandler
from sentence_transformers import SentenceTransformer, LoggingHandler
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
from sentence_transformers.readers import STSBenchmarkDataReader
import logging
Expand Down
2 changes: 1 addition & 1 deletion examples/evaluation/evaluation_stsbenchmark_sbert-wk.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
Hence, WKPooling runs on the GPU, which makes it rather in-efficient.
"""
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, SentencesDataset, LoggingHandler, models
from sentence_transformers import SentenceTransformer, LoggingHandler, models
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
from sentence_transformers.readers import STSBenchmarkDataReader
import logging
Expand Down
4 changes: 3 additions & 1 deletion examples/evaluation/evaluation_translation_matching.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@
level=logging.INFO,
handlers=[LoggingHandler()])

logger = logging.getLogger(__name__)

model_name = sys.argv[1]
filepaths = sys.argv[2:]
inference_batch_size = 32
Expand All @@ -51,7 +53,7 @@
src_sentences.append(splits[0])
trg_sentences.append(splits[1])

logging.info(os.path.basename(filepath)+": "+str(len(src_sentences))+" sentence pairs")
logger.info(os.path.basename(filepath)+": "+str(len(src_sentences))+" sentence pairs")
dev_trans_acc = evaluation.TranslationEvaluator(src_sentences, trg_sentences, name=os.path.basename(filepath), batch_size=inference_batch_size)
dev_trans_acc(model)
