Datasets

Overview

Name	Suitable Metrics	Description
ARAGOG	ContextRelevance, Faithfulness, Semantic Answer Similarity	A collection of papers from ArXiv covering topics around Transformers and Large Language Models, all in PDF format.
SQuAD 2.0	Answer Exact Match, DocumentMRR, DocumentMAP, DocumentRecall, Semantic Answer Similarity	A collection of questions and answers from Wikipedia articles, typically used for training and evaluating models for extractive question-answering tasks.

ARAGOG

The ARAGOG dataset is based on the paper Advanced Retrieval Augmented Generation Output Grading (ARAGOG). It is a collection of ArXiv papers on Transformers and Large Language Models, all in PDF format.

The dataset contains:

13 PDF papers.
107 questions and answers generated with the assistance of GPT-4, and validated/corrected by humans.

The following metrics can be used: ContextRelevance, Faithfulness, and Semantic Answer Similarity.

SQuAD

The SQuAD 1.1 dataset is a collection of questions and answers from Wikipedia articles, typically used for training and evaluating models for extractive question-answering tasks. You can find out more about this dataset in the paper SQuAD: 100,000+ Questions for Machine Comprehension of Text and on the official website: https://rajpurkar.github.io/SQuAD-explorer/
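The official SQuAD JSON files nest questions under paragraphs, and paragraphs under articles. As a rough sketch of how that layout can be flattened into evaluation examples, the helper below walks a SQuAD-style dictionary and yields one tuple per question. The function name `iter_squad_examples` and the sample record are hypothetical and written only for illustration; the sample is not real SQuAD data.

```python
import json


def iter_squad_examples(squad: dict):
    """Yield (question, context, answer_texts) from a SQuAD-style dict.

    Assumes the usual layout: data -> paragraphs -> qas -> answers.
    """
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # A question may carry several reference answers, or none.
                answers = [a["text"] for a in qa.get("answers", [])]
                yield qa["question"], context, answers


# Tiny hand-written sample in the same layout (illustrative only).
sample = json.loads("""
{
  "data": [
    {
      "title": "Eiffel_Tower",
      "paragraphs": [
        {
          "context": "The Eiffel Tower is in Paris.",
          "qas": [
            {
              "id": "q1",
              "question": "Where is the Eiffel Tower?",
              "answers": [{"text": "Paris", "answer_start": 23}]
            }
          ]
        }
      ]
    }
  ]
}
""")

examples = list(iter_squad_examples(sample))
```

To use it with a downloaded file, load the file with `json.load` and pass the resulting dictionary to the helper.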

The dataset contains:

It contains human annotations suitable for the following metrics: Answer Exact Match, DocumentMRR, DocumentMAP, DocumentRecall, and Semantic Answer Similarity.
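To make the annotation-based metrics more concrete, here is a minimal plain-Python sketch of three of them using their textbook definitions: exact match on answer strings, reciprocal rank (the per-query quantity that MRR averages), and recall over retrieved documents. The function names and the toy document IDs are hypothetical, and the evaluator components named in the table may normalize text or aggregate scores differently.

```python
def exact_match(predicted: str, ground_truth: str) -> float:
    """1.0 if the predicted answer equals the reference exactly, else 0.0."""
    return 1.0 if predicted == ground_truth else 0.0


def reciprocal_rank(retrieved_ids: list, relevant_ids: set) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall(retrieved_ids: list, relevant_ids: set) -> float:
    """Fraction of the relevant documents that appear in the retrieved list."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids) & set(relevant_ids))
    return hits / len(set(relevant_ids))


# Toy example (hypothetical IDs): the first relevant document sits at rank 2,
# and one of the two relevant documents was retrieved.
retrieved = ["d3", "d1", "d7"]
relevant = {"d1", "d9"}

em_same = exact_match("Paris", "Paris")
em_case = exact_match("paris", "Paris")
rr = reciprocal_rank(retrieved, relevant)
rec = recall(retrieved, relevant)
```

Averaging `reciprocal_rank` over all questions gives a mean reciprocal rank. Note that this strict `exact_match` treats "paris" and "Paris" as different, so any case or whitespace normalization has to be applied before the comparison.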