Linking Result Formats

Natalie Prange edited this page Sep 19, 2024 · 1 revision

ELEVANT can take existing linking results in certain formats and transform them into its internal format. The supported formats are

  1. NIF
  2. A simple JSONL format

Both formats are described in detail below.

NIF

If you have linking results for a certain benchmark in NIF format, use -pformat nif with the link_benchmark.py script. For example:

python3 link_benchmark.py <experiment_name> -pfile <path_to_linking_results> -pformat nif -pname <linker_name> -b <benchmark_name>

Your linking results file should look something like this:

@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://www.aksw.org/gerbil/NifWebService/request_0#char=0,87> a nif:Context,
        nif:OffsetBasedString ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "87"^^xsd:nonNegativeInteger ;
    nif:isString "Angelina, her father Jon, and her partner Brad never played together in the same movie." .

<http://www.aksw.org/gerbil/NifWebService/request_0#offset_42_46> a nif:OffsetBasedString,
        nif:Phrase ;
    nif:anchorOf "Brad" ;
    nif:beginIndex "42"^^xsd:nonNegativeInteger ;
    nif:endIndex "46"^^xsd:nonNegativeInteger ;
    nif:referenceContext <http://www.aksw.org/gerbil/NifWebService/request_0#char=0,87> ;
    itsrdf:taIdentRef <https://en.wikipedia.org/wiki/Brad_Pitt> .
  • Entity identifiers can come from Wikidata, Wikipedia, or DBpedia.
  • <path_to_linking_results> can be the path to a single NIF file that contains all benchmark articles and the predicted links or the path to a directory that contains multiple such NIF files.

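The context/phrase structure shown above can be produced with plain string templates. The following Python sketch writes one article and one predicted mention in this format; the helper functions are hypothetical illustrations, not part of ELEVANT, and the URI scheme simply mirrors the example above.

```python
# Sketch: serialize one article and one predicted mention as NIF Turtle.
# The prefixes and URI scheme mirror the example above; the helpers
# themselves are hypothetical and not part of ELEVANT.

PREFIXES = """@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def nif_context(base_uri, text):
    """Emit the nif:Context block that holds the full article text."""
    end = len(text)
    return (f"<{base_uri}#char=0,{end}> a nif:Context,\n"
            f"        nif:OffsetBasedString ;\n"
            f"    nif:beginIndex \"0\"^^xsd:nonNegativeInteger ;\n"
            f"    nif:endIndex \"{end}\"^^xsd:nonNegativeInteger ;\n"
            f"    nif:isString \"{text}\" .\n")


def nif_mention(base_uri, text, start, end, entity_uri):
    """Emit one predicted mention as a nif:Phrase linked via itsrdf:taIdentRef."""
    return (f"<{base_uri}#offset_{start}_{end}> a nif:OffsetBasedString,\n"
            f"        nif:Phrase ;\n"
            f"    nif:anchorOf \"{text[start:end]}\" ;\n"
            f"    nif:beginIndex \"{start}\"^^xsd:nonNegativeInteger ;\n"
            f"    nif:endIndex \"{end}\"^^xsd:nonNegativeInteger ;\n"
            f"    nif:referenceContext <{base_uri}#char=0,{len(text)}> ;\n"
            f"    itsrdf:taIdentRef <{entity_uri}> .\n")


base = "http://www.aksw.org/gerbil/NifWebService/request_0"
text = ("Angelina, her father Jon, and her partner Brad never played "
        "together in the same movie.")
doc = (PREFIXES + "\n" + nif_context(base, text) + "\n"
       + nif_mention(base, text, 42, 46,
                     "https://en.wikipedia.org/wiki/Brad_Pitt"))
```

Note that this sketch does not escape quotes or other special characters inside the article text; a real exporter should use an RDF library such as rdflib for proper Turtle serialization.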
The NIF prediction reader is implemented here.

Simple JSONL Format

If you have linking results for a certain benchmark in a very simple JSONL format as described below, use -pformat simple-jsonl with the link_benchmark.py script. For example:

python3 link_benchmark.py <experiment_name> -pfile <path_to_linking_results> -pformat simple-jsonl -pname <linker_name> -b <benchmark_name>

The file <path_to_linking_results> should contain one line per benchmark article. The order of the predictions should correspond to the article order of the benchmark in the benchmarks directory. The linking results file should look something like this:

{"predictions": [{"entity_reference": "Angelina Jolie", "start_char": 0, "end_char": 8}, {"entity_reference": "Jon Stewart", "start_char": 21, "end_char": 24}, {"entity_reference": "Brad Paisley", "start_char": 42, "end_char": 46}]}
{"predictions": [{"entity_reference": "Heidi", "start_char": 0, "end_char": 5}, {"entity_reference": "Las Vegas", "start_char": 35, "end_char": 40}]}
...
  • entity_reference is a reference to the predicted entity in one of the knowledge bases Wikidata, Wikipedia, or DBpedia. The reference is either the complete URI of the entity (e.g. "https://en.wikipedia.org/wiki/Angelina_Jolie") or just the Wikidata QID / Wikipedia title / DBpedia title. Note, however, that if no complete URI is given, the knowledge base is inferred from the format of the entity reference: predicted Wikipedia titles that match the regular expression Q[0-9]+ will be interpreted as Wikidata QIDs.
  • start_char is the character offset of the start of the mention (inclusive) within the article text
  • end_char is the character offset of the end of the mention (exclusive) within the article text
  • Optionally, you can specify a field candidates for each prediction that contains a list of candidate entity references that were considered for the mention.
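The rules above can be sketched in a few lines of Python: one helper guesses the knowledge base from the shape of an entity_reference, and another serializes one article's predictions as a JSONL line. Both function names are hypothetical illustrations and not part of ELEVANT's API.

```python
import json
import re


def reference_kb(ref):
    """Guess which knowledge base an entity_reference points to,
    following the rules described above (hypothetical helper)."""
    if ref.startswith(("http://", "https://")):
        for kb in ("wikidata", "wikipedia", "dbpedia"):
            if kb + ".org" in ref:
                return kb
        return "unknown"
    # A bare reference matching Q[0-9]+ is always treated as a Wikidata
    # QID, even if a Wikipedia title of that shape was intended.
    if re.fullmatch(r"Q[0-9]+", ref):
        return "wikidata"
    return "title"  # Wikipedia or DBpedia title; not distinguishable here


def prediction_line(predictions):
    """Serialize one article's predictions as a simple JSONL line."""
    return json.dumps({"predictions": predictions})


line = prediction_line([
    {"entity_reference": "Q228", "start_char": 0, "end_char": 8},
    {"entity_reference": "https://en.wikipedia.org/wiki/Brad_Pitt",
     "start_char": 42, "end_char": 46},
])
```

Writing one such line per benchmark article, in benchmark order, yields a file that can be passed to link_benchmark.py as described above.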

The simple JSONL prediction reader is implemented here.