Linking Result Formats
ELEVANT can take existing linking results in certain formats and transform them into ELEVANT's internally used format. Supported formats are NIF and a simple JSONL format. Both formats are explained in detail in the following.
If you have linking results for a certain benchmark in NIF format, use `-pformat nif` with the `link_benchmark.py` script. For example:

```
python3 link_benchmark.py <experiment_name> -pfile <path_to_linking_results> -pformat nif -pname <linker_name> -b <benchmark_name>
```
Your linking results file should look something like this:

```
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://www.aksw.org/gerbil/NifWebService/request_0#char=0,87> a nif:Context,
        nif:OffsetBasedString ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex "87"^^xsd:nonNegativeInteger ;
    nif:isString "Angelina, her father Jon, and her partner Brad never played together in the same movie." .

<http://www.aksw.org/gerbil/NifWebService/request_0#offset_42_46> a nif:OffsetBasedString,
        nif:Phrase ;
    nif:anchorOf "Brad" ;
    nif:beginIndex "42"^^xsd:nonNegativeInteger ;
    nif:endIndex "46"^^xsd:nonNegativeInteger ;
    nif:referenceContext <http://www.aksw.org/gerbil/NifWebService/request_0#char=0,87> ;
    itsrdf:taIdentRef <https://en.wikipedia.org/wiki/Brad_Pitt> .
```
- Entity identifiers can be either from Wikidata, Wikipedia or DBpedia.
- `<path_to_linking_results>` can be the path to a single NIF file that contains all benchmark articles and the predicted links, or the path to a directory that contains multiple such NIF files.
The NIF prediction reader is implemented here.
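If your linker does not emit NIF itself, a file like the example above can be produced with plain string formatting. The following is a minimal sketch under stated assumptions: the base URI, helper name and input structure are illustrative and not part of ELEVANT, and the text is assumed to contain no characters that need Turtle escaping (a real implementation should use an RDF library for proper escaping).

```python
# Minimal sketch: serialize one article and its predicted mentions into
# NIF Turtle shaped like the example above. The base URI and helper name
# are illustrative; no escaping of quotes in the text is performed here.

PREFIXES = """\
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""

def article_to_nif(base_uri, text, mentions):
    """mentions: list of (start, end, entity_uri) tuples, end exclusive."""
    context_uri = f"{base_uri}#char=0,{len(text)}"
    parts = [PREFIXES]
    # The article text itself becomes a nif:Context resource.
    parts.append(
        f"<{context_uri}> a nif:Context, nif:OffsetBasedString ;\n"
        f'    nif:beginIndex "0"^^xsd:nonNegativeInteger ;\n'
        f'    nif:endIndex "{len(text)}"^^xsd:nonNegativeInteger ;\n'
        f'    nif:isString "{text}" .\n'
    )
    # Each predicted mention becomes a nif:Phrase pointing back to the context.
    for start, end, entity_uri in mentions:
        parts.append(
            f"<{base_uri}#offset_{start}_{end}> a nif:OffsetBasedString, nif:Phrase ;\n"
            f'    nif:anchorOf "{text[start:end]}" ;\n'
            f'    nif:beginIndex "{start}"^^xsd:nonNegativeInteger ;\n'
            f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger ;\n'
            f"    nif:referenceContext <{context_uri}> ;\n"
            f"    itsrdf:taIdentRef <{entity_uri}> .\n"
        )
    return "\n".join(parts)

text = ("Angelina, her father Jon, and her partner Brad never played "
        "together in the same movie.")
nif_doc = article_to_nif("http://example.org/request_0", text,
                         [(42, 46, "https://en.wikipedia.org/wiki/Brad_Pitt")])
print(nif_doc)
```

One such block (or one file) per benchmark article can then be passed to `link_benchmark.py` via `-pfile`.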
If you have linking results for a certain benchmark in a very simple JSONL format as described below, use `-pformat simple-jsonl` with the `link_benchmark.py` script. For example:

```
python3 link_benchmark.py <experiment_name> -pfile <path_to_linking_results> -pformat simple-jsonl -pname <linker_name> -b <benchmark_name>
```
The file `<path_to_linking_results>` should contain one line per benchmark article. The order of the predictions should correspond to the article order of the benchmark in the benchmarks directory. The linking results file should look something like this:
```
{"predictions": [{"entity_reference": "Angelina Jolie", "start_char": 0, "end_char": 8}, {"entity_reference": "Jon Stewart", "start_char": 21, "end_char": 24}, {"entity_reference": "Brad Paisley", "start_char": 42, "end_char": 46}]}
{"predictions": [{"entity_reference": "Heidi", "start_char": 0, "end_char": 5}, {"entity_reference": "Las Vegas", "start_char": 35, "end_char": 40}]}
...
```
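A file in this format is easy to generate with the standard `json` module. The following is a minimal sketch; the input structure (a list of `(entity_reference, start, end)` tuples per article) is illustrative, not something ELEVANT prescribes.

```python
import json

# Minimal sketch: write predictions in the simple JSONL format shown above.
# Each article becomes one line; the line order must match the article order
# of the benchmark. The input structure here is illustrative.
articles_predictions = [
    [("Angelina Jolie", 0, 8), ("Jon Stewart", 21, 24), ("Brad Paisley", 42, 46)],
    [("Heidi", 0, 5), ("Las Vegas", 35, 40)],
]

lines = []
for predictions in articles_predictions:
    record = {"predictions": [
        {"entity_reference": ref, "start_char": start, "end_char": end}
        for ref, start, end in predictions
    ]}
    lines.append(json.dumps(record))

# Write one JSON object per line:
# with open("predictions.jsonl", "w") as f:
#     f.write("\n".join(lines) + "\n")
print(lines[0])
```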
- `entity_reference` is a reference to the predicted entity in one of the knowledge bases Wikidata, Wikipedia or DBpedia. The reference is either the complete URI of the entity (e.g. "https://en.wikipedia.org/wiki/Angelina_Jolie") or just the Wikidata QID / Wikipedia title / DBpedia title. Note however, if no complete link is given, the knowledge base is inferred from the format of the entity reference, and predicted Wikipedia titles that match the regular expression `Q[0-9]+` will be interpreted as Wikidata QIDs.
- `start_char` is the character offset of the start of the mention (inclusive) within the article text.
- `end_char` is the character offset of the end of the mention (exclusive) within the article text.
- Optionally, you can specify a field `candidates` for each prediction that contains a list of candidate entity references that were considered for the mention.
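The inference rule described above (a complete URI names the knowledge base directly; otherwise anything matching `Q[0-9]+` is treated as a Wikidata QID, everything else as a title) can be sketched as follows. This is not ELEVANT's actual code, and the URI prefixes are assumptions for illustration:

```python
import re

# Illustrative sketch of the stated inference rule, not ELEVANT's code.
# The URI prefixes checked here are assumptions.
def infer_reference_type(entity_reference):
    if entity_reference.startswith("https://www.wikidata.org/"):
        return "wikidata_uri"
    if entity_reference.startswith("https://en.wikipedia.org/"):
        return "wikipedia_uri"
    if entity_reference.startswith("http://dbpedia.org/"):
        return "dbpedia_uri"
    # Bare references matching Q[0-9]+ are taken as Wikidata QIDs,
    # even if the linker actually meant a Wikipedia title of that form.
    if re.fullmatch(r"Q[0-9]+", entity_reference):
        return "wikidata_qid"
    return "title"

print(infer_reference_type("https://en.wikipedia.org/wiki/Angelina_Jolie"))  # wikipedia_uri
print(infer_reference_type("Q42"))        # wikidata_qid
print(infer_reference_type("Brad Pitt"))  # title
```

The practical takeaway: if your linker predicts Wikipedia titles, pass complete URIs to avoid titles of the form `Q123` being misread as QIDs.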
The simple JSONL prediction reader is implemented here.