ESWC 2016 Call for Challenge: Semantic Publishing Challenge 2016 ** apologies for cross-posting **
==== Call for Challenge: Semantic Publishing ====
Challenge Website: https://github.com/ceurws/lod/wiki/SemPub2016
Challenge hashtag: #SemPub2016
Challenge Chairs:
- Angelo Di Iorio (Department of Computer Science and Engineering, University of Bologna, IT)
- Anastasia Dimou (Data Science Lab, Ghent University, BE)
- Christoph Lange (Enterprise Information Systems, University of Bonn / Fraunhofer IAIS, DE)
- Sahar Vahdati (Enterprise Information Systems, University of Bonn, DE)
Challenge Coordinators: Stefan Dietze (L3S, Germany) and Anna Tordai (Elsevier, Netherlands)
13th Extended Semantic Web Conference (ESWC) 2016
Dates: May 29th - June 2nd, 2016
Venue: Heraklion, Crete, Greece
Hashtag: #eswc2016
Feed: @eswc_conf
Site: http://2016.eswc-conferences.org
General Chair: Harald Sack (Hasso Plattner Institute (HPI), Germany)
This is the next iteration of the successful Semantic Publishing Challenge of ESWC 2014 and 2015. We continue pursuing the objective of assessing the quality of scientific output, evolving the dataset bootstrapped in 2014 and 2015 to take into account the wider ecosystem of publications. To achieve that, this year’s challenge focuses on refining and enriching an existing linked open dataset about workshops, their publications and their authors. Aspects of “refining and enriching” include extracting deeper information from the HTML and PDF sources of the workshop proceedings volumes and enriching this information with knowledge from existing datasets. Thus, a combination of broadly investigated technologies in the Semantic Web field, such as Information Extraction (IE), Natural Language Processing (NLP), Named Entity Recognition (NER), link discovery, etc., is required to deal with the challenge’s tasks.
The Challenge is open to everyone from industry and academia.
We ask challengers to automatically annotate a set of multi-format input documents and to produce a Linked Open Dataset (LOD) that fully describes these documents, their context, and relevant parts of their content. The evaluation will consist of running a set of queries against the produced dataset to assess its correctness and completeness. The primary input dataset is the LOD that has been extracted from the CEUR-WS.org workshop proceedings using the winning extraction tools of the 2014 and 2015 challenges, plus its full original HTML and PDF source documents. In addition, the challenge uses (as linking targets) existing LOD on scholarly publications. The input dataset will be split in two parts: a training dataset and an evaluation dataset, which will be disclosed a few days before the submission deadline. Participants will be asked to run their tool on the evaluation dataset and to produce the final Linked Dataset and the output of the queries on that dataset.
The Challenge includes three tasks:
= Task 1: Extraction and assessment of workshop proceedings information in HTML =
Participants are required to extract information from a set of HTML tables of contents published in CEUR-WS.org workshop proceedings. The extracted information is expected to answer queries about the quality of these workshops, for instance by measuring growth, longevity, etc. The task is an extension of Task 1 of the 2014 and 2015 Challenges: we will reuse the most challenging quality indicators from last year’s challenge, define others more precisely, and add completely new ones. Previous years’ results, with an F-measure of 0.66 in 2015 and 0.64 in 2014 for the winning solutions, show improvement, but there is still a lot of room for improving information extraction.
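As a rough illustration of the kind of extraction Task 1 asks for, the following Python sketch collects paper titles and authors from list items in a table of contents. It uses only the standard library. The markup it expects (`<span class="title">` and `<span class="authors">` inside `<li>` elements) and the class name `TocParser` are hypothetical; the real proceedings pages vary in structure and would need more robust handling.

```python
from html.parser import HTMLParser


class TocParser(HTMLParser):
    """Collect papers from <li> items of a table of contents.

    Assumption (hypothetical markup): each paper is an <li> whose title
    and authors are wrapped in <span class="title"> and
    <span class="authors"> elements.
    """

    def __init__(self):
        super().__init__()
        self.papers = []       # extracted papers, in document order
        self._current = None   # paper being assembled, if inside an <li>
        self._field = None     # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._current = {"title": "", "authors": ""}
        elif tag == "span" and self._current is not None:
            css_class = dict(attrs).get("class")
            if css_class in self._current:
                self._field = css_class

    def handle_data(self, data):
        # Only keep text that appears inside a recognised span.
        if self._field is not None:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            authors = [a.strip() for a in self._current["authors"].split(",")]
            self.papers.append({
                "title": self._current["title"].strip(),
                "authors": [a for a in authors if a],
            })
            self._current = None


# Made-up sample input, not taken from an actual proceedings volume.
SAMPLE_HTML = """
<ul>
<li><a href="paper1.pdf"><span class="title">A Sample Paper</span></a>
<span class="authors">Ada Example, Bob Sample</span></li>
</ul>
"""

parser = TocParser()
parser.feed(SAMPLE_HTML)
print(parser.papers)
```

The extracted records would still have to be converted to RDF and merged into the dataset before any quality indicator can be queried.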
= Task 2: Extracting information from the PDF full text of the papers =
Participants are required to extract information from the textual content of the papers (in PDF). That information should describe the organization of the paper and should provide a deeper understanding of the context in which it was written. In particular, the extracted information is expected to answer queries about the internal organization of sections, tables and figures, and about the authors’ affiliations, research institutions and funding sources. The task mainly requires PDF mining techniques and some NLP processing.
= Task 3: Interlinking =
Participants are required to interlink the CEUR-WS.org linked dataset with relevant datasets already existing in the LOD cloud. Task 3 can be accomplished as an entity interlinking/instance matching task that addresses both interlinking data from the output of the other tasks and interlinking the CEUR-WS.org linked dataset with external datasets. Moreover, as triples are generated from different sources and by different activities, tracking provenance information becomes increasingly important.
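To make the instance matching idea concrete, here is a minimal Python sketch that links entities from two datasets when their labels are identical after normalisation, and emits owl:sameAs statements in N-Triples syntax. All URIs, labels and function names are hypothetical. Exact label matching is a deliberately naive strategy chosen for brevity; it is not the method prescribed by the challenge, and it does not record provenance for the generated links.

```python
import unicodedata

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalise(label):
    """Lower-case a label, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def interlink(local, external):
    """Return owl:sameAs N-Triples for entities with matching labels.

    Both arguments map an entity URI to its label.
    """
    # Index the external dataset by normalised label for fast lookup.
    index = {}
    for uri, label in external.items():
        index.setdefault(normalise(label), []).append(uri)

    triples = []
    for uri, label in sorted(local.items()):
        for target in index.get(normalise(label), []):
            triples.append(f"<{uri}> <{OWL_SAME_AS}> <{target}> .")
    return triples


# Hypothetical entities; neither URI scheme is taken from a real dataset.
local_people = {"http://example.org/local/person/1": "José  Pérez"}
external_people = {
    "http://example.org/external/author/42": "jose perez",
    "http://example.org/external/author/43": "Maria Rossi",
}

for line in interlink(local_people, external_people):
    print(line)
```

In practice, name ambiguity means that additional evidence (co-authors, affiliations, publication venues) is usually needed before asserting that two URIs denote the same entity.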
In each task, participants will be asked to refine and extend the initial CEUR-WS.org Linked Open Dataset by information extraction or link discovery, i.e. they will produce an RDF graph. To validate the RDF graphs produced, a number of queries in natural language will be specified, together with their expected results in CSV format. Participants are asked to submit both their dataset and a translation of the input natural language queries into SPARQL queries that work on that dataset. A few days before the deadline, a set of queries will be specified and used for the final evaluation. Participants are then asked to run these queries on their dataset and to submit the produced output in CSV. Precision, recall and F-measure will be calculated by comparing each query’s result set with the expected query result from a manually built gold standard. Participants’ overall performance in a task will be defined as the average F-measure over all queries of the task, with all queries having equal weight. For computing precision and recall, an automated tool developed for the 2015 challenge will be used; this tool will be publicly available during the training phase.
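The scoring described above can be sketched in a few lines of Python. This is only an illustration, not the official evaluation tool: it assumes that each CSV row is one result, that rows are compared as exact tuples after trimming whitespace, and that the files have no header row. The function names and sample data are hypothetical.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text into a set of rows (tuples of stripped cells)."""
    reader = csv.reader(io.StringIO(csv_text))
    return {tuple(cell.strip() for cell in row) for row in reader if row}


def precision_recall_f1(produced, expected):
    """Compare two sets of result rows; return (precision, recall, F-measure)."""
    true_positives = len(produced & expected)
    precision = true_positives / len(produced) if produced else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def task_score(results):
    """Average F-measure over (produced, expected) pairs, equally weighted."""
    f_measures = [precision_recall_f1(p, e)[2] for p, e in results]
    return sum(f_measures) / len(f_measures) if f_measures else 0.0


# Made-up query outputs and gold-standard results for two queries.
query_results = [
    (read_rows("Vol-1,2014\nVol-2,2015\nVol-3,2016\n"),
     read_rows("Vol-1,2014\nVol-2,2015\n")),
    (read_rows("a\nb\n"), read_rows("a\nc\n")),
]

print(round(task_score(query_results), 2))
```

Because every query carries the same weight, a query with a short expected result influences the task score as much as one with hundreds of rows.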
A discussion group is open for participants to ask questions and to receive updates about the challenge: mailto:[email protected]. Participants are invited to subscribe to this group as soon as possible and to communicate their intention to participate. They are also invited to use this channel to discuss problems in the input dataset and to suggest changes.
Participants are required to submit:
- Abstract: no more than 200 words.
- Description: It should explain the details of the automated annotation system, including why the system is innovative, how it uses Semantic Web technology, what features or functions the system provides, what design choices were made and what lessons were learned. The description should also summarize how participants have addressed the evaluation tasks. An outlook towards how the data could be consumed is appreciated but not strictly required. Papers must be submitted in PDF format, following the style of Springer's Lecture Notes in Computer Science (LNCS) series (http://www.springer.com/computer/lncs/lncs+authors), and must not exceed 12 pages in length. Submissions in RASH format (http://cs.unibo.it/save-sd/rash/documentation/index.html) and Linked Research (https://github.com/csarven/linked-research) are also accepted as long as the final camera-ready version conforms to Springer's requirements.
- The Linked Open Dataset produced by their tool on the evaluation dataset (as a file or as a URL, in Turtle or RDF/XML).
- A set of SPARQL queries that work on that LOD and correspond to the natural language queries provided as input.
- The output of these SPARQL queries on the evaluation dataset (in CSV format).
Participants will also be asked to submit their tool (source and/or binaries, or a link these can be downloaded from, or a web service URL) for verification purposes. Further submission instructions will be published on the challenge wiki.
All submissions should be provided via the submission system linked from the homepage.
After a first round of review, the Program Committee and the chairs will select a number of submissions conforming to the challenge requirements that will be invited to present their work. Submissions accepted for presentation will receive constructive reviews from the Program Committee and will be included in the Springer CCIS series. A selection of the best challenge papers will be published in the Satellite Event proceedings (a separate Springer LNCS Volume) of ESWC2016.
Six winners will be selected. For each task we will select:
- best performing tool, awarded to the paper that achieves the highest score in the evaluation
- most original approach, selected by the Challenge Committee through the reviewing process
- January 20, 2016: Publication of the full description of tasks, rules and queries; publication of the training dataset
- February 28, 2016: Publication of the evaluation tool
- March 11, 2016: Paper submission
- March 31, 2016: Deadline for submitting remarks on the training dataset and the evaluation tool
- April 8, 2016: Notification and invitation to submit task results
- April 24, 2016: Conference camera-ready
- May 11, 2016: Publication of the evaluation dataset details
- May 13, 2016: Results submission
- May 29 - June 2, 2016: Challenge days
NOTE: Accepted papers will be included in the Conference USB stick. After the conference, participants will be able to add data about the evaluation and to finalize the camera-ready for the final proceedings.
- Aliaksandr Birukou, Springer Verlag, Heidelberg, Germany
- Lukasz Bolikowski, University of Warsaw, Poland
- Kai Eckert, University of Mannheim, Germany
- Maxim Kolchin, ITMO University, Saint Petersburg, Russia
- Phillip Lord, Newcastle University, UK
- Philipp Mayr, GESIS, Germany
- Jodi Schneider, University of Pittsburgh, USA
- Selver Softic, Graz University of Technology, Austria
- Ruben Verborgh, Ghent University – iMinds
- Michael Wagner, Schloss Dagstuhl, Leibniz-Zentrum für Informatik, Germany
We are inviting further members.