Syntacticus treebank data

Raw annotated data for the treebanks in the Syntacticus collection.

Releases of the collection are hosted on GitHub.

Data formats

The texts in the collection are available in two formats:

  1. PROIEL XML: These files are the authoritative source files and the only ones that contain all available annotation. They contain the complete morphological, syntactic and information-structure annotation, as well as the complete text, including punctuation, section headers, etc. The schema is defined in proiel.xsd.

  2. CoNLL-X format
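
As a rough illustration of consuming the CoNLL-X files, the following Python sketch reads sentences as lists of token dictionaries. It assumes the standard CoNLL-X layout (ten tab-separated columns per token, with a blank line between sentences). The function name `read_conllx`, the lower-cased field names and the sample sentence are hypothetical, chosen for this example and not taken from the collection.

```python
from typing import Dict, Iterable, Iterator, List

# The ten columns of the standard CoNLL-X layout, in order.
# Lower-cased names are a choice made for this sketch.
CONLLX_FIELDS = [
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
]


def read_conllx(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLX_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        sentence.append(dict(zip(CONLLX_FIELDS, columns)))
    # Emit the last sentence if the input does not end with a blank line.
    if sentence:
        yield sentence


# Invented two-token sample, for illustration only (not treebank data).
SAMPLE = (
    "1\tGallia\tGallia\tN\tNe\t_\t2\tsub\t_\t_\n"
    "2\test\tsum\tV\tV-\t_\t0\tpred\t_\t_\n"
    "\n"
)

sentences = list(read_conllx(SAMPLE.splitlines()))
for token in sentences[0]:
    print(token["id"], token["form"], token["head"], token["deprel"])
```

To read a file instead of the inline sample, pass an open file handle, for example `read_conllx(open(path, encoding="utf-8"))`, where `path` points to one of the CoNLL-X files. All values are kept as strings, so `head` needs an explicit `int()` conversion before it is used as an index.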