Dear researcher, thank you very much for this amazing work!
I'd like to extend the dataset to other domains while preserving the annotation guidelines.
To minimize the differences and follow your work precisely, I would like to run the same pre-annotation procedure as you did. Would you be willing to share this part in more detail, or share the code?
I am doing this as part of a research project of the TrustHLT Group, and it would be extremely beneficial to us!
Thank you very much,
Kai.
Hi Kai,
Sure, here is the Python code we used to pre-annotate the documents from the ECHR. As you can see, it essentially boils down to:
* Running spaCy to get named entities, plus a few simple regular expressions to detect codes and dates
* Correcting those entities with a few heuristics, and mapping the 18 OntoNotes categories to the privacy-oriented categories we had defined
The code is really tailored to ECHR documents and their formatting though, so I'm not sure how useful it would be for other domains, apart perhaps from the mapping between OntoNotes named entities and the more privacy-oriented categories from TAB.
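For readers who want a feel for the steps above, here is a minimal sketch of that kind of pipeline. It is not the actual ECHR script: the mapping table, the two regular expressions, the overlap heuristic, and all function names (`regex_entities`, `spacy_entities`, `pre_annotate`) are illustrative assumptions, and the `en_core_web_sm` model name is just one possible choice of an OntoNotes-trained spaCy model.

```python
import re

# Illustrative mapping from the 18 OntoNotes labels to TAB-style categories.
# Assumption: this table is a plausible guess, not the one in the original code.
# Labels mapped to None are simply dropped.
ONTONOTES_TO_TAB = {
    "PERSON": "PERSON",
    "GPE": "LOC", "LOC": "LOC", "FAC": "LOC",
    "ORG": "ORG",
    "NORP": "DEM", "LANGUAGE": "DEM",
    "DATE": "DATETIME", "TIME": "DATETIME",
    "MONEY": "QUANTITY", "PERCENT": "QUANTITY", "QUANTITY": "QUANTITY",
    "EVENT": "MISC", "LAW": "MISC", "PRODUCT": "MISC", "WORK_OF_ART": "MISC",
    "CARDINAL": None, "ORDINAL": None,
}

# Hypothetical patterns: application numbers such as "12345/06" and
# dates such as "3 May 2006". Real documents would need more patterns.
CODE_RE = re.compile(r"\b\d+/\d{2,4}\b")
DATE_RE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)


def regex_entities(text):
    """Return (start, end, label) spans found by the regular expressions."""
    spans = [(m.start(), m.end(), "CODE") for m in CODE_RE.finditer(text)]
    spans += [(m.start(), m.end(), "DATETIME") for m in DATE_RE.finditer(text)]
    return spans


def spacy_entities(text, model_name="en_core_web_sm"):
    """Return (start, end, OntoNotes label) spans from a spaCy pipeline."""
    import spacy  # imported lazily so the rest works without spaCy installed

    doc = spacy.load(model_name)(text)
    return [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]


def pre_annotate(text, ner_spans):
    """Map NER labels to TAB-style categories and merge with regex spans."""
    mapped = [(s, e, ONTONOTES_TO_TAB.get(label)) for s, e, label in ner_spans]
    mapped = [span for span in mapped if span[2] is not None]
    rx = regex_entities(text)
    # Simple heuristic (an assumption): regex spans win over overlapping NER spans.
    kept = [
        span for span in mapped
        if not any(span[0] < r[1] and r[0] < span[1] for r in rx)
    ]
    return sorted(rx + kept)
```

With spaCy and a model installed, `pre_annotate(text, spacy_entities(text))` would produce the merged spans; the mapping dictionary is the part most likely to carry over to other domains.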
Pierre