Add prompts for POS tagging on Universal Dependencies dataset #754
Six prompts have been created to add part-of-speech tagging on the Universal Dependencies dataset to PromptSource, addressing this GitHub issue: bigscience-workshop/evaluation#24. These prompts were written from scratch, since we could not find any references for POS tagging tasks. We are using the following output format: the model must produce a sequence of word-tag pairs (e.g., the DET black ADJ sheep NOUN). Right now we are using edit distance as an initial metric, but later on we will look into implementing a more accurate metric that fits this setting.
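For reviewers who want to see the output format and the interim metric side by side, here is a minimal sketch. It is not the code in this PR: `format_pairs` and `edit_distance` are hypothetical helper names, and computing the distance over whitespace-separated tokens (rather than characters) is an assumption, since the description above does not say which granularity is used.

```python
from typing import Sequence


def format_pairs(words: Sequence[str], tags: Sequence[str]) -> str:
    """Interleave words and tags, e.g. 'the DET black ADJ sheep NOUN'."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    return " ".join(f"{word} {tag}" for word, tag in zip(words, tags))


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences.

    Assumption: tokens are compared as whole strings, so one wrong tag
    costs one substitution.
    """
    # prev[j] holds the distance between the first i-1 tokens of a
    # and the first j tokens of b.
    prev = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        curr = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Target in the word-tag format described above.
target = format_pairs(["the", "black", "sheep"], ["DET", "ADJ", "NOUN"])

# A hypothetical model output with one wrong tag (NOUN instead of ADJ).
prediction = "the DET black NOUN sheep NOUN"

distance = edit_distance(prediction.split(), target.split())
print(target)
print(distance)
```

One limitation of this kind of metric is that a mistagged word and a dropped or misspelled word are penalized alike, which may be part of why a more task-specific metric is worth exploring later.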
In addition to these prompts, the Universal Dependencies dataset has also been added to Hugging Face (under aakanksha/udpos) in a prompting-friendly format. To make this dataset visible in PromptSource, I have added my username (aakanksha) to the list of included_users. If there is a better way to do this, please let me know and I will change it.
If there are any issues with the PR, please comment below and I will address them. Thanks a lot!