Add prompts for POS tagging on Universal Dependencies dataset #754

Open
wants to merge 3 commits into
base: eval-hackathon

Conversation


@aakanksha19 aakanksha19 commented Apr 27, 2022

Six prompts add part-of-speech (POS) tagging on the Universal Dependencies dataset to PromptSource, addressing this GitHub issue: bigscience-workshop/evaluation#24. We wrote these prompts from scratch because we could not find any POS tagging task references. We use the following output format: the model must produce a sequence of word-tag pairs (e.g., the DET black ADJ sheep NOUN). For now we use edit distance as an initial metric; later we will look into implementing a more accurate metric that fits this setting.
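To make the output format and the interim metric concrete, here is a minimal sketch. It assumes the edit distance is computed over whitespace-separated tokens with unit costs; the PR does not say whether the distance is token-level or character-level, and the helper names `to_word_tag_sequence` and `edit_distance` are illustrative, not taken from the PR.

```python
from typing import Sequence


def to_word_tag_sequence(words: Sequence[str], tags: Sequence[str]) -> str:
    """Interleave words and tags, e.g. 'the DET black ADJ sheep NOUN'."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    return " ".join(f"{w} {t}" for w, t in zip(words, tags))


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences (unit costs).

    Assumption: applied to token lists here; the PR's actual metric
    may be defined differently.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


gold = to_word_tag_sequence(["the", "black", "sheep"], ["DET", "ADJ", "NOUN"])
pred = "the DET black NOUN sheep NOUN"  # hypothetical model output
print(gold)
print(edit_distance(pred.split(), gold.split()))
```

One limitation of this kind of metric is that a mistagged word and a misspelled word both cost one edit, which may be part of why a more task-specific metric is planned.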

In addition to these prompts, the Universal Dependencies dataset has also been added to Hugging Face (under aakanksha/udpos) in a prompting-friendly format. To make this dataset visible in promptsource, I have added my username (aakanksha) to the list of included_users. If there is a better way to do this, please let me know and I will change it.

If there are any issues with the PR, please comment below and I will work on addressing them. Thanks a lot!

@aakanksha19 aakanksha19 changed the title Adding prompts for POS tagging on Universal Dependencies dataset Add prompts for POS tagging on Universal Dependencies dataset Apr 27, 2022
@aakanksha19 aakanksha19 changed the base branch from main to eval-hackathon April 27, 2022 20:45
@awebson awebson (Contributor) commented Apr 27, 2022

Thanks for the PR! Your prompts do not indicate any gold target. Please use ||| in your jinja template to separate the input from the target.
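As a rough illustration of the `|||` convention, the sketch below shows a template string with the separator and how the rendered text divides into input and target. The template wording, the field names `tokens` and `word_tag_pairs`, and the helper `split_rendered` are all hypothetical; this is not promptsource's own rendering code, and the template is shown as plain text rather than rendered with Jinja.

```python
from typing import Tuple

# Illustrative Jinja template text (not rendered here). The field names
# `tokens` and `word_tag_pairs` are assumptions, not taken from the PR.
TEMPLATE = (
    "Tag each word with its part of speech: {{ tokens | join(' ') }} "
    "||| {{ word_tag_pairs }}"
)


def split_rendered(rendered: str, sep: str = "|||") -> Tuple[str, str]:
    """Split rendered prompt text into (input, target) around the separator."""
    parts = rendered.split(sep)
    if len(parts) != 2:
        raise ValueError(f"expected exactly one {sep!r} separator")
    return parts[0].strip(), parts[1].strip()


# What the template above might look like after rendering one example.
rendered = (
    "Tag each word with its part of speech: the black sheep "
    "||| the DET black ADJ sheep NOUN"
)
prompt_input, target = split_rendered(rendered)
print(prompt_input)
print(target)
```

Without the separator there is nothing to compare the model output against, which is the problem the review comment points out.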

Additionally, we will discuss the use of non-natural language prompts in our Eastern Time standup meeting tomorrow (Thursday). I don't think non-expert humans can perform the task of generating every word and its POS tag as your prompts describe. Please join our meeting to discuss if you can!

@awebson awebson self-assigned this Apr 27, 2022
@jzf2101 jzf2101 (Collaborator) commented Jun 27, 2022

@aakanksha19 - there are merge conflicts. Could you please resolve them?


3 participants