Hi,
I am using your method to generate synthetic data for NER. The datasets I use are CoNLL++ and CoNLL03, but I found that the output data contains over 10,000 `<unk>` tokens, and some of them are even given a NER tag.
I hope you could give me some tips on solving this issue.
Hi, you can filter the generated data with some rules, e.g. remove generated sentences that have invalid NER tags. You can also use a trained NER model to filter the generated data. Please refer to Section 2.4 in this paper: https://aclanthology.org/2021.acl-long.453.pdf. To reduce the number of `<unk>` tokens, you can also adjust the criteria used to replace tokens with `<unk>`.
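A minimal sketch of the rule-based filtering described above, assuming the synthetic sentences are lists of `(token, tag)` pairs in the BIO scheme; the function names, the `<unk>` token string, and the 10% threshold are illustrative assumptions, not part of the original method:

```python
# Hypothetical rule-based filter for synthetic NER data (illustrative only).
# Assumes sentences are lists of (token, tag) pairs with BIO-style tags.

def has_valid_bio(tags):
    """Check that every I-X tag follows a B-X or I-X of the same entity type."""
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            etype = tag[2:]
            if prev not in (f"B-{etype}", f"I-{etype}"):
                return False
        prev = tag
    return True

def keep_sentence(pairs, unk_token="<unk>", max_unk_ratio=0.1):
    tokens = [t for t, _ in pairs]
    tags = [g for _, g in pairs]
    # Rule 1: drop sentences whose tag sequence is not valid BIO.
    if not has_valid_bio(tags):
        return False
    # Rule 2: drop sentences where an <unk> token carries an entity tag.
    if any(t == unk_token and g != "O" for t, g in pairs):
        return False
    # Rule 3: drop sentences with too many <unk> tokens overall.
    unk_ratio = tokens.count(unk_token) / max(len(tokens), 1)
    return unk_ratio <= max_unk_ratio

sentences = [
    [("John", "B-PER"), ("lives", "O"), ("in", "O"), ("Paris", "B-LOC")],
    [("<unk>", "B-ORG"), ("rose", "O")],            # <unk> tagged as an entity
    [("It", "O"), ("said", "I-ORG"), (".", "O")],   # invalid BIO sequence
]
filtered = [s for s in sentences if keep_sentence(s)]
print(len(filtered))  # 1
```

A trained NER model could be added as a further filter, e.g. keeping only sentences where the model's predicted tags agree with the generated tags.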