Good strategies for hierarchical classification with many classes #552
Comments
I have a similar use case and was thinking about implementing the method proposed in "A Multi-task Approach to Neural Multi-label Hierarchical Patent Classification Using Transformers" (doi). The paper authors provide an implementation using You could adapt their code to
Thanks @haukelicht for the reference. I'll have a look. Here is what I have done so far. I built pairs from datapoints that share the same hierarchical "father". That way the pairs are somewhat related (they share the same high-level class), but I know they should be classified differently. These pairs act like hard negatives and make the task of distinguishing them harder. Since I built the pairs only from combinations of examples with the same high-level class, the total number of pairs is significantly reduced. I then fine-tuned a retrieval model (I worked with gte-multilingual-base) and afterwards trained a head with a simple NN. With this approach I was able to achieve good evaluation results.
Sounds great, @miguelwon! Can you maybe point me to the class or method you changed/subclassed to change how setfit constructs the pairwise data?
I didn't use setfit. Since I wanted such a custom setup, I coded it myself. It is a bit of a mess, but I will copy it here just to give you an idea. Suppose you have a list of dicts in
then to build the pairs I have the following code:
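As an illustration of the idea (not the code from this comment), here is a minimal sketch of same-parent pair construction. It assumes each example is a dict with the hypothetical keys `text`, `parent` (high-level class) and `label` (leaf class), and that pairs with the same leaf label get similarity 1.0 while same-parent pairs with different leaf labels get 0.0. The function name and the toy data are made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_same_parent_pairs(examples):
    """Return (text_a, text_b, similarity) triples for examples sharing a parent.

    Assumed (hypothetical) record layout: {"text": str, "parent": str, "label": str}.
    """
    by_parent = defaultdict(list)
    for ex in examples:
        by_parent[ex["parent"]].append(ex)

    pairs = []
    for group in by_parent.values():
        # Only combine examples inside one high-level class; cross-parent
        # combinations are never generated, which keeps the pair count small.
        for a, b in combinations(group, 2):
            # Same leaf class -> positive (1.0); same parent but different
            # leaf class -> hard negative (0.0).
            similarity = 1.0 if a["label"] == b["label"] else 0.0
            pairs.append((a["text"], b["text"], similarity))
    return pairs


# Toy data, invented for illustration only.
examples = [
    {"text": "lithium battery cell", "parent": "H01", "label": "H01M"},
    {"text": "fuel cell stack", "parent": "H01", "label": "H01M"},
    {"text": "semiconductor wafer", "parent": "H01", "label": "H01L"},
    {"text": "insulin pump", "parent": "A61", "label": "A61M"},
]
pairs = build_same_parent_pairs(examples)
```

The lone `A61` example produces no pairs here, because it has no sibling under the same parent.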
Do the same for the test set and then
And train with:
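The original training snippet is not reproduced here. A rough sketch of what fine-tuning on such pairs could look like with the classic `sentence-transformers` `fit` API is given below. The model id `Alibaba-NLP/gte-multilingual-base`, the 10% warmup rule, and the helper names `warmup_steps_for` and `train_on_pairs` are assumptions; the batch size of 16 and the cosine-similarity loss are the ones mentioned in this thread.

```python
import math


def warmup_steps_for(n_pairs, batch_size, epochs):
    """Assumed heuristic: warm up for roughly 10% of all optimizer steps."""
    total_steps = math.ceil(n_pairs / batch_size) * epochs
    return max(1, total_steps // 10)


def train_on_pairs(pairs, model_name="Alibaba-NLP/gte-multilingual-base",
                   batch_size=16, epochs=1):
    """Fine-tune an embedding model on (text_a, text_b, similarity) triples."""
    # Imported lazily so this module loads without the heavy dependencies.
    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    # Assumption: this checkpoint ships custom modeling code, so it is loaded
    # with trust_remote_code=True.
    model = SentenceTransformer(model_name, trust_remote_code=True)
    train_examples = [
        InputExample(texts=[a, b], label=float(sim)) for a, b, sim in pairs
    ]
    loader = DataLoader(train_examples, shuffle=True, batch_size=batch_size)
    # CosineSimilarityLoss regresses cos(emb_a, emb_b) onto the pair label.
    loss = losses.CosineSimilarityLoss(model)
    model.fit(
        train_objectives=[(loader, loss)],
        epochs=epochs,
        warmup_steps=warmup_steps_for(len(pairs), batch_size, epochs),
    )
    return model
```

Calling `train_on_pairs(pairs)` downloads the checkpoint and trains on whatever device is available, so it is worth trying on a small subset of pairs first.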
So, then after this you have a
Hi @miguelwon, thanks for this post. It is super relevant. I am trying something similar; however, I seem unable to successfully fine-tune gte on my classes (5k classes with about 32 examples each). Could you perhaps share some more details on training time, performance and so on? My problem is that with that many classes, training becomes too resource-intensive and difficult. Also, my evaluation loss increases or my training loss doesn't decrease. Is there a reason you use a batch size of 16? I have little experience and can't seem to find good sources on this issue, including why cosine similarity is an appropriate loss here (relative to the alternatives).
I'm working on a hierarchical multi-class problem, and if I flatten the labels (flat approach) I have about 1193 classes, which can perhaps already be considered an extreme multi-class classification problem. Furthermore, I have fewer than 10 examples per class.
With so many classes, I can't build pairs for all combinations, because that would result in a huge number of pairs and I'm a bit limited in hardware and time.
Also, since the problem is hierarchical, I think it would work better if I favored pairs of examples with the same "father", because I want good discrimination even between examples within the same "father" category.
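To make the size of the problem concrete, here is a small back-of-the-envelope sketch comparing the number of all pairs with the number of same-"father" pairs. It uses 10 examples per class as an upper bound (the text above says fewer than 10) and assumes a hypothetical hierarchy of 20 equally sized high-level classes, since the real hierarchy is not given.

```python
from math import comb


def all_pairs(n_examples):
    """Number of unordered pairs over the whole dataset."""
    return comb(n_examples, 2)


def same_parent_pairs(parent_sizes):
    """Number of unordered pairs when both members must share a parent."""
    return sum(comb(size, 2) for size in parent_sizes)


n_classes, per_class = 1193, 10  # upper bound on examples per class
n_examples = n_classes * per_class

# Hypothetical: 20 high-level classes of (nearly) equal size.
n_parents = 20
base, extra = divmod(n_examples, n_parents)
parent_sizes = [base + 1] * extra + [base] * (n_parents - extra)

total = all_pairs(n_examples)
restricted = same_parent_pairs(parent_sizes)
print(f"all pairs:         {total:,}")
print(f"same-parent pairs: {restricted:,}")
print(f"reduction factor:  {total / restricted:.1f}x")
```

With equally sized parents the reduction is roughly a factor equal to the number of parents; a skewed hierarchy would reduce the count less.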
Do you know any good strategy for this kind of problem? Perhaps train first on pairs drawn from some randomly picked high-level categories, and then train further with pairs that share the same root?
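One way to read the two-stage idea is: stage 1 uses a capped random sample of cross-parent pairs (easy negatives), and stage 2 uses pairs restricted to a shared parent. The sketch below covers only the stage 1 sampler. The function name, the dict keys `text` and `parent`, and the toy data are hypothetical, and the labelling assumes a tree hierarchy in which different parents imply different leaf classes.

```python
import random


def sample_cross_parent_pairs(examples, n_pairs, seed=0):
    """Sample up to n_pairs distinct pairs whose members have different parents."""
    rng = random.Random(seed)
    pairs, seen = [], set()
    max_attempts = 50 * n_pairs  # guard against looping forever on tiny datasets
    for _ in range(max_attempts):
        if len(pairs) >= n_pairs:
            break
        i, j = rng.sample(range(len(examples)), 2)
        key = (min(i, j), max(i, j))
        if key in seen or examples[i]["parent"] == examples[j]["parent"]:
            continue  # skip duplicates and same-parent pairs
        seen.add(key)
        # Different parents imply different leaf classes, so similarity is 0.0.
        pairs.append((examples[i]["text"], examples[j]["text"], 0.0))
    return pairs


# Toy data, invented for illustration only.
examples = [
    {"text": "lithium battery cell", "parent": "H01"},
    {"text": "semiconductor wafer", "parent": "H01"},
    {"text": "insulin pump", "parent": "A61"},
    {"text": "surgical stapler", "parent": "A61"},
]
stage1_pairs = sample_cross_parent_pairs(examples, n_pairs=3, seed=0)
```

Stage 1 would also need positive pairs (same leaf class) mixed in, since training on negatives alone gives a similarity loss nothing to pull together.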