Discrepancy in Model Performance Using HuggingFace Pipeline Utility #134
Hi there, I am not the one who did the HF transplant, but using the eval pipeline in this repo you should be able to reproduce the exact result. Quick question: where is your eval data from? -Yuan
Hi, thanks for the prompt reply. I also noticed that the number of parameters differs between the checkpoint from this repo and the one on the Hugging Face Hub. FYI, I downloaded AudioSet from this repo.
That data does not have a problem; if you search the issues, you will find people who have successfully reproduced the result with this version. The problem is likely in your eval pipeline. Which normalization (i.e., mean and std) did you use for eval? You should use the same normalization as our training. Why not try our eval pipeline? -Yuan
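For context, the normalization in question is applied to the log-mel filterbank features before they are fed to the model. A minimal sketch of that step is below; the mean/std values, the target frame count, and the factor of two on the std are assumptions here and should be verified against this repo's dataloader and run scripts.

```python
import torch
import torchaudio

# Assumed AudioSet training statistics -- verify against this repo's run scripts.
NORM_MEAN = -4.2677393
NORM_STD = 4.5689974
TARGET_FRAMES = 1024   # input length assumed for the 10-10 AudioSet model

def wav_to_normalized_fbank(path: str) -> torch.Tensor:
    waveform, sr = torchaudio.load(path)
    waveform = waveform - waveform.mean()
    # Kaldi-style 128-bin log-mel filterbank, 10 ms frame shift.
    fbank = torchaudio.compliance.kaldi.fbank(
        waveform, htk_compat=True, sample_frequency=sr, use_energy=False,
        window_type="hanning", num_mel_bins=128, dither=0.0, frame_shift=10)
    # Pad or truncate along the time axis to the fixed model input length.
    n_frames = fbank.shape[0]
    if n_frames < TARGET_FRAMES:
        fbank = torch.nn.functional.pad(fbank, (0, 0, 0, TARGET_FRAMES - n_frames))
    else:
        fbank = fbank[:TARGET_FRAMES, :]
    # Normalize with the *training* mean/std (note the conventional factor of 2
    # on the std -- confirm the exact formula against the repo's dataloader).
    return (fbank - NORM_MEAN) / (NORM_STD * 2)
```

If the eval pipeline normalizes with different statistics (or skips the step), the resulting mAP can drop noticeably even though the checkpoint is identical.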
I believe the Huggingface
I understand, and I believe HF can reach the performance; it is probably just a minor thing. I just do not have time to debug, as I am managing multiple repos. How about this: https://colab.research.google.com/github/YuanGongND/ast/blob/master/colab/AST_Inference_Demo.ipynb This is a Colab notebook for inference using our pipeline. You only need minimal effort to revise it to evaluate all your samples, and then you will get a mAP from our eval pipeline. You can also record the logits of each sample and compare them with the HF ones. You can even start from a single sample, check whether our Colab logits and your HF logits are close enough, and debug from that point. -Yuan
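For the single-sample check suggested above, once the logits for one clip have been dumped from both the Colab pipeline and the HF run (e.g. with np.save), the comparison itself is only a few lines; the file names below are illustrative.

```python
import numpy as np

# Per-sample logits saved from the two pipelines for the same clip
# (file names are illustrative; dump them with np.save in each script).
colab_logits = np.load("colab_logits_sample.npy")
hf_logits = np.load("hf_logits_sample.npy")

# Compare on the same scale: both raw logits, or both sigmoid scores.
print("max abs difference:", np.abs(colab_logits - hf_logits).max())
print("top-5 (Colab):", np.argsort(colab_logits)[::-1][:5])
print("top-5 (HF):   ", np.argsort(hf_logits)[::-1][:5])
```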
Hi, I'm attempting to reproduce the performance metrics of the models using HuggingFace's Pipeline utility, but I'm getting different results. Below is the Python code I used for testing:
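A minimal sketch of an equivalent evaluation loop is shown here; it calls the AST model classes directly rather than the high-level pipeline so the full per-class score vector is available, and the eval file list, label matrix, and helper names are placeholders rather than the exact code from the original report.

```python
import numpy as np
import torch
import torchaudio
from transformers import ASTFeatureExtractor, ASTForAudioClassification

MODEL_ID = "MIT/ast-finetuned-audioset-10-10-0.4593"
extractor = ASTFeatureExtractor.from_pretrained(MODEL_ID)
model = ASTForAudioClassification.from_pretrained(MODEL_ID).eval()

def predict_scores(wav_path: str) -> np.ndarray:
    """Per-class scores (sigmoid of logits) for one clip; AudioSet is multi-label."""
    waveform, sr = torchaudio.load(wav_path)    # eval clips assumed to be 16 kHz
    mono = waveform.mean(dim=0).numpy()         # downmix to mono if needed
    inputs = extractor(mono, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits.squeeze(0)
    return torch.sigmoid(logits).numpy()

# Placeholders: in practice, build these from the AudioSet eval CSV used by this repo.
eval_files = ["eval/clip_0001.wav", "eval/clip_0002.wav"]             # wav paths
eval_targets = np.zeros((len(eval_files), model.config.num_labels))   # multi-hot labels

preds = np.stack([predict_scores(p) for p in eval_files])
mAP, mAUC = calculate_stats(preds, eval_targets)   # helpers sketched below
print(f"mAP={mAP:.4f}  AUC={mAUC:.4f}")
```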
The helper functions for the metrics calculations are implemented as follows:
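A typical implementation of such helpers for multi-label AudioSet evaluation looks like the following sketch, built on sklearn; it is not necessarily identical to the helpers from the original report, and the class-skipping convention is an assumption.

```python
import numpy as np
from scipy.stats import norm
from sklearn.metrics import average_precision_score, roc_auc_score

def d_prime(auc: float) -> float:
    """d-prime from AUC via the inverse standard-normal CDF
    (often reported alongside mAP/AUC for AudioSet models)."""
    return np.sqrt(2.0) * norm.ppf(auc)

def calculate_stats(preds: np.ndarray, targets: np.ndarray):
    """preds, targets: (num_clips, num_classes); targets are multi-hot 0/1.

    Returns macro-averaged mAP and AUC, skipping classes with no positive
    clips in the eval set (AP/AUC are undefined for those).
    """
    aps, aucs = [], []
    for k in range(targets.shape[1]):
        if targets[:, k].sum() == 0:
            continue
        aps.append(average_precision_score(targets[:, k], preds[:, k]))
        aucs.append(roc_auc_score(targets[:, k], preds[:, k]))
    return float(np.mean(aps)), float(np.mean(aucs))
```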
The recorded performance metrics were:
MIT/ast-finetuned-audioset-16-16-0.442
MIT/ast-finetuned-audioset-10-10-0.4593
These results do not align closely with the expected performance. Could you help me identify any potential issues with my approach or provide guidance on achieving the expected performance levels?