regarding dataset prep scripts and audioset splits #3
Comments
Hi there, can you point me to the code related to the "forbidden" classes? Thanks! -Yuan
Sorry, my bad; that was not in your code but in the original FSD50K release.
That's fine. For your other questions.
No, we don't use a validation set for the AudioSet experiments; I think that is the common setting in most papers using AudioSet (e.g., this paper, see the footnote on page 1237; you can verify this in the code of other papers). Practically, it is non-trivial to sample a meaningful validation set due to label co-occurrence. For this reason, we did not report the best model but the average performance of the last few checkpoints during training. The model performance is not sensitive to most hyperparameters, and we didn't tune most of them in the paper. On FSD50K, which does have a validation split, the PSLA methods work equally well.

For the missing link: it seems to be a blank link, and I cannot remember if I have something on Dropbox. I will try to fix that when I have some time.

For the "# only apply to the vocal sound data" comment: that might be a mistake. Did you see anything wrong with the prepared JSON file for the eval set? FYI, we collected a VocalSound dataset and will release it soon. We did some experiments combining FSD50K and VocalSound, which is why the comment is there; I might have forgotten to remove it when cleaning up and uploading the code. If you don't see an issue with the output JSON file, you can safely ignore the comment. -Yuan
Thanks for the explanation.
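The "average of the last few checkpoints" idea above can be sketched as follows. This is a minimal illustration, not the PSLA code; the metric values and function name are hypothetical, assuming one mAP score has already been computed per training epoch:

```python
# Hypothetical per-epoch mAP values from the end of a training run.
map_per_epoch = [0.381, 0.405, 0.423, 0.429, 0.431, 0.430, 0.432]

def average_last_checkpoints(metrics, k=5):
    """Report the mean metric over the last k checkpoints instead of
    picking a single 'best' model on a held-out validation set."""
    last = metrics[-k:]
    return sum(last) / len(last)

print(average_last_checkpoints(map_per_epoch, k=5))
```

Reporting this average avoids selecting a checkpoint with a validation set that, as noted above, is hard to sample meaningfully from AudioSet due to label co-occurrence.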
Hi,
I couldn't find in any of your recent publications on AudioSet how you split the unbalanced (or even balanced) train segments into train and validation sets for hyperparameter tuning, so that I could try to replicate your results. Also, the Dropbox link for the PSLA experiments you have listed is down.
On another note, regarding FSD50K: could you elaborate on what those "forbidden" classes are and why they exist? Also, could you explain the purpose of this comment in prep_fsd50k.py when generating the JSON files?:
"# only apply to the vocal sound data"
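A quick way to sanity-check the JSON file that prep_fsd50k.py generates could look like the sketch below. It assumes a PSLA-style manifest with a top-level "data" list of entries carrying "wav" and "labels" keys; the function name and checks are my own, not from the repo:

```python
import json

def check_manifest(path):
    """Basic sanity checks on a PSLA-style JSON manifest:
    the file parses, the data list is non-empty, and every
    entry has a wav path and a non-empty label string."""
    with open(path) as f:
        manifest = json.load(f)
    entries = manifest["data"]
    assert len(entries) > 0, "manifest is empty"
    for e in entries:
        assert "wav" in e and "labels" in e, f"malformed entry: {e}"
        assert e["labels"], f"entry has no labels: {e['wav']}"
    return len(entries)
```

If a check like this passes on the eval-set JSON, the stray comment in the prep script is likely harmless, as the reply above suggests.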
Thanks