Hello. I tried to reproduce the AudioCaps results using the CoDi demo code (https://github.com/microsoft/i-Code/tree/main/i-Code-V3), but my scores for the audio captioning and text-to-audio (TTA) tasks are far below those reported in the paper:
Text-to-audio (TTA) generation:
Frechet Audio Distance: 12.3379363
Kullback-Leibler Divergence (Sigmoid): 9.3400078
Kullback-Leibler Divergence (Softmax): 3.8197691
Inception Score Mean: 2.9589245
Inception Score Std: 0.2177440
Frechet Distance: 54.1079137
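For context on the Frechet Audio Distance and Frechet Distance scores above: both are Fréchet distances between Gaussians fit to embeddings of the reference clips and the generated clips (the original FAD uses VGGish embeddings). The score therefore depends on the embedding model, the audio fed to it, and the number of clips, so a mismatch in any of these could shift it. Below is a minimal NumPy-only sketch, assuming you already have one embedding row per clip; `frechet_distance` is a hypothetical helper, not the function used by CoDi or AudioLDM's evaluation code.

```python
import numpy as np


def frechet_distance(emb_ref: np.ndarray, emb_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fit to two embedding sets (rows = clips)."""
    mu_ref, mu_gen = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    sigma_ref = np.cov(emb_ref, rowvar=False)
    sigma_gen = np.cov(emb_gen, rowvar=False)

    # Tr(sqrt(S_ref @ S_gen)) equals Tr(sqrt(A)) with A = S_ref^{1/2} S_gen S_ref^{1/2}.
    # A is symmetric PSD, so eigh/eigvalsh can replace a general matrix square root.
    w, v = np.linalg.eigh(sigma_ref)
    sqrt_ref = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    a = sqrt_ref @ sigma_gen @ sqrt_ref
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(a), 0.0, None)).sum()

    diff = mu_ref - mu_gen
    return float(diff @ diff + np.trace(sigma_ref) + np.trace(sigma_gen) - 2.0 * tr_covmean)


# Usage with random stand-in embeddings (real ones would come from a pretrained audio model).
rng = np.random.default_rng(0)
ref = rng.standard_normal((500, 8))
print(frechet_distance(ref, ref))        # ~0 for identical sets
print(frechet_distance(ref, ref + 1.0))  # ~8: mean shifted by 1 in each of 8 dimensions
```

Because the covariance terms cancel when only the mean moves, the second call isolates the squared mean shift, which makes it a handy sanity check for any FAD/FD implementation.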

Audio captioning:
Bleu-1: 0.2448
Bleu-2: 0.0918
Bleu-3: 0.0287
Bleu-4: 0.0097
Rouge: 0.1928
CIDEr: 0.0689
METEOR: 0.0877
SPICE: 0.0504
SPIDEr: 0.0596
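As a quick consistency check on the captioning numbers: SPIDEr is defined as the average of CIDEr and SPICE, so the reported SPIDEr should follow from the other two scores. The snippet below recomputes it from the rounded values listed above; `spider` is a hypothetical helper. Agreement only shows that the scorer combines its sub-metrics consistently. It would suggest the low SPIDEr comes straight from the low CIDEr and SPICE, so the gap is more likely in the generated captions or the references they are scored against.

```python
def spider(cider: float, spice: float) -> float:
    """SPIDEr: the arithmetic mean of CIDEr and SPICE."""
    return (cider + spice) / 2.0


# Values copied from the captioning results above (already rounded to 4 decimals).
cider, spice, reported_spider = 0.0689, 0.0504, 0.0596
recomputed = spider(cider, spice)

# The inputs are rounded, so only approximate agreement is expected.
assert abs(recomputed - reported_spider) < 1e-3
print(f"recomputed SPIDEr = {recomputed:.5f}, reported = {reported_spider}")
```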
Here is my code:
The `dataset_json` is the one provided by AudioLDM.
What could be causing this discrepancy?