Dear authors, hello. I hope I'm not disturbing you. I believe the OCRBench_v2 dataset is a significant contribution to the open-source community, and I am truly fond of your work. However, I ran into some issues while reproducing the results from your paper: I was unable to replicate the evaluation scores of qwenvl2.5-8B and internvl2.5-8B on OCRBench_v2 for one vertical category (element parsing), and I am unsure whether I made a mistake somewhere.
Following the code provided in OCRBench, I wrote the corresponding inference code for the two models (a sketch of my setup is included after the results below) and ran the evaluation with your eval.py. The scores I obtained, which appear to be lower than those reported in your paper, are as follows for each subclass:
qwenvl2.5-8B
table parsing en: 0.171
chart parsing en: 0.003
document parsing en: 0.341
formula recognition en: 0.095
formula recognition cn: 0.025
table parsing cn: 0.052
document parsing cn: 0.376
English Scores:
element_parsing: 0.152 (Count: 1600)
Chinese Scores:
element_parsing: 0.167 (Count: 800)
internvl2.5-8B
table parsing en: 0.520
chart parsing en: 0.395
document parsing en: 0.360
formula recognition en: 0.037
formula recognition cn: 0.035
table parsing cn: 0.286
document parsing cn: 0.308
English Scores:
element_parsing: 0.331 (Count: 1600)
Chinese Scores:
element_parsing: 0.232 (Count: 800)
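For context on how I read these numbers: I assume eval.py forms each language's element_parsing score as a sample-count-weighted average of its subtask scores. A minimal sketch of that computation follows; the per-subtask counts in it are my assumption for illustration, not values I verified against eval.py:

```python
# Minimal sketch of how I understand the aggregate scores to be formed:
# a sample-count-weighted average over the per-subtask mean scores.
# The per-subtask counts below are assumptions for illustration only.

def weighted_average(subtasks: dict[str, tuple[float, int]]) -> float:
    """subtasks maps subtask name -> (mean score, number of samples)."""
    total = sum(score * count for score, count in subtasks.values())
    n = sum(count for _, count in subtasks.values())
    return total / n if n else 0.0

# My qwenvl2.5-8B English numbers; assuming 400 samples per subtask
# (4 subtasks x 400 = the reported count of 1600):
en_scores = {
    "table parsing en":       (0.171, 400),
    "chart parsing en":       (0.003, 400),
    "document parsing en":    (0.341, 400),
    "formula recognition en": (0.095, 400),
}
print(f"element_parsing (en): {weighted_average(en_scores):.3f}")
# -> ~0.152, matching the aggregate above under the equal-count assumption
```

Under the same equal-count assumption, the three Chinese subclasses average to about 0.151 rather than the 0.167 reported above, so I may be misreading how the samples are distributed across subtasks; corrections welcome.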
Since I couldn't find per-subclass results in your paper, I'm not sure which vertical-category tests went wrong. Would you consider publicly sharing these subclass results to help others reproduce your work more easily? Alternatively, could you help me identify any significant issues in the subclass scores I reported above? I constructed all my inputs using the default settings from the dataset; a sketch of my inference loop follows. Could you share your inference code, or point out any steps or considerations I might have overlooked?
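For reference, this is a minimal sketch of the kind of inference loop I used for qwenvl2.5-8B, following the standard Hugging Face transformers usage for Qwen2.5-VL rather than the exact OCRBench script. The checkpoint name, the annotation file paths, and the JSON field names ("image_path", "question", "predict") are placeholders for my actual setup, not guaranteed to match OCRBench_v2's format:

```python
# Sketch of my inference loop (Hugging Face transformers + qwen_vl_utils).
# Paths and JSON field names are placeholders for my actual setup.
import json

import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"  # assumed checkpoint
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

with open("OCRBench_v2.json") as f:  # placeholder annotation path
    samples = json.load(f)

for sample in samples:
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": sample["image_path"]},  # assumed field
            {"type": "text", "text": sample["question"]},      # assumed field
        ],
    }]
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text], images=image_inputs, videos=video_inputs,
        padding=True, return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
    output_ids = output_ids[:, inputs.input_ids.shape[1]:]  # drop the prompt
    # "predict" is the output field I assumed when scoring with eval.py:
    sample["predict"] = processor.batch_decode(
        output_ids, skip_special_tokens=True
    )[0]

with open("predictions.json", "w") as f:  # then scored with eval.py
    json.dump(samples, f, ensure_ascii=False, indent=2)
```

I used greedy decoding (do_sample=False) throughout; if your reported numbers were produced with different generation settings or image resolution limits, that alone might explain part of the gap.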
Thank you very much!