Could you share your inference code or detailed results of the subclasses in OCRBench_v2? #45

Open
ileln opened this issue Jan 17, 2025 · 0 comments

ileln commented Jan 17, 2025

Dear author, hello. I hope I'm not disturbing you. I believe the OCRBench_v2 dataset is a significant contribution to the open-source community, and I really appreciate your work. However, I ran into problems while reproducing the results from your paper. Specifically, I was unable to replicate the reported scores of qwenvl-2.5-8B and internvl-8B on OCRBench_v2 for one category, element parsing, and I'm unsure whether I made a mistake somewhere. Following the code provided in OCRBench, I wrote the corresponding inference code for both models and ran the evaluation with your eval.py file. The results I obtained, which appear to be lower than those reported in your paper, are broken down by subclass below:

  • qwenvl2.5-8B
    table parsing en: 0.171
    chart parsing en: 0.003
    document parsing en: 0.341
    formula recognition en: 0.095
    formula recognition cn: 0.025
    table parsing cn: 0.052
    document parsing cn: 0.376
    English Scores:
    element_parsing: 0.152 (Count: 1600)
    Chinese Scores:
    element_parsing: 0.167 (Count: 800)

  • internvl2.5-8B
    table parsing en: 0.520
    chart parsing en: 0.395
    document parsing en: 0.360
    formula recognition en: 0.037
    formula recognition cn: 0.035
    table parsing cn: 0.286
    document parsing cn: 0.308
    English Scores:
    element_parsing: 0.331 (Count: 1600)
    Chinese Scores:
    element_parsing: 0.232 (Count: 800)

Since I couldn't find detailed results for these subclasses in your paper, I can't tell which subclass evaluations went wrong. Would you consider publishing the per-subclass results so that others can reproduce your work more easily? Alternatively, could you help me identify which of the results above look significantly off? I built all my inputs using the dataset's default settings. Could you share your inference code? Are there any steps or considerations that I might have overlooked?
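
To make the comparison concrete, the kind of per-subclass breakdown being asked about can be sketched as below. This is only a minimal illustration: the record fields `type` and `score` and the function name `summarize` are hypothetical and are not taken from eval.py, so the actual aggregation in the repository may differ.

```python
from collections import defaultdict


def summarize(records):
    """Aggregate per-sample scores.

    `records` is a list of dicts with hypothetical keys:
      "type"  -> subclass name (e.g. "table parsing en")
      "score" -> per-sample score in [0, 1]
    Returns (per-subclass means, sample-weighted overall mean, sample count).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        sums[record["type"]] += record["score"]
        counts[record["type"]] += 1

    per_subclass = {name: sums[name] / counts[name] for name in sums}
    total = sum(counts.values())
    # Sample-weighted mean: every sample counts equally, so larger
    # subclasses pull the overall score toward their own mean.
    overall = sum(sums.values()) / total if total else 0.0
    return per_subclass, overall, total


# Toy data for illustration only; these are not real benchmark scores.
records = [
    {"type": "table parsing en", "score": 0.5},
    {"type": "table parsing en", "score": 0.0},
    {"type": "chart parsing en", "score": 1.0},
]
per_subclass, overall, total = summarize(records)
print(per_subclass, overall, total)
```

Note that a sample-weighted category score generally differs from the plain average of the subclass means when the subclasses have different sizes, so both the per-subclass scores and the sample counts are needed to compare numbers reliably.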

Thank you very much!
