In Table 7, you report the POPE results, which decreased in some sets of experiments (comparing with and without). Since your method assigns low weights to contradictory text tokens, I would expect the hallucination benchmark metrics to increase.
Do you have any comments on this? Thank you.
Hi, I apologize for the delayed reply; I am currently occupied with graduation preparations and related travel.
Thanks for your comment. In my view, the POPE benchmark may not be optimal for evaluating hallucination, because its scores are excessively high and vary very little. Alternative benchmarks may be more suitable for this assessment (for more information, please refer to https://arxiv.org/pdf/2312.00849). After my vacation, I will add evaluation results on these related benchmarks if possible.