
[FT] Rerun evaluations with new metrics based on completions saved in details file #467

Open
JoelNiklaus opened this issue Dec 19, 2024 · 3 comments · May be fixed by #488
Labels
feature request New feature/request

Comments

@JoelNiklaus
Contributor

Issue encountered

Currently, rerunning an evaluation with a new metric requires rerunning the entire inference, which can be very costly.

Solution/Feature

It would be great if we could specify a details file containing the predictions and use it to compute additional metrics.

JoelNiklaus added the feature request label on Dec 19, 2024
@JoelNiklaus
Contributor Author

@clefourrier @NathanHB I am happy to implement this. Do you have suggestions for how to best solve this?

@NathanHB
Member

NathanHB commented Jan 2, 2025

It would be great! I think the best way would be to recreate the sample_id_to_responses mapping from the details file and run the metrics on it.

from the pipeline.py file:

sample_id_to_responses = self._run_model()
self._compute_metrics(sample_id_to_responses)

You would need to inspect what is in sample_id_to_responses and try to reconstruct it from the details file.
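
For illustration, a rough sketch of what that reconstruction could look like, assuming the details file is a parquet table with one row per sample; the load_responses_from_details helper and the "example" / "predictions" column names are placeholders for this sketch, not confirmed lighteval API:

from collections import defaultdict

import pandas as pd


def load_responses_from_details(details_path: str) -> dict[str, list[str]]:
    # Hypothetical helper: rebuild a sample-id -> model-responses mapping
    # from a saved details file instead of rerunning inference.
    df = pd.read_parquet(details_path)
    sample_id_to_responses: dict[str, list[str]] = defaultdict(list)
    for _, row in df.iterrows():
        # "example" and "predictions" are assumed column names; inspect the
        # actual details file to find the real ones.
        sample_id_to_responses[row["example"]].append(row["predictions"])
    return dict(sample_id_to_responses)

# The rebuilt mapping would then replace the _run_model() call in the
# pipeline.py flow quoted above:
#   sample_id_to_responses = load_responses_from_details("path/to/details.parquet")
#   self._compute_metrics(sample_id_to_responses)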

@JoelNiklaus
Contributor Author

Great, will try that, thanks Nathan!
