Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: relax recall thresholds for CI #3282

Closed

Conversation

westonpace
Copy link
Contributor

I am concerned that a number of our tests test with random data and have quite strict recall thresholds. These tests seem to fail on a regular basis.

On the one hand, this may be a concern that we have broken recall but on the other hand I believe these are just in the category of "it's possible to get bad results sometime with the wrong random data".

In other words, if our CI thresholds are set at P=0.05 then that is too strict because if we have a dozen tests and they fail 5% of the time we will have too many CI failures.

I have proposed some very relaxed suggestions here but feel free to propose alternative suggestions. I'd like these tests to be designed so that if the test fails it is a sign that we broke something and we need to fix it, not just something we ignore.

We could potentially add recall benchmarks to our benchmarking suites if we are concerned about the slow degradation of recall or our recall performance. These should probably be designed with real (not random data) and could hopefully be made more reliable in that way.

@github-actions github-actions bot added the ci Github Action or Test issues label Dec 20, 2024
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.07%. Comparing base (022135b) to head (659dea1).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3282      +/-   ##
==========================================
+ Coverage   79.00%   79.07%   +0.07%     
==========================================
  Files         246      246              
  Lines       86900    87471     +571     
  Branches    86900    87471     +571     
==========================================
+ Hits        68655    69168     +513     
- Misses      15377    15439      +62     
+ Partials     2868     2864       -4     
Flag Coverage Δ
unittests 79.07% <ø> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

#[case(4, DistanceType::Dot, 0.8)]
#[case(4, DistanceType::L2, 0.7)]
#[case(4, DistanceType::Cosine, 0.7)]
#[case(4, DistanceType::Dot, 0.7)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's low the recall requirement for only 4bit IVF_PQ:

  • L2 -> 0.85
  • Cosine -> 0.85
  • Dot -> 0.75

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've scoped the change to just 4bit IVF_PQ

@westonpace westonpace force-pushed the ci/relax-recall-thresholds branch from 659dea1 to 0a551ce Compare December 31, 2024 13:23
@westonpace
Copy link
Contributor Author

westonpace commented Jan 2, 2025

@BubbleCal this case failed (recall is 0.89 and target is 0.9). Is this a bug or should we relax this threshold too?

index::vector::ivf::v2::tests::test_create_ivf_hnsw_pq::case_3

@BubbleCal
Copy link
Contributor

@BubbleCal this case failed (recall is 0.89 and target is 0.9). Is this a bug or should we relax this threshold too?

index::vector::ivf::v2::tests::test_create_ivf_hnsw_pq::case_3

let's relax dot to 0.85

@westonpace westonpace force-pushed the ci/relax-recall-thresholds branch from 0a551ce to 74fff6c Compare January 3, 2025 13:24
@westonpace
Copy link
Contributor Author

Looks like you beat me to it in 39f12dc#diff-6de816b72e7c722316243c57df4f809ad34dc8581367c72335154dada48c40ed

Thanks!

@westonpace westonpace closed this Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Github Action or Test issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants