ci: relax recall thresholds for CI #3282

westonpace · 2024-12-20T21:37:24Z

I am concerned that a number of our tests use random data and have quite strict recall thresholds. These tests seem to fail on a regular basis.

On the one hand, these failures could mean that we have broken recall. On the other hand, I believe they are just in the category of "it's possible to get bad results sometimes with the wrong random data".

In other words, if our CI thresholds are set at P=0.05 then that is too strict: with a dozen tests that each fail 5% of the time, we will have too many CI failures.
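To put a rough number on this, here is a minimal sketch that assumes the tests fail independently, each with the same probability (an idealization). Both helpers are hypothetical and exist only for illustration; they are not part of the codebase. With P=0.05 and 12 tests, the chance that at least one test fails in a given CI run is 1 - 0.95^12, which is roughly 46%.

```rust
/// Probability that at least one of `n_tests` fails in a single CI run,
/// assuming each test fails independently with probability `p_fail`.
/// (Hypothetical helper, for illustration only.)
fn p_any_failure(p_fail: f64, n_tests: u32) -> f64 {
    1.0 - (1.0 - p_fail).powi(n_tests as i32)
}

/// Inverse of the above: the per-test failure probability we can afford
/// if the whole suite should fail spuriously with probability `target`.
/// (Hypothetical helper, for illustration only.)
fn per_test_budget(target: f64, n_tests: u32) -> f64 {
    1.0 - (1.0 - target).powf(1.0 / n_tests as f64)
}
```

Under the same assumptions, keeping spurious failures of a 12-test suite near 1% of CI runs would require each test to fail less than about 0.1% of the time, which is why the thresholds need to be much looser than P=0.05.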

I have proposed some very relaxed suggestions here but feel free to propose alternative suggestions. I'd like these tests to be designed so that if the test fails it is a sign that we broke something and we need to fix it, not just something we ignore.

We could potentially add recall benchmarks to our benchmarking suites if we are concerned about slow degradation of recall or about our recall performance in general. These should probably be designed with real (not random) data and could hopefully be made more reliable in that way.
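For reference, the quantity such a benchmark would track can be sketched as follows: recall as the fraction of the true nearest-neighbor ids that the index actually returns. This is a minimal, self-contained sketch; the name `recall_at_k` and the use of `u64` row ids are assumptions for illustration, not the helper used in the existing tests.

```rust
use std::collections::HashSet;

/// Fraction of the ground-truth ids that also appear in the approximate
/// results. Duplicate ids in `results` are counted once.
/// (Hypothetical helper, for illustration only.)
fn recall_at_k(ground_truth: &[u64], results: &[u64]) -> f32 {
    if ground_truth.is_empty() {
        return 1.0; // nothing to find, so nothing was missed
    }
    let truth: HashSet<u64> = ground_truth.iter().copied().collect();
    let found: HashSet<u64> = results
        .iter()
        .copied()
        .filter(|id| truth.contains(id))
        .collect();
    found.len() as f32 / truth.len() as f32
}
```

A benchmark built on a fixed real dataset could record this value per index type and distance type over time, rather than asserting a hard threshold on every CI run.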

codecov-commenter · 2024-12-20T21:57:11Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.07%. Comparing base (022135b) to head (659dea1).
Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3282      +/-   ##
==========================================
+ Coverage   79.00%   79.07%   +0.07%     
==========================================
  Files         246      246              
  Lines       86900    87471     +571     
  Branches    86900    87471     +571     
==========================================
+ Hits        68655    69168     +513     
- Misses      15377    15439      +62     
+ Partials     2868     2864       -4

Flag	Coverage Δ
unittests	`79.07% <ø> (+0.07%)`	⬆️

BubbleCal · 2024-12-23T03:18:57Z

rust/lance/src/index/vector/ivf/v2.rs

-    #[case(4, DistanceType::Dot, 0.8)]
+    #[case(4, DistanceType::L2, 0.7)]
+    #[case(4, DistanceType::Cosine, 0.7)]
+    #[case(4, DistanceType::Dot, 0.7)]


Let's lower the recall requirement for 4bit IVF_PQ only:

L2 -> 0.85

Cosine -> 0.85

Dot -> 0.75

westonpace · 2024-12-31

Ok, I've scoped the change to just 4bit IVF_PQ.

westonpace · 2025-01-02T18:29:15Z

@BubbleCal this case failed (recall is 0.89 and target is 0.9). Is this a bug or should we relax this threshold too?

index::vector::ivf::v2::tests::test_create_ivf_hnsw_pq::case_3

BubbleCal · 2025-01-03T05:22:35Z

> @BubbleCal this case failed (recall is 0.89 and target is 0.9). Is this a bug or should we relax this threshold too?
> index::vector::ivf::v2::tests::test_create_ivf_hnsw_pq::case_3

let's relax dot to 0.85

westonpace · 2025-01-03T13:27:00Z

Looks like you beat me to it in 39f12dc#diff-6de816b72e7c722316243c57df4f809ad34dc8581367c72335154dada48c40ed

Thanks!

github-actions bot added the "ci" (Github Action or Test issues) label on Dec 20, 2024

BubbleCal reviewed Dec 23, 2024

westonpace force-pushed the ci/relax-recall-thresholds branch from 659dea1 to 0a551ce on December 31, 2024 13:23

westonpace mentioned this pull request Jan 3, 2025

fix: allow empty scalar indices and don't drop nulls on update #3329

Merged

BubbleCal approved these changes Jan 3, 2025

westonpace added 3 commits January 3, 2025 05:23

Relax recall thresholds for CI

a82b620

Restrict PR to 4bit test

e1d88b8

Relax dot thresholds

74fff6c

westonpace force-pushed the ci/relax-recall-thresholds branch from 0a551ce to 74fff6c on January 3, 2025 13:24

westonpace closed this Jan 3, 2025
