fix minor api load issues #4

anayapouget · 2025-01-31T12:20:18Z

No description provided.

codecov-commenter · 2025-01-31T12:21:16Z

Codecov Report

Attention: Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 47.08%. Comparing base (f98c343) to head (a4e50d0).

Files with missing lines	Patch %	Lines
aeon/io/api.py	0.00%	2 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main       #4   +/-   ##
=======================================
  Coverage   47.07%   47.08%           
=======================================
  Files          11       11           
  Lines         667      669    +2     
=======================================
+ Hits          314      315    +1     
- Misses        353      354    +1


glopesdev · 2025-01-31T13:39:12Z

aeon/io/api.py

+            if not data.index.is_monotonic_increasing:
                warnings.warn(
                    f"data index for {reader.pattern} contains out-of-order timestamps!", stacklevel=2
                )
                data = data.sort_index()
-            else:
+            elif data.index.has_duplicates:


@anayapouget Just to make sure I am understanding correctly, the goal here is to avoid throwing a warning in cases where there are no duplicates AND the data is monotonically increasing, is that right? Essentially the truth table below:

monotonic	duplicates	warning
no	no	yes
no	yes	yes
yes	no	no (this case)
yes	yes	yes

I guess if there are no duplicates in the data and the timestamps are monotonic, then this must be a real KeyError, so we probably should just rethrow the exception and save the processing time of repeating the retrieval, i.e. have an extra case with:

else: raise

Do you have an example where this was a problem? I guess if it is just a true KeyError I can simply use the existing datasets to reproduce the issue in a unit test.

Otherwise if it is something else and you can point me to an affected dataset, I will add it to the test suite so we can keep track of any regressions.

@anayapouget I did try a few different ways to get some kind of KeyError that does not involve duplicates or non-monotonic timestamps, and it's hard: range loc queries allow out-of-range inputs (they simply return no data and do not raise a KeyError). So I am really curious whether you ran into some other edge case which we have not considered yet.
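The point about range queries can be checked with a small sketch. The frame and timestamps below are made up for illustration and do not come from any aeon dataset; the only assumption is a sorted, duplicate-free DatetimeIndex.

```python
import pandas as pd

# Made-up, sorted, duplicate-free timestamps (not from a real dataset).
index = pd.to_datetime(
    ["2025-01-31 12:00:00", "2025-01-31 12:00:01", "2025-01-31 12:00:02"]
)
data = pd.DataFrame({"value": [1, 2, 3]}, index=index)

# Both bounds lie entirely outside the range covered by the index.
start = pd.Timestamp("2030-01-01")
end = pd.Timestamp("2030-01-02")

# On a monotonic index, label slicing falls back to a sorted search,
# so out-of-range bounds give an empty frame instead of a KeyError.
out_of_range = data.loc[start:end]

# Bounds that only partially overlap are clipped to the available rows.
partial = data.loc[pd.Timestamp("2025-01-31 12:00:01"):end]
```

Under these assumptions `out_of_range` is empty and `partial` holds the last two rows, which is why a plain out-of-range query is not enough to reach the except clause.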

Or maybe the change was for readability? This also makes sense by the way, since the if-statements were a bit misleading to read. I'm happy to keep this even just for that, and maybe just add the else: raise at the end to guarantee no unhandled case ever falls through.

@glopesdev The goal here was to ensure that the code explicitly checks for non-monotonic timestamps rather than for a lack of duplicates. This is because position data always contains duplicates: each row corresponds to an identity + body part pair, which means you can have quite a number of rows for the same index. So when I had a case of non-monotonic timestamps and it entered the except clause, the code ended up removing duplicates (which we absolutely want to avoid, since these duplicates are very much intentional) and then throwing an error anyway because the non-monotonic issue was not addressed.

The table you made is correct, though it should be noted that monotonic position data with duplicates thankfully does not lose its duplicates (row 4 behaviour) because it does not enter the except clause. I don't actually know when duplicates ever cause a KeyError - it's not something I have ever seen happen.
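A small sketch of the row 4 situation, using made-up pose-like rows (the column names and values are hypothetical). It illustrates that pandas treats repeated timestamps as monotonic, that a range query on such data succeeds without raising, and what an index-based de-duplication would throw away. The de-duplication idiom shown is an assumption about how duplicates might be dropped, not a quote from the aeon code.

```python
import pandas as pd

# Hypothetical pose-like rows: one row per (timestamp, body part) pair,
# so every timestamp appears twice on purpose.
index = pd.to_datetime(
    [
        "2025-01-31 12:00:00",
        "2025-01-31 12:00:00",
        "2025-01-31 12:00:01",
        "2025-01-31 12:00:01",
    ]
)
pose = pd.DataFrame(
    {"part": ["nose", "tail", "nose", "tail"], "x": [1.0, 2.0, 1.5, 2.5]},
    index=index,
)

# is_monotonic_increasing is non-strict: repeated timestamps still count as ordered.
is_ordered = pose.index.is_monotonic_increasing
has_repeats = pose.index.has_duplicates

# A range query with bounds that are not in the index succeeds on sorted data
# and keeps the repeated rows.
window = pose.loc[
    pd.Timestamp("2025-01-31 11:59:59"):pd.Timestamp("2025-01-31 12:00:00.5")
]

# Assumed de-duplication idiom: keeps one row per timestamp, dropping body parts.
deduplicated = pose[~pose.index.duplicated(keep="first")]
```

Here `window` still contains both body parts for the first timestamp, while `deduplicated` keeps only the "nose" rows, which is the data loss described above.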

The conclusion is that I do not feel overly strongly about the specifics of the implementation, but for the pose data the monotonic check definitely needs to happen before the duplicate check, and ideally the duplicate check should never happen at all on that type of data. I assume it is there for a reason, but if the duplicate check could be deleted, that is what would make the most sense from the pose data perspective.

As for the extra else: raise case, I agree we should add that in!

Thanks for the feedback, that is very helpful and I think I can now come up with a unit test for it.

glopesdev · 2025-01-31T13:45:37Z

aeon/io/reader.py

-        if repeat_idxs:  # drop x, y, and likelihood cols for repeat parts (skip first 5 cols)
-            init_rep_part_col_idx = (repeat_idxs - 1) * 3 + 5
+        if repeat_idxs:  # drop x, y, and likelihood cols for repeat parts (skip first cols)
+            num_cols_skip = 5 if bonsai_sleap_v == BONSAI_SLEAP_V2 else 6


Well spotted. I guess we could add a unit test to track this kind of regression, but maybe it is not needed.

What I would do is probably split these two fixes into separate commits (and PRs) so we can provide separate contexts in blame to help future code archeologists.
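To make the effect of the column offset concrete, here is a minimal sketch of the arithmetic in the diff. `repeat_part_columns` is a hypothetical helper; it assumes that the new `num_cols_skip` replaces the hard-coded 5 in the original formula and that each part occupies three consecutive columns (x, y, likelihood), as the code comment suggests.

```python
import numpy as np


def repeat_part_columns(repeat_idxs, num_cols_skip):
    """Hypothetical helper: column positions of x, y, likelihood per repeat part."""
    repeat_idxs = np.asarray(repeat_idxs)
    # Same formula as the diff, with the offset passed in instead of hard-coded.
    first_cols = (repeat_idxs - 1) * 3 + num_cols_skip
    # Each repeated part spans three consecutive columns.
    return np.concatenate([np.arange(i, i + 3) for i in first_cols])


# Same repeat index, two different numbers of leading columns.
cols_skip_5 = repeat_part_columns([2], num_cols_skip=5)
cols_skip_6 = repeat_part_columns([2], num_cols_skip=6)
```

With five leading columns the selected positions are 8, 9, 10; with six they are 9, 10, 11. Using a fixed offset of 5 on a layout with six leading columns would therefore drop columns shifted one position to the left.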

Sure, would you like me to close this PR and split it into 2?

@anayapouget yup, that would be great! In the meantime I will work on the unit tests.

Actually, I realize now that the Pose reader has a baked-in call to sort_index(), which means that no out-of-order timestamps can happen, but it also blinds us to any of these warnings.

To fix this I need to correct all of it together, so if you don't mind I will open a new branch, "fix" the reader, fix the SLEAP version issue, then make a unit test that can reproduce the duplication problem, then remove the unused check.

Do you know of a dataset from SLEAP_V2 which we could use to unit test the second part of the changes?

fix minor api load issues

a4e50d0

anayapouget requested review from glopesdev, jkbhagatio and lochhh January 31, 2025 12:20

glopesdev reviewed Jan 31, 2025


anayapouget commented Jan 31, 2025

codecov-commenter commented Jan 31, 2025

glopesdev Jan 31, 2025 (edited)

glopesdev Feb 4, 2025

anayapouget Feb 4, 2025

glopesdev Feb 5, 2025

glopesdev Jan 31, 2025 (edited)

anayapouget Feb 4, 2025

glopesdev Feb 5, 2025

glopesdev Feb 5, 2025
