Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix minor api load issues #4

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

fix minor api load issues #4

wants to merge 1 commit into from

Conversation

anayapouget
Copy link

No description provided.

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 47.08%. Comparing base (f98c343) to head (a4e50d0).

Files with missing lines Patch % Lines
aeon/io/api.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main       #4   +/-   ##
=======================================
  Coverage   47.07%   47.08%           
=======================================
  Files          11       11           
  Lines         667      669    +2     
=======================================
+ Hits          314      315    +1     
- Misses        353      354    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +149 to +154
if not data.index.is_monotonic_increasing:
warnings.warn(
f"data index for {reader.pattern} contains out-of-order timestamps!", stacklevel=2
)
data = data.sort_index()
else:
elif data.index.has_duplicates:
Copy link
Contributor

@glopesdev glopesdev Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anayapouget Just to make sure I am understanding correctly, the goal here is to avoid throwing a warning in cases where there are no duplicates AND the data is monotonic increasing, is that right? Essentially the below truth table:

monotonic duplicates warning
no no yes
no yes yes
yes no no (this case)
yes yes yes

I guess if there are no duplicates in the data and the timestamps are monotonic, then this must be a real KeyError, so we probably should just rethrow the exception and save the processing time of repeating the retrieval, i.e. have an extra case with:

            else:
                raise

Do you have an example where this was a problem? I guess if it is just a true KeyError I can simply use the existing datasets to reproduce the issue in a unit test.

Otherwise if it is something else and you can point me to an affected dataset, I will add it to the test suite so we can keep track of any regressions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example where this was a problem? I guess if it is just a true KeyError I can simply use the existing datasets to reproduce the issue in a unit test.

@anayapouget So I did try a few different ways to get some kind of KeyError which would not involve duplicates or non-monotonic timestamps and it's hard, since range loc queries allow for out-of-range inputs (they will simply return no data, but will not raise a KeyError) so I am really curious whether you ran into some other edge case which we are not contemplating yet.

Or maybe the change was for readability? This also makes sense by the way, since the reading of the if-statements were a bit misleading. I'm happy to keep this even just for that, and maybe just add the else: raise at the end just to guarantee no unhandled case ever falls through.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glopesdev The goal here was to ensure that the code explicitly checks for non-monotonic timestamps rather than for lack of duplicates. This is because with position data, there are always duplicates. Each row is corresponds to an identity + body part pair, which means you can have quite a number of rows for a same index. So when I had a case on non-monotonic timestamps and it entered the except clause, the code ended up removing duplicates (which we absolutely want to avoid since these duplicates are very much intentional) and then throwing an error anyways because the non-monotonic issue was not addressed.

The table you made is correct, though it should be noted that monotonic position data with duplicates thankfully does not lose it's duplicates (row 4 behaviour) because it does not enter the except clause. I don't actually know when duplicates ever cause a KeyError - it's not something I have ever seen happen.

The conclusion is that I do not feel overly strongly about the specifics of the implementation, but for the pose data the monotonic check definitely needs to happen before the duplicate check, and ideally the duplicate check should actually never happen at all on that type of data. I am supposing it is there for a reason, but if the duplicate check could be deleted that is what would make the most sense from the pose data perspective.

As for the extra else: raise case, I agree we should add that in!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, that is very helpful and I think I can now come up with a unit test for it.

if repeat_idxs: # drop x, y, and likelihood cols for repeat parts (skip first 5 cols)
init_rep_part_col_idx = (repeat_idxs - 1) * 3 + 5
if repeat_idxs: # drop x, y, and likelihood cols for repeat parts (skip first cols)
num_cols_skip = 5 if bonsai_sleap_v == BONSAI_SLEAP_V2 else 6
Copy link
Contributor

@glopesdev glopesdev Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted, I guess we could add a unit test to track these kind of regressions, but maybe not needed.

What I would do is probably split these two fixes into separate commits (and PR) so we can provide separate contexts in blame to help future code archeologists.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure would you like me to close this PR and split it into 2?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anayapouget yup, that would be great! In the meantime I will work on the unit tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I realize now that the Pose reader has a baked in call to sort_index() which means that no out-of-order timestamps can happen, but also it blinds us to any of these warnings.

To fix this I need to correct all of it together, so if you don't mind I will open a new branch, "fix" the reader, fix the SLEAP version issue, then make a unit test that can reproduce the duplication problem, then remove the unused check.

Do you know of a dataset from SLEAP_V2 which we could use to unit test the second part of the changes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants