-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix minor api load issues #4
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4 +/- ##
=======================================
Coverage 47.07% 47.08%
=======================================
Files 11 11
Lines 667 669 +2
=======================================
+ Hits 314 315 +1
- Misses 353 354 +1 ☔ View full report in Codecov by Sentry. |
if not data.index.is_monotonic_increasing: | ||
warnings.warn( | ||
f"data index for {reader.pattern} contains out-of-order timestamps!", stacklevel=2 | ||
) | ||
data = data.sort_index() | ||
else: | ||
elif data.index.has_duplicates: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anayapouget Just to make sure I am understanding correctly, the goal here is to avoid throwing a warning in cases where there are no duplicates AND the data is monotonic increasing, is that right? Essentially the below truth table:
monotonic | duplicates | warning |
---|---|---|
no | no | yes |
no | yes | yes |
yes | no | no (this case) |
yes | yes | yes |
I guess if there are no duplicates in the data and the timestamps are monotonic, then this must be a real KeyError
, so we probably should just rethrow the exception and save the processing time of repeating the retrieval, i.e. have an extra case with:
else:
raise
Do you have an example where this was a problem? I guess if it is just a true KeyError
I can simply use the existing datasets to reproduce the issue in a unit test.
Otherwise if it is something else and you can point me to an affected dataset, I will add it to the test suite so we can keep track of any regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an example where this was a problem? I guess if it is just a true
KeyError
I can simply use the existing datasets to reproduce the issue in a unit test.
@anayapouget So I did try a few different ways to get some kind of KeyError
which would not involve duplicates or non-monotonic timestamps and it's hard, since range loc
queries allow for out-of-range inputs (they will simply return no data, but will not raise a KeyError
) so I am really curious whether you ran into some other edge case which we are not contemplating yet.
Or maybe the change was for readability? This also makes sense by the way, since the reading of the if-statements were a bit misleading. I'm happy to keep this even just for that, and maybe just add the else: raise
at the end just to guarantee no unhandled case ever falls through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glopesdev The goal here was to ensure that the code explicitly checks for non-monotonic timestamps rather than for lack of duplicates. This is because with position data, there are always duplicates. Each row is corresponds to an identity + body part pair, which means you can have quite a number of rows for a same index. So when I had a case on non-monotonic timestamps and it entered the except
clause, the code ended up removing duplicates (which we absolutely want to avoid since these duplicates are very much intentional) and then throwing an error anyways because the non-monotonic issue was not addressed.
The table you made is correct, though it should be noted that monotonic position data with duplicates thankfully does not lose it's duplicates (row 4 behaviour) because it does not enter the except
clause. I don't actually know when duplicates ever cause a KeyError - it's not something I have ever seen happen.
The conclusion is that I do not feel overly strongly about the specifics of the implementation, but for the pose data the monotonic check definitely needs to happen before the duplicate check, and ideally the duplicate check should actually never happen at all on that type of data. I am supposing it is there for a reason, but if the duplicate check could be deleted that is what would make the most sense from the pose data perspective.
As for the extra else: raise
case, I agree we should add that in!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, that is very helpful and I think I can now come up with a unit test for it.
if repeat_idxs: # drop x, y, and likelihood cols for repeat parts (skip first 5 cols) | ||
init_rep_part_col_idx = (repeat_idxs - 1) * 3 + 5 | ||
if repeat_idxs: # drop x, y, and likelihood cols for repeat parts (skip first cols) | ||
num_cols_skip = 5 if bonsai_sleap_v == BONSAI_SLEAP_V2 else 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well spotted, I guess we could add a unit test to track these kind of regressions, but maybe not needed.
What I would do is probably split these two fixes into separate commits (and PR) so we can provide separate contexts in blame to help future code archeologists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure would you like me to close this PR and split it into 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anayapouget yup, that would be great! In the meantime I will work on the unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I realize now that the Pose
reader has a baked in call to sort_index()
which means that no out-of-order timestamps can happen, but also it blinds us to any of these warnings.
To fix this I need to correct all of it together, so if you don't mind I will open a new branch, "fix" the reader, fix the SLEAP version issue, then make a unit test that can reproduce the duplication problem, then remove the unused check.
Do you know of a dataset from SLEAP_V2 which we could use to unit test the second part of the changes?
No description provided.