DM-44488: handle new pipe_base exception types in executors #303
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #303      +/-   ##
==========================================
+ Coverage   88.95%   89.00%   +0.04%
==========================================
  Files          50       50
  Lines        4393     4439      +46
  Branches      728      733       +5
==========================================
+ Hits         3908     3951      +43
- Misses        345      347       +2
- Partials      140      141       +1

☔ View full report in Codecov by Sentry.
Force-pushed from 2584189 to c46bfe6.
I don't know what executor AP uses, but if they use this, they'll have to reconfigure the new kwarg to be True, since they want the other behavior.
Check your DO NOT MERGE commit.
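For concreteness, here is a hedged sketch of what that reconfiguration might look like. Only the kwarg name raise_on_partial_outputs comes from this PR's diff; the executor class and the other constructor arguments are assumptions for illustration, not a statement about how the AP service is actually wired up.

from lsst.ctrl.mpexec import SingleQuantumExecutor  # class name assumed for illustration

executor = SingleQuantumExecutor(
    butler,         # assumed: an already-configured Butler
    task_factory,   # assumed: the usual TaskFactory
    raise_on_partial_outputs=True,  # opt back into failing the quantum instead of keeping partial outputs
)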
"Incorrect use of AnnotatedPartialOutputsError: no chained exception found.", | ||
task_node.label, | ||
quantum.dataId, | ||
) |
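For context, here is a minimal sketch of the check being discussed. The names task_node and quantum come from the excerpt above; the try/except scaffolding, the helper, the import path, and the exact message wording are assumptions, not the PR's actual code.

import logging

from lsst.pipe.base import AnnotatedPartialOutputsError  # assumed import path

_LOG = logging.getLogger(__name__)

try:
    execute_quantum(task_node, quantum)  # hypothetical stand-in for running the task
except AnnotatedPartialOutputsError as caught:
    if caught.__cause__ is None:
        # AnnotatedPartialOutputsError only makes sense when raised via
        # `raise ... from err`, so a missing chained exception is a coding
        # mistake in the task rather than a processing failure.
        _LOG.warning(
            "Incorrect use of AnnotatedPartialOutputsError by task '%s' on quantum %s: "
            "no chained exception found.",
            task_node.label,
            quantum.dataId,
        )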
Oh, I like that. I think it should maybe be an error, though, not a warning, so it's more visible? That seems like a serious coding problem that should be loud.
I don't really like ERROR or higher log messages that don't correspond to an exception or some other actual execution stoppage. I thought about making this a hard failure, but I figured we'd be much more frustrated if it brought down processing that was otherwise working as intended.
There are ERROR, FAILURE, EXCEPTION log types; I wish we'd used FAILURE for what you're describing, so that we can use ERROR for other things ("be louder than warning, but I (maybe) didn't stop running").
Note that exception is not a distinct log level; it's just error with automatic stack trace printing.
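That equivalence is easy to demonstrate with the standard library:

import logging

logging.basicConfig()
log = logging.getLogger("demo")

try:
    1 / 0
except ZeroDivisionError:
    # Both calls emit an ERROR-level record with the same traceback attached;
    # log.exception() is simply log.error() with exc_info=True implied.
    log.exception("division failed")
    log.error("division failed", exc_info=True)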
else:
    error = caught.__cause__
    if self.raise_on_partial_outputs:
        # This is a real edge case that required some experimentation:
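To make the control flow these excerpts come from easier to follow, here is a hedged sketch written as a free function rather than an executor method. The structure is inferred from the review discussion, not copied from the PR, and the log level in the proceed branch reflects where the threads below land (ERROR).

def handle_partial_outputs(caught, task_node, quantum, raise_on_partial_outputs, log):
    # Hedged sketch, not the PR's code.
    error = caught.__cause__
    if raise_on_partial_outputs:
        # Strict behavior: fail the quantum by propagating the chained
        # exception, falling back to the wrapper if there is none.
        raise error if error is not None else caught
    # Default behavior: keep whatever outputs were written and continue,
    # treating the quantum as a qualified success.
    log.error(
        "Task '%s' on quantum %s exited with partial outputs; "
        "considering this a qualified success and proceeding.",
        task_node.label,
        quantum.dataId,
    )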
I think I'd put a NOTE: or something at the start of this, as it's sort of a meta-comment.
"Task '%s' on quantum %s exited with partial outputs; " | ||
"considering this a qualified success and proceeding.", |
That's a good way to phrase it. I do wonder if it shouldn't be error here, or exception maybe (we're allowed to log an error even if the task doesn't halt execution, right?), so it doesn't get lost in our massive field of warnings, but I guess that's up to DRP to decide.
I'm not sure if we're allowed to log error if we don't halt, but I don't like doing it because I think there's no common mental model of what it means.
In fact, we explicitly can! Per the only place I know of that we've written any such recommendation:
https://developer.lsst.io/stack/logging.html#log-levels
ERROR: for errors that may still allow the execution to continue.
Ok, I'll switch to ERROR here.
    task_node.label,
    quantum.dataId,
)
_LOG.warning(error, exc_info=error)
Or make this one log.exception? Not sure.
(see other threads)
Logging it right before it's reraised leads to duplication in logs at higher levels.
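The duplication can be reproduced with a few lines of standard-library code: if the handler logs the traceback and then re-raises, whatever catches the exception higher up reports the same traceback a second time.

import logging

logging.basicConfig()
log = logging.getLogger("demo")

def run_quantum():
    try:
        1 / 0
    except ZeroDivisionError:
        log.exception("quantum failed")  # first copy of the traceback
        raise

try:
    run_quantum()
except ZeroDivisionError:
    log.exception("caught at a higher level")  # second copy of the same traceback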
Force-pushed from 29a9c77 to 508d294.
The Prompt Processing framework uses
Yup, this is needed to keep the rest of the GitHub Actions green, so our workflow is to drop it only after merging the branch for the upstream package.
Force-pushed from 508d294 to ed11321.
If you're making a behavioral and/or breaking change to the API, will this be announced/documented anywhere? ATM I have no basis for a decision either way.
It's establishing behavior that I think was basically ill-defined before; RFC-958 defined two options but did not state which would be the default, IIRC, and this ticket should really be considered a belated part of that RFC's implementation. As to whether Prompt Processing actually wants to raise when

I'll make a community post before I merge this in any case; there is some chance we'll want to do a bit of work on some downstream tasks as we get used to the full implications of this RFC.
Maybe better to leave this for discussion on Community, but I do wonder whether the default for this should be to fail on partial output errors, with DRP changing it in their pipelines? I don't know what we should train the users to expect, when running pipelines themselves: "reproducible errors halt" or "reproducible errors continue and may or may not cause problems further down the line"?
The vast majority of reproducible errors still halt, regardless of the new option; it's just the ones that also have a potentially-useful partial output that now default to proceeding. I think that's consistent with us wanting to process as far as we can in a lot of contexts (at least Rapid Analysis as well as DRP, and I think it's plausible alert production in special programs could go this direction, too, even if it doesn't in WFD). This does put the burden on the downstream tasks to handle those partial outputs, and I do suspect some of them are not ready for that responsibility yet (in particular, we'll get some confusing error messages from them at first). But I also think that's the only place that responsibility can realistically be.
This reverts commit 7b1fce0. This logging was duplicative when seen from STDERR, which also gets exception tracebacks that propagate up, but we need error messages and tracebacks to appear in the saved logs, too, and this is the only place we can do that.
Force-pushed from 742c47e to 2b03758.
Checklist
doc/changes