Graceful error handling #1
On the one hand, my personal error handling scheme is as follows:

Case 1. When an error occurs at the controller: the controller should immediately terminate the entire task. If the task is interactive, the controller should attempt to notify the client about the error before doing so.

Case 2. When an error occurs at the compute node: the node should notify the controller about the error, and the controller should terminate the entire task at once. If the task is interactive, both the controller and the node should try to inform the client about the error prior to termination.

Case 3. When an error occurs at the client: no action is required; the client should simply die. If it is an interactive task, an error will then occur at the compute node and/or the controller. The first one that calls

But on the other hand, we can consider the mechanism from the perspective of the CAP theorem. When an error occurs:
I think Eventual Consistency is a different feature that would be nice to have, but we won't discuss it in this issue. The problem is that our current scheme does not ensure Instant Consistency either. Consider the case where the compute node cannot connect to the controller and an error occurs at the compute node. Since the client does not pass anything to the controller (Case 3), the controller will never learn the true error.
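To make the gap concrete, here is a minimal Python sketch that models Cases 1 to 3 as a planning function. Every name in it (`Origin`, `Action`, `plan`, `node_reaches_controller`, `controller_terminates`) is hypothetical and not taken from the project; it only models who does what, with no real networking.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Origin(Enum):
    """Where the error occurred (hypothetical names for illustration)."""
    CONTROLLER = "controller"
    NODE = "node"
    CLIENT = "client"


@dataclass(frozen=True)
class Action:
    actor: str  # who performs the step
    step: str   # what the actor does


def plan(origin: Origin, interactive: bool,
         node_reaches_controller: bool = True) -> List[Action]:
    """Return the ordered steps the scheme prescribes for a single error."""
    steps: List[Action] = []
    if origin is Origin.CONTROLLER:
        # Case 1: notify the client first (interactive only), then terminate.
        if interactive:
            steps.append(Action("controller", "notify client"))
        steps.append(Action("controller", "terminate task"))
    elif origin is Origin.NODE:
        # Case 2: the node informs the client and the controller.
        if interactive:
            steps.append(Action("node", "notify client"))
        if node_reaches_controller:
            steps.append(Action("node", "notify controller"))
            if interactive:
                steps.append(Action("controller", "notify client"))
            steps.append(Action("controller", "terminate task"))
    # Case 3 (Origin.CLIENT): no action is required; the client simply dies.
    return steps


def controller_terminates(steps: List[Action]) -> bool:
    """True if the plan ends with the controller terminating the task."""
    return Action("controller", "terminate task") in steps
```

With `plan(Origin.NODE, interactive=True, node_reaches_controller=False)`, the only step left is the node notifying the client, so `controller_terminates` returns `False`: the client knows about the error but, per Case 3, never forwards it to the controller.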
I've been thinking about this question over the past few days and found that our core problem is the mixed error pattern:
This pattern presents some technical challenges:
This thread is worth reading, as it discusses how asynchronous errors should be handled. I think it would be better to break this issue into several parts to work on.
Currently, we have not considered error handling, although we have established an error struct `UError`. Generally, we believe that:

As for how to handle database errors in the controller (e.g., the database is no longer accessible or cannot be written to), it is currently undetermined.
Personally, I believe we should continue to operate or consider it an error merely for the current task, but immediately terminating the entire controller also seems reasonable.
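Since the choice is still open, the three options mentioned above could be kept behind a single switch until a decision is made. The following Python sketch is only an illustration under that assumption; `DbErrorPolicy`, `TaskFailed`, `ControllerShutdown`, and `handle_db_error` are hypothetical names, not part of the project or of `UError`.

```python
from enum import Enum
from typing import List


class DbErrorPolicy(Enum):
    """The three candidate reactions to a database error (hypothetical)."""
    CONTINUE = "continue"                # log the error and keep operating
    FAIL_TASK = "fail_task"              # treat it as an error of the current task only
    STOP_CONTROLLER = "stop_controller"  # terminate the entire controller


class TaskFailed(Exception):
    """Raised to abort only the task that hit the database error."""


class ControllerShutdown(Exception):
    """Raised to request termination of the whole controller."""


def handle_db_error(err: Exception, policy: DbErrorPolicy, log: List[str]) -> None:
    """Apply the configured policy to one database error."""
    if policy is DbErrorPolicy.CONTINUE:
        # Record the failure but let the controller carry on.
        log.append(f"database error ignored: {err}")
        return
    if policy is DbErrorPolicy.FAIL_TASK:
        raise TaskFailed(str(err)) from err
    raise ControllerShutdown(str(err)) from err
```

Keeping the decision in one place means switching between "continue", "fail the task", and "stop the controller" later would not require touching the call sites.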
See #1 (comment) for details on these parts.
`EventBus` into a stateless event hook #6