Render a root-cause exception for dependency and join errors (#3717)
# Description

This PR reworks two exception types, DependencyError and JoinError.

Both of these exceptions report that a task failed because some other task/future failed - in the dependency case, because a task dependency failed, and in the join case because one of the tasks/futures being joined failed.

This PR introduces a common superclass `PropagatedException` to acknowledge that the meaning and behaviour of these two exceptions are very similar.

`PropagatedException` has a new implementation for reporting the failures that are being propagated. Parsl has tried a couple of ways to do this in the past:

* The implementation immediately before this PR reports only the immediate task IDs (or future reprs, for non-tasks) in the exception message. For details of the chain of exceptions and the original/non-propagated exception, the user can examine the exception object via the `dependent_exceptions_tids` attribute.

* Prior to PR #1802, the repr/str (and so the printed form) of dependency exceptions rendered the entire exception. In the case of deep dependency chains, or where a dependency graph has many paths to a root cause, this resulted in extremely voluminous output with a lot of boilerplate dependency exception text.

The approach introduced by this current PR attempts a fusion of these two approaches:

* The user will often be waiting only on the final task of a dependency chain (because the DFK will be managing everything in between) - so they will often get a dependency exception.

* When they get a dependency exception, they are likely to actually be interested in the root cause at the earliest part of the chain. So this PR makes dependency exceptions traverse the chain and discover a root cause.

* When there are multiple root causes, or multiple paths to the same root cause, the user should not be overwhelmed with output. So this PR picks a single root cause exception to report fully, and when there are other causes/paths adds a small annotation `(+ others)`.

* The user is sometimes interested in the path from that root cause exception to the current failure, but often not. That path is rendered roughly the same as immediately before this PR, as a sequence of task IDs (or Future reprs for non-tasks).

* Python has a native mechanism for indicating that an exception is caused by another exception: the `__cause__` magic attribute, which is usually populated by `raise e1 from e2`. This PR populates that magic attribute at construction so that displaying the exception will show the cause using Python's native format.

* The user may want to ask other Parsl-relevant questions about the exception chain, so this PR keeps the `dependent_exceptions_tids` attribute for such introspection.

A dependency or join error is now rendered by Python as exactly two exceptions next to each other:

```
Traceback (most recent call last):
  File "/home/benc/parsl/src/parsl/parsl/dataflow/dflow.py", line 922, in _unwrap_futures
    new_args.extend([self.dependency_resolver.traverse_to_unwrap(dep)])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/functools.py", line 907, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/functools.py", line 907, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/parsl/dataflow/dependency_resolvers.py", line 48, in _
    return fut.result()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/benc/parsl/src/parsl/parsl/dataflow/dflow.py", line 339, in handle_exec_update
    res = self._unwrap_remote_exception_wrapper(future)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/parsl/dataflow/dflow.py", line 603, in _unwrap_remote_exception_wrapper
    result.reraise()
  File "/home/benc/parsl/src/parsl/parsl/app/errors.py", line 114, in reraise
    raise v
  File "/home/benc/parsl/src/parsl/parsl/app/errors.py", line 138, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/taskchain.py", line 13, in failer
    raise RuntimeError("example root failure")
    ^^^^^^^^^^^^^^^^^
RuntimeError: example root failure

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/benc/parsl/src/parsl/taskchain.py", line 16, in <module>
    inter(inter(inter(inter(inter(failer()))))).result()
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/benc/parsl/src/parsl/parsl/dataflow/dflow.py", line 339, in handle_exec_update
    res = self._unwrap_remote_exception_wrapper(future)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/parsl/dataflow/dflow.py", line 601, in _unwrap_remote_exception_wrapper
    result = future.result()
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
parsl.dataflow.errors.DependencyError: Dependency failure for task 5. The representative cause is via task 4 <- task 3 <- task 2 <- task 1 <- task 0
```

# Changed Behaviour

DependencyErrors and JoinErrors will render differently.

## Type of change

- Update to human readable text: Documentation/error messages/comments
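
The root-cause traversal and the construction-time `__cause__` assignment described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: `PropagatedError`, its `failures` argument and `_pick_root_cause` are hypothetical names, the message wording is merely modelled on the sample output, and following the first failure at each level is just one assumed way of choosing a "representative" cause.

```python
import traceback


class PropagatedError(Exception):
    """Hypothetical stand-in for a propagated failure (not Parsl's class)."""

    def __init__(self, task_id, failures):
        # failures: non-empty list of (label, exception) pairs, one per failed
        # input; a label is text such as "task 4" or a future repr.
        self.task_id = task_id
        self.failures = failures

        root, path, others = self._pick_root_cause()
        message = (f"Propagated failure for task {task_id}. "
                   f"The representative cause is via {' <- '.join(path)}")
        if others:
            message += " (+ others)"
        super().__init__(message)

        # Records the same link that `raise self from root` would, but at
        # construction time. Assigning __cause__ also sets
        # __suppress_context__ to True.
        self.__cause__ = root

    def _pick_root_cause(self):
        """Follow the first failure at each level to a non-propagated one."""
        current = self
        path = []
        others = False
        while isinstance(current, PropagatedError):
            # Note whether any level had alternative causes or paths.
            others = others or len(current.failures) > 1
            label, current = current.failures[0]
            path.append(label)
        return current, path, others


# Build a small chain: task 0 fails, tasks 1..3 propagate that failure,
# and task 3 additionally has a second, unrelated failed input.
root = RuntimeError("example root failure")
t1 = PropagatedError(1, [("task 0", root)])
t2 = PropagatedError(2, [("task 1", t1)])
t3 = PropagatedError(3, [("task 2", t2),
                         ("task 9", ValueError("another failure"))])

# Python's native rendering shows only the root cause and the final error.
rendered = "".join(traceback.format_exception(type(t3), t3, t3.__traceback__))
print(rendered)
```

Because `t3.__cause__` points straight at the root exception rather than at `t2`, the intermediate propagated errors do not appear in the rendered output; they remain reachable through the `failures` attribute of this sketch, in the same spirit as keeping `dependent_exceptions_tids` for introspection.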