Support for GlobusComputeExecutor #3607
Conversation
Aside from the lack of a CI test, my main concern (from seeing people implement this before) is what goes on with different Parsl versions: there's one version in the endpoint (pinned to a specific version by Globus Compute) and another version on the submit side, and having different Parsl versions like that is out of scope for Parsl. This might be a case of documenting what users should expect to work or not work, or might be something deeper. At the very least we should be expecting them to be able to use the same combination of versions as used in CI.
# Description

This PR adds a new `staging_required` marker, along with a marker for several tests that assume a shared filesystem in order to work. Similarly, a few tests that use staging are now marked with the `staging_required` marker. This PR splits out changes from #3607.

# Changed Behaviour

These markers should not affect any test behavior, since none of the test entries in the Makefile make use of them. These changes will kick in once they are used by `GlobusComputeExecutor` and potentially `KubernetesProvider` for tests.

## Type of change

- New feature
- Code maintenance/cleanup
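As an illustration, a test that needs a shared filesystem could be tagged like the sketch below. Only the marker names (`shared_fs`, `staging_required`) come from this PR; the test body and its file-round-trip check are hypothetical:

```python
import pathlib
import tempfile

import pytest


# Hypothetical example: tag a test that assumes the submit side and
# the workers see the same filesystem.
@pytest.mark.shared_fs
def test_reads_local_file():
    with tempfile.TemporaryDirectory() as d:
        p = pathlib.Path(d) / "data.txt"
        p.write_text("hello")
        # Only valid if the reader runs on the same filesystem.
        assert p.read_text() == "hello"
```

Such tests can then be deselected on configurations without a shared filesystem, e.g. `pytest -k "not shared_fs"`.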
.github/workflows/gce_test.yaml

```
make clean_coverage
# Temporary fix, until changes make it into compute releases
git clone -b configure_tasks_working_dir https://github.com/globus/globus-compute.git
```
maybe it's good to wait for Globus Compute to support the necessary features in a release, so that this can pin a relevant supported release?
(does this mean that right now a user shouldn't expect to be able to use an actual Globus Compute but instead needs to install this fork?)
The GitHub Action uses `ThreadPoolEngine` because it is lightweight compared to running the default config that uses `GlobusComputeEngine`. The branch that I'm pointing to fixes a `ThreadPoolEngine`-only bug. You can run these same tests against a local Globus Compute endpoint by setting `GLOBUS_COMPUTE_ENDPOINT=<ep_id>`, and I've done those tests against a few configs. I believe Chris (NOAA) has also done a bunch of testing at this point.
I have been testing the `GlobusComputeExecutor` via this branch in combination with `globus-compute-endpoint==2.30.1`. I am using that release, not a special branch/tag. The one feature I haven't yet tested is passing in the `user_endpoint_config` at call time rather than at `GlobusComputeExecutor` instantiation time. What I've done so far is working great!
parsl/executors/globus_compute.py

```python
self._executor.resource_specification = res_spec
self._executor.user_endpoint_config = user_endpoint_config
return self._executor.submit(func, *args, **kwargs)
```
This feels a bit horribly thread unsafe, were Parsl ever to get multithreaded submission - the sort of thing `DFK.submitter_lock` was put in place to deal with.
Potentially, yes. I don't see how this is only a GCE-specific problem. Secondly, I think it's safe to wait until users report this as an issue, or ask for it, before working on this.
Err, I don't think it is a GCE specific problem. But I think the GCE class needs to handle it all the same. What's wrong with wrapping this work in the lock?
Or is the point that this method is called from within the lock already?
When this is used from the DFK, this is already in a lock (although as you're both well aware, there is strong community pressure to use pieces of Parsl outside of the DFK). That lock was introduced in PR #625 because executors are not expected to be thread safe on submission, and so in that context this code is not dangerous.
This is a more general backpressure against using this style of API that seems to conflate the state of the submission system as a whole with parameters to a single task execution - I've definitely fixed concurrency bugs in Parsl because of this coding style before, that led not to Parsl errors but to subtle misexecutions that mostly look plausible.
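The hazard and the lock-based fix being discussed can be sketched as follows. This is a toy illustration, not the actual Parsl or Globus Compute code; `LockedSubmitter` and its attribute names are hypothetical:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedSubmitter:
    """Toy sketch: serialize the set-attributes-then-submit sequence.

    Without the lock, two threads could interleave, and a task could be
    submitted with the other thread's resource specification.
    """

    def __init__(self, inner):
        self._inner = inner  # e.g. a concurrent.futures-style executor
        self._lock = threading.Lock()

    def submit(self, func, res_spec, *args, **kwargs):
        with self._lock:
            # The mutation of shared executor state and the submit call
            # happen atomically with respect to other submitting threads.
            self._inner.resource_specification = res_spec
            return self._inner.submit(func, *args, **kwargs)
```

In Parsl proper, `DFK.submitter_lock` plays this role on the submit path, which is why the assignment-then-submit pattern is safe when driven through the DFK.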
@benclifford @khk-globus Thanks for the review and all the comments on this issue. I believe I've addressed all of them and at this point, we are only waiting for a release of
I haven't finished reading through the PR, but these are my first thoughts. I'll chime in on Monday morning (-0400).
.github/workflows/gce_test.yaml

```
name: runinfo-3.11-${{ steps.job-info.outputs.as-ascii }}-${{ github.sha }}
path: |
```
Same context as earlier: it would be good not to hard-code the Python version but to collect it from the environment.
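One way to do that (a sketch, not the workflow's actual code) is to ask the interpreter itself and feed the result into a step output, instead of hardcoding `3.11` in the artifact name:

```python
# Derive the "major.minor" version string from the running interpreter,
# suitable for echoing into $GITHUB_OUTPUT from a workflow step.
import sys

py_version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(py_version)
```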
docs/userguide/execution.rst

```rst
5. `parsl.executors.globus_compute.GlobusComputeExecutor`: This executor uses `Globus Compute <https://globus-compute.readthedocs.io/en/latest/index.html>`_
   as the execution backend to run tasks on remote systems.
```
Please consider adding more context here. I'm not calling for complete documentation, but this currently only answers the "what." How about a "why" or in what cases the GCE would be a good fit?
setup.py

```python
'globus_compute': ['globus_compute_sdk>=2.27.1'],
# Disabling psi-j since github direct links are not allowed by pypi
```
It's been a minute since we started this PR; might increase this to the most up-to-date as of ~now (2.34.0)
Fixed.
…te_endpoints

* A new `GlobusComputeExecutor` implementation
* Docs, tests, and examples
* Github Action for GlobusComputeExecutor (#3619)
@khk-globus I've added some configuration examples and squashed all the previous changes. I've opted to keep the
Description

This PR adds a new `GlobusComputeExecutor` that wraps the Globus Compute SDK to allow Parsl to execute tasks via Globus Compute. This mechanism supports remote execution of tasks similar to the functionality that `parsl.channels` enabled, and is a potential replacement.

Since Globus Compute often runs on remote machines that do not have a shared filesystem with the Parsl runtime, tests have been updated with new `shared_fs` and `staging_required` pytest markers. I have not added tests and CI actions to enable executing these tests against our CI system, but you can run tests locally with these steps:

1. Install `globus-compute-sdk` with `pip install .[globus-compute]`
2. Start an endpoint: `globus-compute-endpoint start <endpoint_name>`
3. `export GLOBUS_COMPUTE_ENDPOINT=<endpoint_id>`
4. `pytest -v -k "not shared_fs" --config parsl/tests/configs/globus_compute.py parsl/tests/`
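For context, a minimal submit-side configuration might look like the sketch below. This assumes the API shape in this PR, where the Parsl executor wraps a `globus_compute_sdk` `Executor`; the endpoint UUID is a placeholder:

```python
from globus_compute_sdk import Executor

from parsl.config import Config
from parsl.executors import GlobusComputeExecutor

# "<endpoint_id>" is a placeholder for a UUID reported by
# `globus-compute-endpoint list` on the target machine.
config = Config(
    executors=[
        GlobusComputeExecutor(
            label="globus_compute",
            executor=Executor(endpoint_id="<endpoint_id>"),
        ),
    ],
)
```

Running tasks then proceeds as with any other Parsl executor, via `parsl.load(config)` and the usual app decorators.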
Changed Behaviour
N/A
Type of change
Choose which options apply, and delete the ones which do not apply.