Init work on publishing final output of "simple" workflow to volume #50
Draft: trey-stafford wants to merge 22 commits into main from publish-outputs-to-volume
+268 −38
Next steps:
Will be used for control flow. We should avoid overwriting existing, published data unless an overwrite flag is given (TODO).
trey-stafford force-pushed the publish-outputs-to-volume branch from 25fba91 to 2e9f997 (January 16, 2025 00:23)
As of this morning I'm able to run this!
trey-stafford force-pushed the publish-outputs-to-volume branch from d2bdf86 to 6d1ad27 (January 16, 2025 18:36)
trey-stafford changed the base branch from main to test-image-config-from-env (January 16, 2025 18:36)
…/ogdc-runner into publish-outputs-to-volume
work on overwrite option
Prep for "workflow of workflows" approach
Was thinking that we would construct potentially many argo workflows and then orchestrate them with a parent argo workflow, but this doesn't work so well in practice. Some features, like artifacts, do not work within child workflows.
Anticipate the need for more specific exception handling
Some of the errors around traversing nodes and checking outputs are a bit confusing. I think the way we have the workflow set up means that the relevant attrs will be present. May want to consider more robust error checking (or maybe wrap all of it in try/except...) down the road.
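One way to make node-traversal errors less confusing is to raise dedicated exceptions that say which lookup failed. The sketch below walks the `status.nodes` map of an Argo Workflow object (treated as a plain dict, as returned in the Argo API's JSON) looking for an output parameter. The exception classes, the `get_output_parameter` helper, and the template/parameter names in the usage example are hypothetical, not part of ogdc-runner.

```python
from typing import Any, Dict


class OgdcWorkflowError(Exception):
    """Hypothetical base class for errors raised while inspecting a workflow."""


class OgdcWorkflowNodeNotFound(OgdcWorkflowError):
    """No node in the workflow ran the requested template."""


class OgdcWorkflowOutputMissing(OgdcWorkflowError):
    """Matching node(s) exist, but none produced the requested output parameter."""


def get_output_parameter(
    workflow: Dict[str, Any], template_name: str, param_name: str
) -> str:
    """Return `param_name` as output by a node that ran `template_name`.

    Nodes live under status.nodes, keyed by node ID; each node records the
    template it ran and its output parameters as a list of {name, value}.
    """
    nodes = workflow.get("status", {}).get("nodes", {})
    matching = [n for n in nodes.values() if n.get("templateName") == template_name]
    if not matching:
        raise OgdcWorkflowNodeNotFound(f"No node ran template {template_name!r}")
    for node in matching:
        for param in node.get("outputs", {}).get("parameters", []):
            if param.get("name") == param_name:
                return param["value"]
    raise OgdcWorkflowOutputMissing(
        f"Template {template_name!r} produced no output parameter {param_name!r}"
    )


# A trimmed-down, made-up workflow status for illustration.
example = {
    "status": {
        "nodes": {
            "wf-abc-123": {
                "templateName": "check-for-existing-data",
                "outputs": {"parameters": [{"name": "data-exists", "value": "true"}]},
            }
        }
    }
}
print(get_output_parameter(example, "check-for-existing-data", "data-exists"))
```

Callers can catch `OgdcWorkflowError` broadly, or the two subclasses separately when "the node never ran" and "the node ran but produced nothing" call for different handling.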
We expect OGDC workflows to have access to the workflow PVC so that data outputs can be written.
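For reference, giving a workflow access to the PVC amounts to declaring the claim under `spec.volumes` and mounting it into each container template via `volumeMounts`. The sketch below builds such an Argo Workflow manifest as a Python dict. The PVC name comes from this PR; the `build_workflow_manifest` helper, the volume name, the `main` template name, and the mount path are illustrative assumptions.

```python
from typing import Any, Dict, List


def build_workflow_manifest(
    name_prefix: str,
    image: str,
    command: List[str],
    pvc_name: str = "qgnet-ogdc-workflow-pvc",
    mount_path: str = "/mnt/workflow",  # hypothetical mount point
) -> Dict[str, Any]:
    """Build a single-step Argo Workflow manifest with the PVC mounted (hypothetical helper)."""
    volume_name = "workflow-volume"  # must match between spec.volumes and volumeMounts
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": f"{name_prefix}-"},
        "spec": {
            "entrypoint": "main",
            # Declare the existing PVC as a workflow-level volume.
            "volumes": [
                {"name": volume_name, "persistentVolumeClaim": {"claimName": pvc_name}}
            ],
            "templates": [
                {
                    "name": "main",
                    "container": {
                        "image": image,
                        "command": command,
                        # Mount the PVC so the step can read and write published data.
                        "volumeMounts": [{"name": volume_name, "mountPath": mount_path}],
                    },
                }
            ],
        },
    }


manifest = build_workflow_manifest("ogdc-recipe", "busybox", ["ls", "/mnt/workflow"])
print(manifest["spec"]["volumes"][0]["persistentVolumeClaim"]["claimName"])
```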
Makes it a little easier to understand the logic
Will revisit this. The `submit_ogdc_recipe` function may end up submitting more than one workflow that we want to preserve (or cleanup) in the future, which would mean that we can't just return the name of a single workflow. Maybe we end up having a result object that contains references to all workflows executed for a given recipe.
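If `submit_ogdc_recipe` does end up submitting several workflows, a small result object could collect their names in submission order. A hypothetical sketch (`RecipeSubmissionResult` and the workflow names below are illustrative, not existing ogdc-runner code):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecipeSubmissionResult:
    """Hypothetical record of every Argo workflow submitted for one recipe."""

    recipe_id: str
    # Workflow names, in the order they were submitted.
    workflow_names: List[str] = field(default_factory=list)

    def add(self, workflow_name: str) -> None:
        self.workflow_names.append(workflow_name)

    @property
    def final_workflow(self) -> Optional[str]:
        """Name of the most recently submitted workflow, if any."""
        return self.workflow_names[-1] if self.workflow_names else None


result = RecipeSubmissionResult(recipe_id="my-recipe")
for name in ["check-published-xk2p9", "my-recipe-7fq4d"]:
    result.add(name)
print(result.final_workflow)
```

Cleanup code could iterate over `workflow_names`, while callers that only care about the recipe's own workflow can keep using a single name via `final_workflow`.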
Resolves #48

This PR adds a final step to OGDC "simple" recipes: it publishes the final data to a subpath of the `qgnet-ogdc-workflow-pvc` PVC based on the recipe's ID.

The `--overwrite` flag can be used to overwrite a recipe's data if it has already been published. If the data has already been published and `--overwrite` is not passed, an `OgdcDataAlreadyPublished` exception is raised.

This is a step toward chaining multiple workflows together. In fact, this PR introduces two new Argo workflows: one that deletes existing published data if `--overwrite` is passed, and a second that checks for the existence of already-published data (if it exists and `--overwrite` is not passed, the above-noted exception is raised). These Argo workflows are submitted and execute independently of the Argo workflow that runs the requested data transformation recipe.

Note: this PR introduces a "publication" mechanism that is fairly simple in implementation. It just puts data into persistent storage. It does not trigger any other processes that might, e.g., put that data into a DataONE dataset or expose it to a user for download. I anticipate these will be "next steps", soon to come. For now, saving data to a known location on the OGDC workflows PVC sets us up for chaining additional workflows (#45) that take the previous workflow's publication location as input.
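The control flow of the two new workflows can be summarized with a local stand-in that treats the PVC as a mounted directory and each recipe's publication location as `<pvc_root>/<recipe_id>`. This is a sketch under those assumptions: in the PR the check and the delete run as separate Argo workflows, `prepare_publication_target` is a hypothetical name, and `OgdcDataAlreadyPublished` is redefined locally so the example runs on its own.

```python
import shutil
from pathlib import Path


class OgdcDataAlreadyPublished(Exception):
    """Local stand-in for the exception described in this PR."""


def prepare_publication_target(
    pvc_root: Path, recipe_id: str, overwrite: bool = False
) -> Path:
    """Return the recipe's publication path, clearing it first if overwrite is set.

    Hypothetical helper; the <pvc_root>/<recipe_id> layout is an assumption.
    """
    target = pvc_root / recipe_id
    if overwrite:
        # Analogous to the "delete existing published data" workflow.
        if target.exists():
            shutil.rmtree(target)
    elif target.exists():
        # Analogous to the "check for already-published data" workflow.
        raise OgdcDataAlreadyPublished(
            f"Data for recipe {recipe_id!r} is already published at {target}; "
            "pass --overwrite to replace it."
        )
    return target
```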
In other words, this PR currently sets us up to do:
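Roughly: check (or clear) the publication location, run the recipe, then copy its final output into the recipe's subpath on the PVC. Below is a minimal sketch of that last copy step, again treating the PVC as a locally mounted directory. `publish_final_output` and the directory layout are assumptions for illustration; in the PR this step runs inside an Argo workflow.

```python
import shutil
from pathlib import Path


def publish_final_output(output_dir: Path, pvc_root: Path, recipe_id: str) -> Path:
    """Copy a recipe's final output directory to <pvc_root>/<recipe_id> (hypothetical).

    shutil.copytree creates missing parent directories and raises
    FileExistsError if the target already exists, which should not happen
    once the existence check (or the overwrite delete) has run.
    """
    target = pvc_root / recipe_id
    shutil.copytree(output_dir, target)
    return target
```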
Next, we will want to support something like:
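That is, a chain in which each recipe reads the previous recipe's publication location and publishes its own output (#45). The sketch below models only that data flow, with plain Python callables standing in for Argo workflows and the PVC treated as a local directory; `RecipeStep` and `run_chain` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional


@dataclass
class RecipeStep:
    """Hypothetical stand-in for one recipe's workflow."""

    recipe_id: str
    # Called as transform(input_dir, output_dir); input_dir is None for the first step.
    transform: Callable[[Optional[Path], Path], None]


def run_chain(steps: List[RecipeStep], pvc_root: Path) -> List[Path]:
    """Run steps in order, feeding each one the previous step's publication path."""
    previous: Optional[Path] = None
    published: List[Path] = []
    for step in steps:
        target = pvc_root / step.recipe_id
        # exist_ok=False: refuse to clobber already-published data.
        target.mkdir(parents=True, exist_ok=False)
        step.transform(previous, target)
        published.append(target)
        previous = target
    return published
```

Passing the publication path (rather than the data itself) between steps is what lets each link in the chain be an independently submitted workflow.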