Python Workflow Definition #62
Based on the discussions we had as part of the core hackathon, I tried to work on a way to exchange workflow graphs between different workflow frameworks.
The two example notebooks for
The format is currently very simple, based on a list of edges:

```python
edges_lst = [
    {'target': 0, 'targetHandle': 'x', 'source': 1, 'sourceHandle': 'x'},
    {'target': 1, 'targetHandle': 'x', 'source': 2, 'sourceHandle': None},
    {'target': 1, 'targetHandle': 'y', 'source': 3, 'sourceHandle': None},
    {'target': 0, 'targetHandle': 'y', 'source': 1, 'sourceHandle': 'y'},
    {'target': 0, 'targetHandle': 'z', 'source': 1, 'sourceHandle': 'z'},
]
```

The nodes at the moment just use the functions as Python objects, with the same ids as in the edges:

```python
nodes_dict = {
    0: add_x_and_y_and_z,
    1: add_x_and_y,
    2: 1,
    3: 2,
}
```

With the functions being simply:

```python
def add_x_and_y(x, y):
    z = x + y
    return {"x": x, "y": y, "z": z}


def add_x_and_y_and_z(x, y, z):
    w = x + y + z
    return w
```

In the future these functions could also be defined in a module and then just be referenced by their path, so the nodes dictionary would look like this:

```python
nodes_dict = {
    0: my_module.add_x_and_y_and_z,
    1: my_module.add_x_and_y,
    2: 1,
    3: 2,
}
```

Such a workflow could be serialized as JSON and, together with the Python module, exchanged between frameworks.
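To make the intended semantics concrete, here is a minimal sketch (my own illustration, not part of the proposal) of how such a graph could be evaluated: non-callable nodes are treated as literal values, and a `sourceHandle` selects a key from a dictionary-returning function. The helper name `evaluate_workflow` is hypothetical.

```python
def evaluate_workflow(nodes_dict, edges_lst):
    """Evaluate the DAG by recursively resolving each node's inputs."""
    cache = {}

    def evaluate_node(node_id):
        if node_id in cache:
            return cache[node_id]
        node = nodes_dict[node_id]
        if not callable(node):
            cache[node_id] = node  # literal input value
            return node
        kwargs = {}
        for edge in edges_lst:
            if edge["target"] == node_id:
                value = evaluate_node(edge["source"])
                if edge["sourceHandle"] is not None:
                    # select one field from a dict-returning function
                    value = value[edge["sourceHandle"]]
                kwargs[edge["targetHandle"]] = value
        cache[node_id] = node(**kwargs)
        return cache[node_id]

    # The terminal node is the one that never appears as a source.
    sources = {edge["source"] for edge in edges_lst}
    terminal = next(n for n in nodes_dict if n not in sources)
    return evaluate_node(terminal)


# evaluate_workflow(nodes_dict, edges_lst)  # -> 6 for the example above
```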
Alternative notation based on tuples rather than dictionaries, to improve human readability:

```python
edges_lst = [
    ('add_x_and_y/in/x', 'var_add_x_y__x'),
    ('add_x_and_y/in/y', 'var_add_x_y__y'),
    ('add_x_and_y_and_z/in/x', 'add_x_and_y/out/x'),
    ('add_x_and_y_and_z/in/y', 'add_x_and_y/out/y'),
    ('add_x_and_y_and_z/in/z', 'add_x_and_y/out/z'),
]

nodes_lst = [
    ('var_add_x_y__x@int', 1),
    ('var_add_x_y__y@int', 2),
    ('add_x_and_y@callable', add_x_and_y),
    ('add_x_and_y_and_z@callable', add_x_and_y_and_z),
]
```
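For comparison, a small sketch (again mine, not part of the proposal) of how this string notation could be parsed back into node/port pairs, assuming the first tuple element is the target endpoint, the second the source, and the `'node/in/port'` and `'name@type'` conventions shown above:

```python
def parse_endpoint(endpoint):
    """Split 'node/in/port' or 'node/out/port' into (node, direction, port).

    A plain name such as 'var_add_x_y__x' refers to the node as a whole.
    """
    parts = endpoint.split("/")
    if len(parts) == 3:
        return tuple(parts)
    return endpoint, None, None


def parse_node_label(label):
    """Split 'name@type' into (name, type_tag)."""
    name, _, type_tag = label.partition("@")
    return name, type_tag


# Example: print which source feeds which input argument.
for target, source in edges_lst:
    target_node, _, target_port = parse_endpoint(target)
    source_node, _, source_port = parse_endpoint(source)
    label = source_node if source_port is None else f"{source_node}/{source_port}"
    print(f"{label} -> {target_node}.{target_port}")
```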
Do we want to use the exchange format also for long-term storage of graphs and data?
I am not exactly sure what you are referring to. The workflow should not have any loose ends: either the user sets a value, or we have a recipe to get the input from previous functions. I have not yet considered the case of incomplete inputs.
I currently consider the case that a workflow consists of a Python module with a number of functions, a conda environment file, and a JSON representation of the workflow. Ideally the Python module should be minimal, and most Python functions should be distributed as conda packages. So for the workflow definition I would store the path to import the module and the name of the function. I guess that covers
Input data is saved as additional nodes. For default inputs there is no need to save them separately as those are already stored in the function definition.
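To illustrate both points, here is a small sketch of how a function referenced by import path could be restored, and how default inputs can be read directly from the function definition. The helper names and the `my_module` reference are hypothetical:

```python
import importlib
import inspect


def resolve_node(path):
    """Import a function from a 'module.function' reference string."""
    module_name, _, function_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), function_name)


def default_inputs(function):
    """Recover default input values from the function signature."""
    return {
        name: parameter.default
        for name, parameter in inspect.signature(function).parameters.items()
        if parameter.default is not inspect.Parameter.empty
    }


# Hypothetical usage, assuming my_module defines add_x_and_y:
# add_x_and_y = resolve_node("my_module.add_x_and_y")
# default_inputs(add_x_and_y)  # -> {} (no defaults in this example)
```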
This is a topic to discuss. The primary use case is to exchange workflows between different workflow frameworks, currently aiida, jobflow (Materials Project), and pyiron, to achieve interoperability of workflows and make them FAIR: https://arxiv.org/abs/2410.03490 Beyond this use case I see the option to extend the format and also use it for long-term storage of both graphs and data. I think such an interoperable storage format would allow us to use both
I also added a workflow example for
Based on the discussions at the core hackathon (flip charts):
Requirements:
=> Just the language - only DAG function calls
Needs:
This would also be compatible with CWL and Snakemake - from the DAG to CWL and Snakemake (see the sketch below)
Different levels of Interoperability
Publish workflows
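One possible direction for the DAG-to-Snakemake mapping is sketched below. Everything here is an assumption of the sketch rather than part of the proposal: the per-node file naming, the `run_node_*.py` wrapper scripts, and the function name `dag_to_snakemake`.

```python
def dag_to_snakemake(nodes_dict, edges_lst):
    """Emit a Snakefile stub with one rule per callable node.

    Each node's result is represented as a placeholder file named after
    its id; wrapping the actual function call into a script is left open.
    """
    rules = []
    for node_id, node in nodes_dict.items():
        if not callable(node):
            continue  # literal inputs become plain files, not rules
        inputs = sorted(
            {edge["source"] for edge in edges_lst if edge["target"] == node_id}
        )
        input_files = ", ".join(f'"node_{i}.json"' for i in inputs)
        rules.append(
            f"rule {node.__name__}:\n"
            f"    input: {input_files}\n"
            f'    output: "node_{node_id}.json"\n'
            f'    script: "run_node_{node_id}.py"\n'
        )
    return "\n".join(rules)


# print(dag_to_snakemake(nodes_dict, edges_lst)) emits one rule per function.
```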