
Restructure executor #38

Open · wants to merge 2 commits into main

Conversation

liamhuber (Member)


Here I try to accomplish two things:

First, to better separate the actual minimum requirements and interface from the description of our preferred tool. In this way, `executorlib.Executor` is only one implementation of what `pyiron_workflow` requires. To this end I split apart "downstream" sections (base and workflow and what they require) and an "upstream" section (executorlib and what it offers).

Second, there is a mismatch between the `executorlib` flow management and `pyiron_workflow` flow management, which was embedded in this statement:
> * For both `pyiron_base` and `pyiron_workflow` it would be convenient if users could send the entire graph to the HPC for execution, without relying on a connection to an external process monitoring the graph as it is currently implemented in `pyiron_workflow`. This can be achieved if a series of smaller nodes with similar resource requirements can be converted into one larger task which is then executed by `executorlib` as a single job in the queuing system. So a node in `pyiron_workflow` does not have a one-to-one relation to a task in `executorlib`. In `executorlib`, tasks are primarily defined by having similar resource requirements; only when a process has changing resource requirements does it make sense to separate it into individual tasks in `executorlib`. In the same way, `executorlib` can be used to efficiently parallelize the execution.

I streamlined this down to a single "TODO" around task grouping for `executorlib`.

One issue here is that on the `pyiron_workflow` side, the nodes are all functional but the graph as a whole has state (inputs and outputs), and _after_ the node functionality is executed, a callback function is used to map the result of that functionality back onto the graph state (i.e. to populate the output channels with data). In the context of firing off the entire graph to something like SLURM all at once, I don't see an easy technical solution for holding and updating the graph state. I also don't see an easy technical solution for avoiding the graph IO updates -- `pyiron_workflow` is exactly a workflow manager and expects to be able to manage the workflow.
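
Just to make the shape of the problem concrete, here is a stripped-down sketch of the pattern described above (standard library only; the node, channel, and callback names are made up for illustration and are not the actual `pyiron_workflow` internals):

```python
from concurrent.futures import ThreadPoolExecutor


def node_function(x, y):
    """The stateless, functional part of a node."""
    return x + y


class Node:
    """Hypothetical node: a functional core plus stateful IO channels."""

    def __init__(self):
        self.inputs = {"x": 1, "y": 2}
        self.outputs = {"sum": None}  # graph state lives here

    def run(self, executor):
        future = executor.submit(node_function, **self.inputs)
        # The callback maps the result back onto the graph state once the
        # work finishes -- which requires the parent (graph-holding) process
        # to still be around, and is exactly what gets awkward when the
        # whole graph is shipped to SLURM in one go.
        future.add_done_callback(self._process_result)
        return future

    def _process_result(self, future):
        self.outputs["sum"] = future.result()


if __name__ == "__main__":
    node = Node()
    with ThreadPoolExecutor() as exe:
        node.run(exe)
    print(node.outputs)  # {'sum': 3}
```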

The second issue is that, from [a conversation with @jan-janssen](https://github.com/pyiron/specs/pull/31/files#r1789103867), my understanding is that executorlib handles dependency trees in the following way: DAG only, each downstream task can depend on multiple upstream tasks, but each dependency must be absorbed in its entirety -- i.e. multiple-input-single-output. That is not an incompatible world to `pyiron_workflow`, but represents only a subset of the cyclic and multiple-input-multiple-output graphs representable by `pyiron_workflow`, so we would at most be able to use this functionality on a subset of graphs.
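
For contrast, the kind of graph that fits that model can be sketched with nothing but the standard library (the function names are invented for illustration; executorlib reportedly lets the futures themselves be passed to `submit`, which is emulated here by resolving them explicitly):

```python
from concurrent.futures import ThreadPoolExecutor


def make_structure():
    return {"atoms": 8}


def relax(structure):
    return {**structure, "energy": -1.0}


def analyze(structure, relaxed):
    return relaxed["energy"] / structure["atoms"]


with ThreadPoolExecutor() as exe:
    # Multiple-input-single-output DAG: every task yields exactly one result,
    # and each downstream task consumes whole upstream results.
    f_a = exe.submit(make_structure)
    f_b = exe.submit(relax, f_a.result())
    f_c = exe.submit(analyze, f_a.result(), f_b.result())
    print(f_c.result())  # -0.125

# What this cannot express: a node exposing several output channels that are
# consumed by different downstream nodes, or cyclic graphs (e.g. while-loop
# patterns) -- both of which pyiron_workflow allows.
```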

I think this sort of dependency tree is a great feature for `executorlib`, and is absolutely useful for the "high performance expert" use case. I just want to break it apart from the universal spec. For the `pyiron_workflow` use case, it would be much more productive to focus on the abilities to (a) run the entire graph remotely or (b) run individual nodes remotely (per @Tara-Lakshmipathy's [PR](#28)). These have the advantage that they're both already possible, so it's just a matter of building nicer user interfaces to the existing capability, e.g. in part by combining `Executor` and `FileExecutor` as already suggested here.
liamhuber requested a review from jan-janssen on October 7, 2024.
jan-janssen (Member)

I am not so happy with the refactoring, as it gives the impression that pyiron_base and pyiron_workflow have competing interests, while the main focus of developing these SPECs was to unify the interface of both packages.

liamhuber (Member, Author)

> I am not so happy with the refactoring, as it gives the impression that pyiron_base and pyiron_workflow have competing interests, while the main focus of developing these SPECs was to unify the interface of both packages.

It is not clear to me why you come to this conclusion. To define the spec we need to clearly state the general interface (done -- concurrent.futures.Executor), the specific requirements (done for pyiron_workflow -- submit, future, and a flag and TODO around convenience for executors that survive the shutdown of the parent process; non-existent for pyiron_base), and tools we have to solve the problem (done for executorlib; we should probably add a section discussing pysqa, but I'm the wrong person to write it).

pyiron_base has no spec for executors, indeed has no requirements for executors. Until it does there can literally be no competition. The shared interest I see here is that both base and workflow should use executorlib and pysqa, and more generally should conform to the concurrent.futures.Executor pattern to interface with other tools. Beyond that, they may have slightly different details in their requirements (e.g. workflow doesn't use the map part of the interface). If we want base and workflow leaning on the same tools, we need to clearly state what their requirements are and what steps are necessary to get them interfacing with our preferred toolset.
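
To make "clearly state what their requirements are" concrete, a minimal sketch of the interface subset `pyiron_workflow` actually leans on might look like the following (the protocol name and the shutdown-survival flag are illustrative placeholders, not settled spec language):

```python
from __future__ import annotations

from concurrent.futures import Future
from typing import Any, Callable, Protocol


class WorkflowCompatibleExecutor(Protocol):
    """Illustrative subset of the concurrent.futures.Executor pattern that
    the downstream packages rely on; map() is unused by pyiron_workflow but
    comes for free when subclassing concurrent.futures.Executor."""

    def submit(
        self, fn: Callable[..., Any], /, *args: Any, **kwargs: Any
    ) -> Future: ...

    def shutdown(self, wait: bool = True, *, cancel_futures: bool = False) -> None: ...

    # Hypothetical convenience flag from the TODO mentioned above: does the
    # executor (and its futures) survive shutdown of the parent process?
    survives_parent_shutdown: bool
```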

jan-janssen (Member)

> To define the spec we need to clearly state the general interface (done -- concurrent.futures.Executor),

While this is correct on the technical level, our users want to use their workflows in the HPC context, which is not covered by concurrent.futures.Executor, so we have to extend the interface.
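
One common way to express such an extension, sketched below purely as an illustration (the `resource_dict` keyword and its contents are an assumption here, not the agreed spec), is to keep the `concurrent.futures.Executor` call signature and add a per-submission resource specification:

```python
from concurrent.futures import Future, ProcessPoolExecutor


class ResourceAwareExecutor(ProcessPoolExecutor):
    """Toy sketch: accepts (and here ignores) a resource_dict so that the
    signature stays a strict superset of concurrent.futures.Executor."""

    def submit(self, fn, /, *args, resource_dict=None, **kwargs) -> Future:
        # A real HPC-backed executor would translate resource_dict
        # (cores, memory, walltime, ...) into a queuing-system allocation.
        return super().submit(fn, *args, **kwargs)


if __name__ == "__main__":
    with ResourceAwareExecutor() as exe:
        future = exe.submit(pow, 2, 10, resource_dict={"cores": 1})
        print(future.result())  # 1024
```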

> the specific requirements (done for pyiron_workflow -- submit, future, and a flag and TODO around convenience for executors that survive the shutdown of the parent process; non-existent for pyiron_base),

I agree that we should do a consistency check to validate that both pyiron_workflow and pyiron_base can work with the executor class we define; still, the specifics belong in the specs for pyiron_workflow and pyiron_base, not the specs for the executor.

> and tools we have to solve the problem (done for executorlib; we should probably add a section discussing pysqa, but I'm the wrong person to write it).

Do we plan to have a pysqa interface in addition to the executorlib interface in pyiron_workflow and pyiron_base? From my perspective, pysqa is a component of executorlib which users select as a backend (pyiron/executorlib#422), but pyiron_workflow and pyiron_base should not interface with pysqa directly. Currently pyiron_base still uses pysqa directly, but that is because pyiron_base is not using executorlib yet.

> pyiron_base has no spec for executors, indeed has no requirements for executors. Until it does there can literally be no competition.

I spent the weekend introducing decorators in pyiron_base, so the interface to executorlib is on the list. While I agree it is not written down yet, executorlib was designed with pyiron_base in mind, so both are going to be compatible.

> The shared interest I see here is that both base and workflow should use executorlib and pysqa, and more generally should conform to the concurrent.futures.Executor pattern to interface with other tools. Beyond that, they may have slightly different details in their requirements (e.g. workflow doesn't use the map part of the interface). If we want base and workflow leaning on the same tools, we need to clearly state what their requirements are and what steps are necessary to get them interfacing with our preferred toolset.

I agree, I think the primary question is: Can we agree on a joint interface? Based on a discussion I had with Joerg back at IPAM in 2013, it is important to distinguish workflow nodes and tasks for the executor. Not every small Python function (e.g. the summation of two numbers) is worth submitting to an executor; rather, multiple nodes with similar resource requirements (e.g. serial Python function calls) should be bundled and submitted to the executor as a single task. The mapping of nodes to tasks has to be handled by the workflow manager, as it requires knowledge of the task dependencies, while the executor only provides a simple interface to execute tasks. For the context of pyiron_base a task equals a job.
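
The bundling itself needs nothing beyond a plain callable on the executor side; a sketch of the idea (the functions and the grouping are purely illustrative):

```python
from concurrent.futures import ProcessPoolExecutor


def add(x, y):
    return x + y


def scale(x, factor):
    return x * factor


def grouped_task(x, y, factor):
    """Several cheap 'nodes' with identical (serial, single-core) resource
    requirements, bundled by the workflow manager into one executor task."""
    total = add(x, y)
    return scale(total, factor)


if __name__ == "__main__":
    with ProcessPoolExecutor() as exe:
        # One submission, one queuing-system job -- not one job per tiny node.
        print(exe.submit(grouped_task, 1, 2, factor=10).result())  # 30
```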

liamhuber (Member, Author)

Sorry for the delay in replying, I missed the alert on this thread.

> I agree that we should do a consistency check to validate that both pyiron_workflow and pyiron_base can work with the executor class we define; still, the specifics belong in the specs for pyiron_workflow and pyiron_base, not the specs for the executor.

I don't understand the nature of this objection. We don't have separate spec files for base and workflow. I thought one of the key objectives in this venture was to see what tools both platforms could jointly exploit? In this case we need to write down somewhere what the expectations and needs of those platforms are. Doing it in the file for the tool makes the most sense for me, but otherwise this similar content strictly needs to be copied into platform spec files, not just ignored.

> ...From my perspective, pysqa is a component of executorlib which users select as a backend (pyiron/executorlib#422), but pyiron_workflow and pyiron_base should not interface with pysqa directly...

Yeah, I totally agree 100%. The only place pysqa comes up in the PR itself is to say that base is currently interacting with it directly. IMO the "executor" file is exactly the right place to talk about how executorlib will empower and extend the concurrent.futures.Executor concept to interact with HPC. As stated, I'm not the right person to write that spec. I think we have no disagreement here, this is just not part of this PR.

> I spent the weekend introducing decorators in pyiron_base, so the interface to executorlib is on the list. While I agree it is not written down yet, executorlib was designed with pyiron_base in mind, so both are going to be compatible.

Good, sounds great. Again, not related to this PR. This PR simply acknowledges that base doesn't leverage executorlib and leaves a space for later defining how it will. In the interest of getting this PR unblocked, I think we can move on from this point.

> I agree, I think the primary question is: Can we agree on a joint interface? Based on a discussion I had with Joerg back at IPAM in 2013, it is important to distinguish workflow nodes and tasks for the executor. Not every small Python function (e.g. the summation of two numbers) is worth submitting to an executor; rather, multiple nodes with similar resource requirements (e.g. serial Python function calls) should be bundled and submitted to the executor as a single task. The mapping of nodes to tasks has to be handled by the workflow manager, as it requires knowledge of the task dependencies, while the executor only provides a simple interface to execute tasks. For the context of pyiron_base a task equals a job.

I don't want to argue semantics here; from the pyiron_workflow side, we need to be able to ship a node's run functionality off to the executor, as stated in the PR specs. I am in full agreement that not every little python operation should be shipped off to the executor; I'm totally fine with the idea that if you want to "execute" some stuff you can clump it together in a macro node as a pyiron_workflow solution. I think I had misunderstood earlier statements from you about executorlib managing dependencies and then I overstepped by adding this TODO:

> A series of smaller nodes with similar resource requirements should be convertible into one larger task, which is then executed by executorlib as a single job in the queuing system, i.e. task grouping.

I don't actually need this feature nor feel strongly about its (in)existence. I'm going to simply remove the line from the PR.

Please take a look again and let me know if you have objections to any specific content in the PR.

Review thread on `executor.md`:

> ## Downstream -- `pyiron_workflow`
>
> * An `executor` or information for creating one can be assigned to a node, and that node will use it for the core functionality of the node
Member

I only understand the first part (i.e. "executor can be assigned to a node"). I guess the rest has to be reformulated.

Member Author

I'm happy to reformulate. I'm also content that it's somewhat vague here -- I'm trying only to note who else in the project uses the executor functionality, not how it is used, which we shouldn't care about over here! It is probably worth mentioning that `pyiron_workflow` relies on `submit` but not `map`, but anyhow a good solution implements both, so it's not a big deal.
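
For whatever it's worth when reformulating, a tiny sketch along these lines (the attribute and class names are hypothetical, not the actual `pyiron_workflow` API) conveys both points, including that only `submit` is needed:

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical node: if an executor (or a recipe for building one) is
    assigned, the node's core function runs through it; otherwise it runs
    locally.  Only executor.submit() is needed -- map() is never called."""

    def __init__(self, fn):
        self.fn = fn
        self.executor = None  # an Executor instance or a factory for one

    def run(self, *args, **kwargs):
        if self.executor is None:
            return self.fn(*args, **kwargs)
        exe = self.executor() if callable(self.executor) else self.executor
        return exe.submit(self.fn, *args, **kwargs).result()


node = Node(lambda x: x + 1)
node.executor = ThreadPoolExecutor(max_workers=1)
print(node.run(41))  # 42
```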
