We currently have two local executors, one based on multithreading and one on multiprocessing, each with its own tests. We would rather maintain a single one, to simplify the code base and reduce CI duration.
The multiprocessing-based local_experimental executor is clearly the one we'd like to keep, because it also has the important feature of letting users stop a running job.
First complexity: V1 vs V2 (solved)
The multiprocessing-based local_experimental executor is only available for V2, meaning we would need to backport it to V1 to keep everything consistent. V1 is scheduled for full deprecation, so we will clearly not add this feature to it.
Simple workaround: just rename the executor from local to local_experimental, with no other changes.
Second complexity: process-based executor is much slower
When running our benchmarks, we found that (for the use cases we are simulating!) the local_experimental executor is much slower. It is also possible that this relates to #1772, although we did not explore that further.
A benchmark GHA run goes from typically 5 minutes to 35-40 minutes. We tried a simple fix, increasing max_workers, with no clear improvement.
We did not go far into analyzing this issue, since the actual goal was to reduce CI time rather than to dig into the differences between threads and processes. But it is easy to guess that the overhead of creating and closing processes is much higher than the equivalent overhead for threads - and our example workloads are probably dominated by that overhead, since the actual tasks are dummies.
Where to go from here
For the moment we cannot fully move from the thread-based executor to the process-based one, until we learn more about how the two perform in a realistic scenario. Any choice in this area also needs to be based on what we expect in terms of use cases, and on which trade-offs we can accept for a local deployment.
Thanks for the overview. High-level, with the container-based approach taking over more and more of how one would run local tests and do local development, I think we can limit the complexity of the use cases that the local executors cover. A core area is certainly running the CI, so we don't always have to simulate the Slurm interactions.
If the switch to the process executor leads to much increased CI times, it's certainly not worth fully switching to it for now.
This issue replaces the following attempts:
- local backend #2042
- local_experimental into local #2091
- local_experimental as a default runner #2093
- max_workers variable in local-executor config #2095
- local backend #2092
- local_experimental as a default runner #2094