Switch from multisession parallelization to multicore in evaluate stage #53
Nice! Good debugging.
If I recall, the historical reasons for using `multisession` over `multicore` were:

- The modeling used to be done on Windows laptops, not the current Linux VM, and only `multisession` works on Windows.
- The Linux VM was much smaller and memory-constrained when we first started using it, and `multicore` would crash the session (especially when we were still calculating CIs).
- `multicore` didn't play well with RStudio, which mattered when we were running the model interactively back in the day.

Given that these aren't really constraints anymore, I'm fine with switching over to `multicore`.
pipeline/03-evaluate.R
```r
# Enable parallel backend for generating stats faster.
# In the past we used the 'multisession' parallelization strategy, but this
# strategy exhibits diminishing returns (and eventually worse performance) past
# 5 workers on the server, and it's not particularly fast either (~10 mins to
# complete this stage). The 'multicore' strategy has a higher risk of hogging
# server resources for the duration of execution, but it executes much faster
# than the multisession strategy (~80 seconds to complete this stage), so
# ultimately we think it's worth the risk; plus, we only use half the available
# cores in order to ensure we don't block execution of other important tasks on
# the server.
```
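For context, here is a minimal sketch of the kind of backend setup this comment describes. It assumes the stage uses the `future` package's `plan()` (the strategy names suggest so, but the actual call in `pipeline/03-evaluate.R` isn't shown here), and the variable names `n_cores` and `n_workers` are hypothetical.

```r
library(future)

# Hypothetical setup: use half of the logical cores so other tasks on the
# server can keep running. detectCores() counts logical cores by default and
# may return NA on some platforms, so fall back to a single worker in that case.
n_cores <- parallel::detectCores()
n_workers <- if (is.na(n_cores)) 1L else max(1L, n_cores %/% 2L)

# 'multicore' forks the current R process instead of launching separate
# background R sessions the way 'multisession' does.
plan(multicore, workers = n_workers)
```

On a machine with 16 logical cores, this would give 8 workers.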
suggestion: Since this is specific to our environment and not necessarily to the pipeline in general, I vote that we move this comment into the commit body.
Good call, done in 583dd38.
Do you think this is an important enough consideration to continue to support running the script in RStudio @dfsnow? Maybe we could check |
I haven't tried running |
RStudio does indeed seem to block
Checking |
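One possible way to keep interactive RStudio runs working is sketched below. This is an assumption for illustration, not what this PR does. `future::supportsMulticore()` reports whether forked processing is available. It returns `FALSE` on Windows and, by default, inside RStudio, so the pipeline could fall back to `multisession` in those environments. The variable names `n_workers` and `strategy_name` are hypothetical.

```r
library(future)

# Hypothetical worker count: half of the cores future considers available,
# but never fewer than one.
n_workers <- max(1L, availableCores() %/% 2L)

# Forked ('multicore') processing is unavailable on Windows and disabled by
# default in RStudio, so fall back to background R sessions there.
strategy_name <- if (supportsMulticore()) "multicore" else "multisession"

if (strategy_name == "multicore") {
  plan(multicore, workers = n_workers)
} else {
  plan(multisession, workers = n_workers)
}
```

Given the reprex timings in the PR description, the `multisession` branch might also warrant a lower worker cap, since that strategy reportedly slows down past 5 workers.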
This PR updates the `evaluate` stage of the pipeline to switch from the `multisession` parallelization strategy to `multicore`. This change is intended to fix the behavior we've been seeing on the server, where the `evaluate` stage takes so long to complete that it makes development difficult.

I'm not sure why this behavior appears to be different from last year. However, I experimented with different numbers of workers using the `multisession` strategy on a minimal reprex, and execution begins to slow down once the number of workers increases past 5 (minimum runtime ~10 minutes). The background R processes presumably incur some sort of overhead, but I couldn't find anything in the docs explaining it. Switching to the `multicore` strategy resolves this problem of diminishing returns. It does incur the risk of using more memory (due to forked process isolation) and more CPU resources (due to execution using logical cores rather than threads). To mitigate this risk, we reduce the number of workers to half the available logical cores on the machine that runs the pipeline. With 16 cores on the server, the `evaluate` stage then executes fast enough (~80 seconds) that I'm not too worried about hogging resources.

Note that this change also decreases the execution time for this stage on Batch from 200s to 60s.