CUDA version of settler_cover #746
Conversation
I wonder at what point it becomes no longer worth the overhead of transferring data to the GPU. For example, a small domain of < 100 locations, or if conditions are such that there is a low number of locations to be evaluated (e.g., low numbers of
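One cheap way to act on this would be a size gate that falls back to the CPU path for small problems. A sketch (the function name and the 100-location cutoff are assumptions echoing the comment above; the real threshold would need benchmarking):

```julia
# Hypothetical gate: only take the GPU path when the problem is large enough
# that host<->device transfer cost is amortised by the faster matmul.
const MIN_GPU_LOCS = 100  # assumed cutoff; tune via benchmarking

use_gpu_path(n_valid_locs::Integer) = n_valid_locs >= MIN_GPU_LOCS

use_gpu_path(50)    # small domain -> stay on CPU
use_gpu_path(5000)  # large domain -> worth the transfer
```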
Ideally, all of
Working example of CUDA-fied `settler_cover()`
Yep, definitely too much data transfer happening, but now we have a first-pass working example. There are ways around this:
In the last few commits, I went back to using this original code. @ConnectedSystems' impl (now `settler_cover_cuda2`) is a bit faster, but I don't understand how it's equivalent.

```julia
potential_settlers[:, valid_sinks] .= (
    fec_scope[:, valid_sources] * conn[valid_sources, valid_sinks]
)
```

I tried
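For what it's worth, a sliced matmul and a dense masked matmul can agree because invalid sources contribute zero: slicing out the valid sources before multiplying gives the same sink columns as multiplying full matrices with invalid source columns zeroed. A small CPU-only sketch (all names here are stand-ins, not the actual ADRIA arrays):

```julia
using Random

Random.seed!(42)
n_groups, n_locs = 4, 6
fec_scope = rand(n_groups, n_locs)
conn = rand(n_locs, n_locs)
valid_sources = [true, false, true, true, false, true]
valid_sinks = [false, true, true, false, true, true]

# Sliced form, as in the PR:
sliced = fec_scope[:, valid_sources] * conn[valid_sources, valid_sinks]

# Dense form: zero out invalid source columns, multiply the full matrices,
# then select the sink columns. Invalid sources contribute nothing.
fec_masked = copy(fec_scope)
fec_masked[:, .!valid_sources] .= 0.0
dense = (fec_masked * conn)[:, valid_sinks]

sliced ≈ dense  # true
```

The dense form is presumably what makes a CUDA version simpler: one full matmul on the device, with no gather of irregular slices.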
Also, can't we cache
We can't cache, as it may change from time step to time step, but we can preallocate to reuse the same vector. That's what I was referring to in the comments.
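A sketch of the preallocate-and-reuse idea (CPU-only; `mul!` writes into an existing buffer, so only the sliced inputs still allocate temporaries — the array names are stand-ins):

```julia
using LinearAlgebra, Random

Random.seed!(1)
n_groups, n_locs = 4, 6
potential_settlers = zeros(n_groups, n_locs)  # allocated once, reused every step

for _timestep in 1:3
    # These would change per time step in the model:
    fec_scope = rand(n_groups, n_locs)
    conn = rand(n_locs, n_locs)
    valid_sources = rand(n_locs) .> 0.3
    valid_sinks = rand(n_locs) .> 0.3

    potential_settlers .= 0.0  # clear the previous step's values
    # In-place matmul into a view of the preallocated output:
    mul!(
        view(potential_settlers, :, findall(valid_sinks)),
        fec_scope[:, valid_sources],
        conn[valid_sources, valid_sinks],
    )
end
```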
```julia
# Calculate settler cover and copy result back to host
# This matrix multiplication is the most time-consuming part
# (`recruitment_rate()` takes < 1ms)
```

FIXME: reversed? The dest param is first.
Bah, I think you're right; I accidentally switched it.
Closing this for now. I don't think it's worth spending much time on CUDA until more GPU compute becomes available. At the moment, the strategy is that model runs should be done on the AIMS HPC, but we only have one GPU node.
I thought I commented, but it seems to have disappeared. The team has access to a dedicated remote desktop with a GPU. Someone else is using it, which is why I asked you to prototype on the HPC.
You commented on #747. I did work on both the HPC and the remote desktop. I usually close PRs if they're not merging soon and re-open them later when work resumes, but if you prefer to keep it open, we can. At the moment, I'm focused on other tasks.
Create a CUDA (Nvidia GPU lib) version of `settler_cover` #739
TODO
Draft Status
The CUDA implementation is in a temporary sandbox project directory, `test_performance`. We need to determine how to organize the CUDA code prior to merging this PR. There's also the question of whether CUDA should be used automatically if available, or enabled via a config flag.

ADRIA.CUDA ext

Probably a Julia ext like VizExt.
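Following the VizExt pattern, the wiring might look roughly like this. This is a sketch, not the actual ADRIA layout: `ADRIACUDAExt` and `settler_cover_gpu` are assumed names, and it presumes the base package defines a stub for the ext to extend.

```julia
# In the base package's Project.toml (config sketch):
#
#   [weakdeps]
#   CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
#
#   [extensions]
#   ADRIACUDAExt = "CUDA"

# ext/ADRIACUDAExt.jl -- loaded automatically once the user does `using CUDA`:
module ADRIACUDAExt

using ADRIA, CUDA  # hypothetical: assumes ADRIA defines a GPU stub to extend

function ADRIA.settler_cover_gpu(fec_scope, conn)
    # Naive sketch: move data to the device, multiply there, copy back
    return Array(CuArray(fec_scope) * CuArray(conn))
end

end
```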
`settler_growth_cuda` will be moved there. If Julia allows dynamic imports, I'm thinking of a setup where `AdriaCuda` would be an ext.

CUDA in base pkg

Alternatively, we just `add CUDA` to the base package and follow this approach: https://cuda.juliagpu.org/stable/installation/conditional/#Scenario-2:-GPU-is-optional
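The linked Scenario 2 makes CUDA a hard dependency but guards every GPU call at runtime via `CUDA.functional()`. A minimal sketch (the function name is assumed, and the GPU branch needs a CUDA-capable machine):

```julia
using CUDA

function settler_cover_matmul(fec_scope, conn)
    if CUDA.functional()
        # GPU path: transfer, multiply on device, copy the result back
        return Array(CuArray(fec_scope) * CuArray(conn))
    else
        # CPU fallback when no usable GPU/driver is present
        return fec_scope * conn
    end
end
```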
Blocking CUDA Error
See https://cuda.juliagpu.org/stable/usage/workflow/#UsageWorkflowScalar and https://cuda.juliagpu.org/stable/usage/array/#Array-wrappers
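The first link covers what this error usually is: CUDA.jl's scalar-indexing guard. Element-wise indexing of a `CuArray` from the host is disallowed in non-interactive code, since each access would be a separate device round-trip. A sketch (GPU required):

```julia
using CUDA

a = CUDA.rand(100)

# a[1]                    # ERROR: scalar indexing is disallowed
CUDA.@allowscalar a[1]    # explicit opt-in; fine for debugging only
sum(a)                    # array-level ops run as kernels and are fine
b = Array(a)              # or copy to host, then index freely
b[1]
```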
Notes
`pkg> add CUDA` takes a long time when it detects an Nvidia GPU: it downloads multiple runtime files. You can specify `local=true` to prevent this and use the system CUDA toolkit. See https://cuda.juliagpu.org/stable/installation/overview/#Using-a-local-CUDA
On HPC g001, I use a different Julia depot path to get around this. But `CUDA.set_runtime_version!(v"11.8"; local_toolkit=true)` could possibly work as well?
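For reference, the depot-path workaround is just an environment variable. A sketch (the `/scratch` path is a stand-in for whatever fast node-local storage g001 offers):

```shell
# Point Julia at a separate depot so CUDA's artifact downloads land
# somewhere with room, instead of the (often quota-limited) home directory.
export JULIA_DEPOT_PATH="/scratch/$USER/.julia"
echo "Julia depot: $JULIA_DEPOT_PATH"
# then, e.g.: julia --project -e 'using CUDA; CUDA.versioninfo()'
```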