glados::cuda::launch-kernel does not create good configuration #11
Comments
Proposed new algorithm:
As the launch functions are designed to be "fire and forget" functions, I don't see major performance issues with this approach. If your kernel (not yours personally, but in general) really needs the performance, you should hand-tune it anyway. Thoughts?
Yes, the proposed algorithm will completely suffice.
Thanks for the notice. My old laptop is now able to reconstruct a 1070x1070x1033 volume 20% faster :).
Perfect!
Could you please perform timing measurements on the K20c and the GTX 1080 with PARIS, using the newly structured GLADOS?
Sure, I'll try to squeeze it in on Monday or so. Otherwise in the new year; is that sufficient?
See hzdr/PARIS#29 for further reference.
The block sizes which are computed for 2- or 3-dimensional kernels are not multiples of 32.
E.g., 432x500 threads are to be created for a 2-dimensional kernel, which results in this configuration:
When invoking the kernel with a block size of (16, 16) and a grid size of (27, 32) instead, the average kernel runtime is nearly 10 times faster.
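For reference, the hand-picked numbers above follow from a plain round-up division of the thread counts by the block size. The following is only a minimal host-side sketch of that arithmetic; `Dim2`, `ceil_div`, `grid_for` and `fills_whole_warps` are hypothetical names made up for illustration and are not part of GLADOS, and the warp size of 32 is an assumption that holds for current NVIDIA GPUs.

```cpp
// Hypothetical helper type for illustration only (not a GLADOS or CUDA type).
struct Dim2 { unsigned x, y; };

// Round-up integer division: the smallest n with n * b >= a.
constexpr unsigned ceil_div(unsigned a, unsigned b) {
    return (a + b - 1) / b;
}

// Number of blocks per dimension needed to cover `threads` with blocks of size `block`.
constexpr Dim2 grid_for(Dim2 threads, Dim2 block) {
    return Dim2{ceil_div(threads.x, block.x), ceil_div(threads.y, block.y)};
}

// True if the threads of one block fill whole warps (assumed warp size: 32).
constexpr bool fills_whole_warps(Dim2 block, unsigned warp_size = 32) {
    return (block.x * block.y) % warp_size == 0;
}

// The 432x500 example with a (16, 16) block, checked at compile time.
static_assert(grid_for(Dim2{432, 500}, Dim2{16, 16}).x == 27, "432 / 16 = 27");
static_assert(grid_for(Dim2{432, 500}, Dim2{16, 16}).y == 32, "500 / 16 rounds up to 32");
static_assert(fills_whole_warps(Dim2{16, 16}), "256 threads = 8 full warps");
```

Note that the grid covers 432x512 threads, so the kernel still needs a bounds check for the last 12 rows in y.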