Optimize SplineEvaluator #499
Comments
Do you have a profile of a spline interpolation on GPU?
I would be interested to see a time trace like the ones provided by nsys or by the Kokkos simple kernel timer. I am skeptical about the figures: they are of the order of magnitude of the latency of launching a GPU kernel, see https://developer.nvidia.com/blog/understanding-the-visualization-of-overhead-and-latency-in-nsight-systems/#nsight_systems_overhead. Did you mean milliseconds?
Yeah, that's milliseconds, sorry.
Anyway, there is in fact another problem with this table: it is for ny=10000, not 100000. And I realize the patch I propose is actually slower with ny=100000. So I am closing the issue.
At
ddc/include/ddc/kernels/splines/spline_evaluator.hpp
Line 163 in 9f9292d
With:
This reduces the run time from 1.59us to 0.56us (for a benchmark with nx=1000, ny=100000).

The optimal solution may require hierarchical parallelism in ddc (#396) though, and maybe a transposition of spline_coef to make spline_coef[j] contiguous.

@tpadioleau should I address this?
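To illustrate what "making spline_coef[j] contiguous" could mean, here is a minimal host-side sketch in plain C++. It is not ddc or Kokkos code, and the actual memory layout of spline_coef is not stated in this issue. The sketch assumes a flat row-major array of shape (nx, ny), where i indexes the coefficients along the interpolation dimension and j indexes the batch. The helper names `transpose`, `sum_strided` and `sum_contiguous` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ddc): transpose a row-major (nx, ny)
// array into a row-major (ny, nx) array.
std::vector<double> transpose(std::vector<double> const& in, std::size_t nx, std::size_t ny)
{
    assert(in.size() == nx * ny);
    std::vector<double> out(nx * ny);
    for (std::size_t i = 0; i < nx; ++i) {
        for (std::size_t j = 0; j < ny; ++j) {
            out[j * nx + i] = in[i * ny + j];
        }
    }
    return out;
}

// Assumed original layout (nx, ny): the coefficients of batch j are
// separated by a stride of ny elements.
double sum_strided(std::vector<double> const& coef, std::size_t nx, std::size_t ny, std::size_t j)
{
    double s = 0.0;
    for (std::size_t i = 0; i < nx; ++i) {
        s += coef[i * ny + j];
    }
    return s;
}

// Transposed layout (ny, nx): the coefficients of batch j are stored
// next to each other in memory.
double sum_contiguous(std::vector<double> const& coef_t, std::size_t nx, std::size_t j)
{
    double s = 0.0;
    for (std::size_t i = 0; i < nx; ++i) {
        s += coef_t[j * nx + i];
    }
    return s;
}
```

Both functions read the same values; only the access pattern differs. Whether the transposed layout is actually faster on GPU depends on how threads are mapped to i and j, so it would need to be measured with a profiler.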