SYCL kernel compilation: establish a guideline or avoid #1938

Open
dmitriy-sobolev opened this issue Nov 18, 2024 · 0 comments

Summary:

Compiling a SYCL kernel before submission enables kernel introspection, which makes it possible to select a work-group size that fits the available resources (e.g. shared local memory, SLM). However, the compilation step may negatively impact performance. It is not clear when kernel compilation should be used, or whether it can generally be avoided.

Problem Statement:
There is no guideline on when kernels should be compiled, or on whether compilation can be avoided altogether.

There appears to be an empirical rule for GPU devices: use no more than half of the SLM. For example, the scan, reduce, find, merge-sort and radix-sort patterns rely on this rule. Below is an example from the reduce pattern:
https://github.com/oneapi-src/oneDPL/blob/4898de274ed46526a1dae3e32fbdc525dd2e0291/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h#L460-L464
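
For readers who do not want to follow the link, the "half of SLM" rule can be sketched in plain C++ as below. This is only an illustration of the arithmetic, not the code at the linked lines: the helper name `half_slm_work_group_size` and its parameters are hypothetical. In a real SYCL program, `slm_bytes` would presumably come from a device query such as `sycl::info::device::local_mem_size`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of oneDPL: clamp a requested work-group size so
// that the work-group's local-memory footprint stays within half of the SLM.
//   requested_wg_size: the work-group size the algorithm would like to use
//   slm_bytes:         size of the device's shared local memory, in bytes
//   bytes_per_item:    SLM bytes needed by each work-item (must be > 0)
inline std::size_t half_slm_work_group_size(std::size_t requested_wg_size,
                                            std::size_t slm_bytes,
                                            std::size_t bytes_per_item)
{
    assert(bytes_per_item > 0);
    const std::size_t slm_budget = slm_bytes / 2;               // the "half of SLM" rule
    const std::size_t max_items = slm_budget / bytes_per_item;  // work-items that fit in the budget
    return std::min(requested_wg_size, max_items);
}
```

Note that this computation needs only device-level information, so it does not require compiling the kernel first.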

The ultimate question: can/should it be applied to other devices?

Preferred Solution:

Clarify the strategy for using compiled kernels, or do not use them at all.

Additional Context:

There is an internal knob to control kernel compilation (_ONEDPL_COMPILE_KERNEL), but its intended use is not well defined because the reasoning behind it is not documented.

That question was also raised here: #1881 (comment)
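
To make the trade-off behind such a knob concrete, here is a minimal sketch in plain C++. It is not how oneDPL uses _ONEDPL_COMPILE_KERNEL: the macro `EXAMPLE_COMPILE_KERNEL` and the function `choose_work_group_size` are hypothetical stand-ins. The assumption is that, in SYCL, the device-wide limit corresponds to `sycl::info::device::max_work_group_size`, while the kernel-specific limit (`sycl::info::kernel_device_specific::work_group_size`) is available only once the kernel has been compiled.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for a compile-time knob; the name and the default
// value are illustrative only.
#ifndef EXAMPLE_COMPILE_KERNEL
#    define EXAMPLE_COMPILE_KERNEL 0
#endif

// device_max_wg_size: device-wide work-group size limit (no compilation needed)
// kernel_max_wg_size: limit for this particular kernel, assumed to be known
//                     only after the kernel has been compiled
inline std::size_t choose_work_group_size(std::size_t device_max_wg_size,
                                          std::size_t kernel_max_wg_size)
{
#if EXAMPLE_COMPILE_KERNEL
    // Knob on: pay for compilation, then respect the kernel-specific limit.
    return std::min(device_max_wg_size, kernel_max_wg_size);
#else
    // Knob off: skip compilation and rely on the device-wide limit alone.
    (void)kernel_max_wg_size;
    return device_max_wg_size;
#endif
}
```

With the knob off, the result can exceed what a specific kernel supports, which is why a guideline is needed on when the device-level limit (or a heuristic such as the half-of-SLM rule) is sufficient.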
