[UPDATE]: update to oneapi toolkit 2024 and torch version 2.1.0 #239
@quintinwang5 did it work locally? |
Yes. I have oneapi 2024.0 locally. It works well. |
@pbchekin It seems oneAPI 2023 is still being used, and something is wrong with the oneAPI environment variables. Should I change something on the CI machine?
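One quick way to check for a mixed environment is to scan the environment variables for oneAPI version directories. This is only a minimal sketch: the helper name `oneapi_versions_in_env` is hypothetical, and it assumes the usual layout where a version directory (for example `2024.0`) appears somewhere below an `oneapi` directory in each path entry.

```python
import os
import re

# Assumption: oneAPI paths look like ".../oneapi/<component>/<version>/...",
# where <version> starts with a four-digit year such as 2023 or 2024.
_VERSION_IN_PATH = re.compile(r"oneapi[^:]*?/(20\d{2}(?:\.\d+)*)", re.IGNORECASE)


def oneapi_versions_in_env(env=None):
    """Map each variable name to the sorted oneAPI versions found in its value."""
    env = os.environ if env is None else env
    found = {}
    for name, value in env.items():
        versions = sorted(set(_VERSION_IN_PATH.findall(value)))
        if versions:
            found[name] = versions
    return found


if __name__ == "__main__":
    # A variable listing more than one version hints at a mixed 2023/2024 setup.
    for name, versions in oneapi_versions_in_env().items():
        print(name, versions)
```

If a variable such as `LD_LIBRARY_PATH` reports both a 2023 and a 2024 version, the runner is probably still sourcing the old toolkit somewhere.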
|
We temporarily have two CIs running in parallel. You can try changing oneAPI in one of them by replacing |
If that works, you need to wait for #241, which disables the old workflow. |
Or I can change the Triton DSE Pre Commit runner to use oneAPI 2024 manually, when it is proven to work on the |
@whitneywhtsang Can I know the
These UTs can pass locally with oneapi 2024. So I think it's more likely a runtime environment problem. |
|
On the
|
This is how we install level_zero for the runner: How did you install it on a local machine? |
It's weird: if we ignore this error code (by not calling PyErr_SetString), the program still executes properly. |
@@ -131,7 +131,7 @@ def format_of(ty):
       char err[1024] = {{0}};
       strcat(err, prefix);
       strcat(err, str.c_str());
-      PyErr_SetString(PyExc_RuntimeError, err);
+      //PyErr_SetString(PyExc_RuntimeError, err);
@pbchekin This is just for testing; I will remove it later. Can you help check whether we have the specific intel-level-zero-gpu version? It seems it does not work in the Dockerfile. Thanks!
@pbchekin let's switch to Agama 775.20 release which should work with oneAPI 2024.0.1/2:
https://dgpu-docs.intel.com/releases/stable_775_20_20231219.html
Agama 775.20 has been installed: the kernel driver on the hosts, and Level Zero on the runners labeled oneapi-2024.0.1, which are currently selected only for this PR. We will keep two sets of runners (oneAPI 2023.2.0 and oneAPI 2024.0.1) until this PR is merged.
Level zero has been updated to the latest stable rolling release in the runners:
|
Level zero cannot be updated to |
@pbchekin These errors are something like:
This should be a bug in |
Sorry, I don't understand what to check. There are no deb packages intel-igc-* installed on the runner. OneAPI 2024.0.1 is installed in ~/intel/oneapi by offline installer, we do not control versions that are bundled with the installer. |
Can we install another version via the Dockerfile? I tried this, but it seems it did not work. |
This Dockerfile is used to build a runner image; it is not used directly during CI. Instead of modifying the Dockerfile, you can try installing packages in the CI workflow with |
Thanks. Having a try. |
c3f4eb2 to 97c42da
@etiotto May I ask why we lower these ops (sin, cos, exp, log) here, since they can also be lowered here?
|
It was probably done for consistency with what the GPU dialect already does for NVVM and ROCDL. The GENX dialect is a counterpart to the NVVM/ROCDL dialects. The GPU dialect has corresponding conversions here for NVVM and here for ROCDL. Having said that, it is questionable that the GPU dialect lowers operations from another dialect (the math dialect in this instance). But that is already the case, so I think it is OK for us to follow suit at this point. The latest versions of IGC support those OpenCL functions. I think we need to focus on the latest IGC version. @pengtu what is your opinion? |
I noticed the NVVM/ROCDL counterparts, but I want to know the rule for choosing these operators. It seems we have a big gap compared with them. |
This reverts commit f5914ca.
9cb8087 to f6dbe0c
@pbchekin Confirmed the |
@etiotto I created a JIRA here. We may need to change the symbols to the Khronos format once they switch to the KHR translator completely. |
So we are using the correct symbol. |
OK thanks for filing a report against IGC. |
GitHub runner image based on Ubuntu 22.04 with oneAPI 2024.0.1 and downgraded libigc1. The downgrade is required for #239.
Update to oneAPI toolkit 2024 and to torch 2.1.0.
They should be updated together because the ipex 1.13 package dynamically links to libraries in oneAPI 2023.
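The dynamic-link dependency can be checked by looking at `ldd` output for an extension's shared object. The sketch below only parses text that `ldd` has already produced; the helper name `unresolved_or_oneapi_libs` and the sample library names and paths are hypothetical, not taken from the actual ipex package.

```python
def unresolved_or_oneapi_libs(ldd_output):
    """Split `ldd` output into missing libraries and ones resolved under a oneAPI path.

    Returns (missing, from_oneapi), where from_oneapi is a list of
    (library name, resolved path) tuples.
    """
    missing, from_oneapi = [], []
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # e.g. the vdso line or the dynamic loader entry
        name, _, target = line.strip().partition("=>")
        name, target = name.strip(), target.strip()
        if target.startswith("not found"):
            missing.append(name)
        elif "oneapi" in target.lower():
            # Drop the trailing load address, e.g. " (0x00007f...)".
            from_oneapi.append((name, target.split(" (")[0]))
    return missing, from_oneapi


if __name__ == "__main__":
    # Hypothetical sample; on a real machine, feed in the text printed by `ldd <file>.so`.
    sample = (
        "\tlibsycl.so.6 => not found\n"
        "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f01)\n"
    )
    print(unresolved_or_oneapi_libs(sample))
```

A library reported as missing after switching toolkits, or one resolved under the old toolkit's directory, is the kind of mismatch that forces both packages to be upgraded in the same change.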