I used to have a working AF2 installation on my computer (Ubuntu 24.04, NVIDIA 4090, driver nvidia-open 565.57.01).
Unfortunately, after installing AF3 on the same computer (it worked after some hiccups) and then trying to set docker to rootless mode, AF2 stopped working, and so did AF3. After extensive web searches I tried several 'fixes', but they made things worse. Even after reinstalling everything (docker, cuda-toolbox, AF2) following my original procedure (and avoiding rootless mode), I could not get it to run. The error messages vary, but the most persistent one is:
CUDA backend failed to initialize: jaxlib/cuda/versions_helpers.cc:98: operation cuInit(0) failed: CUDA_ERROR_NO_DEVICE (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I read that this might be related to having CUDA directories in LD_LIBRARY_PATH and/or to having set (or not properly set) CUDA_VISIBLE_DEVICES. However, changing those did not help.
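For anyone who wants to rule out those two variables quickly, here is a minimal sketch that inspects them from Python. The function name `check_cuda_env` and the heuristics (treating an empty value or `-1` as "all GPUs hidden", and flagging any path entry containing "cuda") are my own assumptions for illustration, not something taken from the AF2 or JAX documentation.

```python
import os


def check_cuda_env(env):
    """Return human-readable notes about CUDA-related environment variables.

    `env` is any mapping of variable names to values, e.g. os.environ.
    The checks below are heuristics (assumptions), not an official diagnostic.
    """
    notes = []

    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        notes.append("CUDA_VISIBLE_DEVICES is unset (all GPUs visible by default)")
    elif visible.strip() in ("", "-1"):
        # An empty value or -1 is commonly used to hide every GPU.
        notes.append("CUDA_VISIBLE_DEVICES hides every GPU: %r" % visible)
    else:
        notes.append("CUDA_VISIBLE_DEVICES restricts GPUs to: %s" % visible)

    # LD_LIBRARY_PATH is colon-separated on Linux.
    for entry in env.get("LD_LIBRARY_PATH", "").split(":"):
        if "cuda" in entry.lower():
            notes.append("LD_LIBRARY_PATH contains a CUDA directory: %s" % entry)

    return notes


if __name__ == "__main__":
    for note in check_cuda_env(os.environ):
        print(note)
```

Running it both on the host and inside the container (the variables can differ between the two) shows whether either one is set in a way that could hide the GPU.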
I had a similar error message during my first (successful) installation of AF2, where it could be fixed by using an older version of docker (5:26.1.4-1 rather than the newer 5:27.4.0-1). This time, however, reverting to the older docker version did not help.
As far as I can see, the GPU is present and functional. I know very little about docker and am not sure how to test whether the GPU is accessible to the docker image I built. The AF2 documentation recommends trying
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
but this never worked for me, not even when AF2 was running fine.
I am really somewhat desperate - could it be a hardware problem? Any hint would be greatly appreciated.
That line never worked for me either; docker couldn't find the image.
Changing the image tag slightly made it work, since nvidia/cuda:11.0.3-base is on Docker Hub whereas nvidia/cuda:11.0-base is not:
docker run --rm --gpus all nvidia/cuda:11.0.3-base nvidia-smi
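If you want to script that check, here is a rough sketch that runs the same docker command from Python and reports whether `nvidia-smi` succeeded inside the container. The helper names `gpu_check_command` and `gpu_visible_in_docker` are hypothetical; the image tag is the one that worked for me above, and whether it suits your driver is an assumption you would need to verify.

```python
import subprocess


def gpu_check_command(image="nvidia/cuda:11.0.3-base"):
    """Build the docker command that runs nvidia-smi inside a CUDA base image."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def gpu_visible_in_docker(image="nvidia/cuda:11.0.3-base"):
    """Return (ok, output): ok is True if nvidia-smi exited with status 0.

    This invokes docker, so it may pull the image on first use.
    """
    try:
        result = subprocess.run(
            gpu_check_command(image),
            capture_output=True,
            text=True,
            check=False,  # inspect the return code ourselves
        )
    except FileNotFoundError:
        return False, "docker executable not found on PATH"
    return result.returncode == 0, result.stdout or result.stderr
```

Calling `ok, output = gpu_visible_in_docker()` and printing `output` gives either the usual nvidia-smi table or docker's error text. A failure here would point at the docker/GPU runtime setup rather than at AF2 itself.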