The DEEPCELL_MESMER module sometimes fails, seemingly at random, on the HPC cluster when TensorFlow loads the Mesmer model inside the Singularity container.
Command used and terminal output
nextflow run nf-core/molkart -r 6c1eef828896a5e60fefc9aa2398ad76ab41ec63 -profile singularity -c ./core_molkart_MI.conf -params-file ./params.yml -with-tower -resume

Command error:
2023-12-18 15:27:38.508036: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/.singularity.d/libs
2023-12-18 15:27:38.508066: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-12-18 15:27:42.757496: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.8/dist-packages/cv2/../../lib64:/.singularity.d/libs
2023-12-18 15:27:42.757524: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-12-18 15:27:42.757542: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (m03n06): /proc/driver/nvidia/version does not exist
2023-12-18 15:27:42.758824: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/usr/src/app/run_app.py", line 60, in <module>
    run_application(dict(ARGS._get_kwargs()))
  File "/usr/src/app/deepcell_applications/app_runners.py", line 52, in run_application
    app = dca.utils.get_app(arg_dict['app'])
  File "/usr/src/app/deepcell_applications/utils.py", line 44, in get_app
    return app_map[name]['class'](**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/deepcell/applications/mesmer.py", line 222, in __init__
    model = tf.keras.models.load_model(model_path)
  File "/usr/local/lib/python3.8/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/load.py", line 991, in load_internal
    raise ValueError("SavedModels saved from Tensorflow 1.x or Estimator (any"
ValueError: SavedModels saved from Tensorflow 1.x or Estimator (any version) cannot be loaded with node filters.

Work dir:
  /gpfs/bwfor/work/ws/hd_gr294-MIproject_nfcore_molkart/data/Molecular_Cartography/work/4f/9358489573beb1dcf90f8accbc901b

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
WARN: Tower request field `workflow.errorMessage` exceeds expected size | offending value: [same text as the Command error output above], size: 2462 (max: 255)
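Most of the output above is TensorFlow's own C++ logging at level `W` (warning) or `I` (info); on a CPU-only node the CUDA messages are expected, and the log itself says to ignore the cudart error without a GPU. The fatal line is the `ValueError` raised from `tf.keras.models.load_model`. The sketch below separates the two when reading a task's `.command.err`. It assumes TensorFlow's default `<date> <time>: <level> <file>:<line>] <message>` prefix as seen in this log; `triage_log` is a hypothetical helper, not part of nf-core/molkart or DeepCell.

```python
import re
from typing import Dict, List

# Assumed TensorFlow C++ log prefix, as in the log above:
# "<date> <time>: <level> <source file>:<line>] <message>", level in I/W/E/F.
TF_LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) (\S+?:\d+)\] (.*)$"
)
# Final line of a Python traceback, e.g. "ValueError: ...".
PY_EXCEPTION_LINE = re.compile(r"^[A-Za-z_][\w.]*(Error|Exception): ")


def triage_log(text: str) -> Dict[str, List[str]]:
    """Group TensorFlow log messages by level and collect Python exception lines.

    Hypothetical helper for inspecting a Nextflow task's .command.err.
    """
    result: Dict[str, List[str]] = {"I": [], "W": [], "E": [], "F": [], "exceptions": []}
    for line in text.splitlines():
        match = TF_LOG_LINE.match(line)
        if match:
            level, source, message = match.groups()
            result[level].append(f"{source}: {message}")
        elif PY_EXCEPTION_LINE.match(line):
            result["exceptions"].append(line)
    return result


if __name__ == "__main__":
    sample = "\n".join([
        "2023-12-18 15:27:42.757524: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)",
        "2023-12-18 15:27:42.758824: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)",
        "Traceback (most recent call last):",
        '  File "/usr/src/app/run_app.py", line 60, in <module>',
        "ValueError: SavedModels saved from Tensorflow 1.x or Estimator (any version) cannot be loaded with node filters.",
    ])
    report = triage_log(sample)
    print(len(report["W"]), len(report["I"]))
    print(report["exceptions"][0].split(":")[0])
```

On the sample this prints `1 1` and then `ValueError`: one CUDA warning, one info message, and a single real exception. Pointing it at the `.command.err` in the work dir above (for example `triage_log(open(".command.err").read())`) would show whether each failed attempt ends in the same `ValueError`.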
Relevant files
No response
System information
HPC: https://wiki.bwhpc.de/e/Helix
Executor: Slurm
Container engine: Singularity
Nextflow version: 23.10.0.5889