Error converting symbolic tensor object to numpy while running RELERNN_TRAIN #60

Open
brooklynnscott00 opened this issue Dec 19, 2024 · 14 comments

Comments

@brooklynnscott00

brooklynnscott00 commented Dec 19, 2024

While running ReLERNN_TRAIN, I ran into the following error, which appears to result from a failure to convert a tensor object to a NumPy array. It occurred after a fresh install of a conda environment using the dependency versions specified in the documentation (tensorflow/2.2.0, cudatoolkit/10.1.243, and cudnn/7.6.5). Any help would be greatly appreciated!

2024-12-19 09:15:57.795018: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2024-12-19 09:15:57.871059: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2944210000 Hz
2024-12-19 09:15:57.871375: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555559e7dcf0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-12-19 09:15:57.871560: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2024-12-19 09:15:57.884792: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2024-12-19 09:16:28.739187: I tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Traceback (most recent call last):
  File "/home/brscott4/.conda/envs/relernn/bin/ReLERNN_TRAIN", line 130, in <module>
    main()
  File "/home/brscott4/.conda/envs/relernn/bin/ReLERNN_TRAIN", line 109, in main
    runModels(ModelFuncPointer=GRU_TUNED84,
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/ReLERNN/helpers.py", line 344, in runModels
    model = ModelFuncPointer(x,y)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/ReLERNN/networks.py", line 19, in GRU_TUNED84
    model = layers.Bidirectional(layers.GRU(84,return_sequences=False))(genotype_inputs)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/wrappers.py", line 531, in __call__
    return super(Bidirectional, self).__call__(inputs, **kwargs)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 922, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/wrappers.py", line 644, in call
    y = self.forward_layer(forward_inputs,
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 654, in __call__
    return super(RNN, self).__call__(inputs, **kwargs)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 922, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent_v2.py", line 408, in call
    inputs, initial_state, _ = self._process_inputs(inputs, initial_state, None)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 848, in _process_inputs
    initial_state = self.get_initial_state(inputs)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 636, in get_initial_state
    init_state = get_initial_state_fn(
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 1910, in get_initial_state
    return _generate_zero_filled_state_for_cell(self, inputs, batch_size, dtype)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2926, in _generate_zero_filled_state_for_cell
    return _generate_zero_filled_state(batch_size, cell.state_size, dtype)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2944, in _generate_zero_filled_state
    return create_zeros(state_size)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py", line 2939, in create_zeros
    return array_ops.zeros(init_state_size, dtype=dtype)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 2677, in wrapped
    tensor = fun(*args, **kwargs)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 2721, in zeros
    output = _constant_if_small(zero, shape, dtype, name)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py", line 2662, in _constant_if_small
    if np.prod(shape) < 1000:
  File "<__array_function__ internals>", line 180, in prod
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 3045, in prod
    return _wrapreduction(a, np.multiply, 'prod', axis, dtype, out,
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
  File "/home/brscott4/.conda/envs/relernn/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 748, in __array__
    raise NotImplementedError("Cannot convert a symbolic Tensor ({}) to a numpy"
NotImplementedError: Cannot convert a symbolic Tensor (bidirectional/forward_gru/strided_slice:0) to a numpy array.
@andrewkern
Member

hi there-- to help debug this can you give me a full list of the versions in your python environment? assuming you use conda you can get this with conda list

also are you getting this error trying to run our example input?
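A narrower alternative to the full `conda list` is a short script that prints only the packages involved here. The traceback passes from `numpy` (`np.prod`) into `tensorflow` (`ops.py`, `__array__`), so those two matter most; `h5py` and `relernn` are included as well. This is a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+). The function name `report_versions` and the package list are illustrative choices, not part of ReLERNN. Packages installed by conda normally ship this metadata too, but `conda list` remains the authoritative listing.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages involved in the errors reported in this thread (illustrative list).
PACKAGES = ("tensorflow", "numpy", "h5py", "relernn")


def report_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name:<12} {ver or 'not installed'}")
```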

@brooklynnscott00
Author

Hello, thanks for getting back to me. Yes, I get the same error when I run the sample input.

Here are my conda environment details:

(relernn) [brscott4@sc003:/scratch/brscott4/downloads]$ conda list
# packages in environment at /home/brscott4/.conda/envs/relernn:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
_tflow_select             2.3.0                       mkl
absl-py                   2.1.0              pyhd8ed1ab_0    conda-forge
alsa-lib                  1.2.13               hb9d3cd8_0    conda-forge
apricot-select            0.6.1              pyhd8ed1ab_0    conda-forge
asciitree                 0.3.3                      py_2    conda-forge
astor                     0.8.1              pyh9f0ad1d_0    conda-forge
astunparse                1.6.3              pyhd8ed1ab_2    conda-forge
attr                      2.5.1                h166bdaf_1    conda-forge
attrs                     24.2.0             pyh71513ae_0    conda-forge
aws-c-auth                0.7.22               h96bc93b_2    conda-forge
aws-c-cal                 0.6.14               h88a6e22_1    conda-forge
aws-c-common              0.9.19               h4ab18f5_0    conda-forge
aws-c-compression         0.2.18               h83b837d_6    conda-forge
aws-c-event-stream        0.4.2               ha47c788_12    conda-forge
aws-c-http                0.8.1               h29d6fba_17    conda-forge
aws-c-io                  0.14.8               h21d4f22_5    conda-forge
aws-c-mqtt                0.10.4               h759edc4_4    conda-forge
aws-c-s3                  0.5.9                h594631b_3    conda-forge
aws-c-sdkutils            0.1.16               h83b837d_2    conda-forge
aws-checksums             0.1.18               h83b837d_6    conda-forge
aws-crt-cpp               0.26.9               he3a8b3b_0    conda-forge
aws-sdk-cpp               1.11.329             hba8bd5f_3    conda-forge
blas                      1.1                    openblas    conda-forge
bokeh                     3.1.1              pyhd8ed1ab_0    conda-forge
brotli                    1.1.0                hd590300_1    conda-forge
brotli-bin                1.1.0                hd590300_1    conda-forge
brotli-python             1.1.0            py38h17151c0_1    conda-forge
bzip2                     1.0.8                h4bc722e_7    conda-forge
c-ares                    1.34.4               hb9d3cd8_0    conda-forge
ca-certificates           2024.12.14           hbcca054_0    conda-forge
cairo                     1.18.0               h3faef2a_0    conda-forge
certifi                   2024.8.30          pyhd8ed1ab_0    conda-forge
cffi                      1.17.0           py38heb5c249_0    conda-forge
charset-normalizer        3.4.0              pyhd8ed1ab_0    conda-forge
click                     8.1.7           unix_pyh707e725_0    conda-forge
cloudpickle               3.1.0              pyhd8ed1ab_1    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
contourpy                 1.1.1            py38h7f3f72f_1    conda-forge
cudatoolkit               10.1.243            h6d9799a_13    conda-forge
cudnn                     7.6.5.32             hc0a50b0_1    conda-forge
cycler                    0.12.1             pyhd8ed1ab_0    conda-forge
cytoolz                   0.12.3           py38h01eb140_0    conda-forge
dask                      2023.5.0           pyhd8ed1ab_0    conda-forge
dask-core                 2023.5.0           pyhd8ed1ab_0    conda-forge
dbus                      1.13.6               h5008d03_3    conda-forge
demes                     0.2.3              pyhd8ed1ab_0    conda-forge
distributed               2023.5.0           pyhd8ed1ab_0    conda-forge
expat                     2.6.4                h5888daf_0    conda-forge
fasteners                 0.17.3             pyhd8ed1ab_0    conda-forge
filelock                  3.16.1             pyhd8ed1ab_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 h77eed37_3    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.53.1           py38h2019614_0    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
fsspec                    2024.10.0          pyhff2d567_0    conda-forge
gast                      0.3.3                      py_0    conda-forge
gettext                   0.22.5               he02047a_3    conda-forge
gettext-tools             0.22.5               he02047a_3    conda-forge
gflags                    2.2.2             h5888daf_1005    conda-forge
glib                      2.80.2               hf974151_0    conda-forge
glib-tools                2.80.2               hb6ce0ca_0    conda-forge
glog                      0.7.1                hbabe93e_0    conda-forge
gmp                       6.3.0                hac33072_2    conda-forge
gmpy2                     2.1.5            py38h6a1700d_1    conda-forge
google-pasta              0.2.0              pyhd8ed1ab_1    conda-forge
graphite2                 1.3.13            h59595ed_1003    conda-forge
grpcio                    1.62.2           py38h94a1851_0    conda-forge
gsl                       2.7                  he838d99_0    conda-forge
gst-plugins-base          1.24.4               h9ad1361_0    conda-forge
gstreamer                 1.24.4               haf2f30d_0    conda-forge
h2                        4.1.0              pyhd8ed1ab_0    conda-forge
h5py                      2.10.0          nompi_py38h9915d05_106    conda-forge
harfbuzz                  8.5.0                hfac3d4d_0    conda-forge
hdf5                      1.10.6               h3ffc7dd_1
hpack                     4.0.0              pyh9f0ad1d_0    conda-forge
hyperframe                6.0.1              pyhd8ed1ab_0    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.10               pyhd8ed1ab_0    conda-forge
importlib-metadata        8.5.0              pyha770c72_0    conda-forge
importlib-resources       6.4.5              pyhd8ed1ab_0    conda-forge
importlib_metadata        8.5.0                hd8ed1ab_1    conda-forge
importlib_resources       6.4.5              pyhd8ed1ab_0    conda-forge
jinja2                    3.1.4              pyhd8ed1ab_0    conda-forge
joblib                    1.4.2              pyhd8ed1ab_0    conda-forge
jsonschema                4.23.0             pyhd8ed1ab_0    conda-forge
jsonschema-specifications 2024.10.1          pyhd8ed1ab_0    conda-forge
keras-preprocessing       1.1.2              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.5            py38h7f3f72f_1    conda-forge
krb5                      1.21.3               h659f571_0    conda-forge
lame                      3.100             h166bdaf_1003    conda-forge
lcms2                     2.16                 hb7c19ff_0    conda-forge
ld_impl_linux-64          2.43                 h712a8e2_2    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20240116.2      cxx17_he02047a_1    conda-forge
libarrow                  16.1.0           hcb6531f_6_cpu    conda-forge
libarrow-acero            16.1.0           hac33072_6_cpu    conda-forge
libarrow-dataset          16.1.0           hac33072_6_cpu    conda-forge
libarrow-substrait        16.1.0           h7e0c224_6_cpu    conda-forge
libasprintf               0.22.5               he8f35ee_3    conda-forge
libasprintf-devel         0.22.5               he8f35ee_3    conda-forge
libblas                   3.9.0           26_linux64_openblas    conda-forge
libbrotlicommon           1.1.0                hd590300_1    conda-forge
libbrotlidec              1.1.0                hd590300_1    conda-forge
libbrotlienc              1.1.0                hd590300_1    conda-forge
libcap                    2.71                 h39aace5_0    conda-forge
libcblas                  3.9.0           26_linux64_openblas    conda-forge
libclang-cpp15            15.0.7          default_h127d8a8_5    conda-forge
libclang13                18.1.7          default_h087397f_0    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcups                   2.3.3                h4637d8d_4    conda-forge
libcurl                   8.8.0                hca28451_1    conda-forge
libdeflate                1.20                 hd590300_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libevent                  2.1.12               hf998b51_1    conda-forge
libexpat                  2.6.4                h5888daf_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libflac                   1.4.3                h59595ed_0    conda-forge
libgcc                    14.2.0               h77fa898_1    conda-forge
libgcc-ng                 14.2.0               h69a702a_1    conda-forge
libgcrypt-lib             1.11.0               hb9d3cd8_2    conda-forge
libgettextpo              0.22.5               he02047a_3    conda-forge
libgettextpo-devel        0.22.5               he02047a_3    conda-forge
libgfortran               14.2.0               h69a702a_1    conda-forge
libgfortran-ng            14.2.0               h69a702a_1    conda-forge
libgfortran5              14.2.0               hd5240d6_1    conda-forge
libglib                   2.80.2               hf974151_0    conda-forge
libgomp                   14.2.0               h77fa898_1    conda-forge
libgoogle-cloud           2.24.0               h2736e30_0    conda-forge
libgoogle-cloud-storage   2.24.0               h3d9a0c8_0    conda-forge
libgpg-error              1.51                 hbd13f7d_1    conda-forge
libgrpc                   1.62.2               h15f2491_0    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0           26_linux64_openblas    conda-forge
libllvm10                 10.0.1               he513fc3_3    conda-forge
libllvm15                 15.0.7               hb3ce162_4    conda-forge
libllvm18                 18.1.7               hb77312f_0    conda-forge
liblzma                   5.6.3                hb9d3cd8_1    conda-forge
liblzma-devel             5.6.3                hb9d3cd8_1    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libogg                    1.3.5                h4ab18f5_0    conda-forge
libopenblas               0.3.28          pthreads_h94d23a6_1    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libparquet                16.1.0           h6a7eafb_6_cpu    conda-forge
libpng                    1.6.43               h2797004_0    conda-forge
libpq                     16.6                 h035377e_1    conda-forge
libprotobuf               4.25.3               h08a7969_0    conda-forge
libre2-11                 2023.09.01           h5a48ba9_2    conda-forge
libsndfile                1.2.2                hc60ed4a_1    conda-forge
libsqlite                 3.46.0               hde9e2c9_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx                 14.2.0               hc0a3c3a_1    conda-forge
libstdcxx-ng              14.2.0               h4852527_1    conda-forge
libsystemd0               256.9                h2774228_0    conda-forge
libthrift                 0.19.0               hb90f79a_1    conda-forge
libtiff                   4.6.0                h1dd3fc0_3    conda-forge
libtorch                  2.4.0           cpu_generic_h4a3044c_1    conda-forge
libutf8proc               2.8.0                hf23e847_1    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libuv                     1.49.2               hb9d3cd8_0    conda-forge
libvorbis                 1.3.7                h9c3ff4c_0    conda-forge
libwebp-base              1.4.0                hd590300_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxkbcommon              1.7.0                h662e7e4_0    conda-forge
libxml2                   2.12.7               hc051c1a_1    conda-forge
libzlib                   1.2.13               h4ab18f5_6    conda-forge
llvmlite                  0.36.0           py38h4630a5e_0    conda-forge
locket                    1.0.0              pyhd8ed1ab_0    conda-forge
lz4                       4.3.3            py38hdcd8cb4_0    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
markdown                  3.6                pyhd8ed1ab_0    conda-forge
markupsafe                2.1.5            py38h01eb140_0    conda-forge
matplotlib                3.7.3            py38h578d9bd_0    conda-forge
matplotlib-base           3.7.3            py38h58ed7fa_0    conda-forge
mpc                       1.3.1                h24ddda3_1    conda-forge
mpfr                      4.2.1                h90cbb55_3    conda-forge
mpg123                    1.32.9               hc50e24c_0    conda-forge
mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
msgpack-python            1.0.8            py38hea7755e_0    conda-forge
msprime                   1.3.1            py38h50512c5_1    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mysql-common              8.3.0                hf1915f5_4    conda-forge
mysql-libs                8.3.0                hca2cd23_4    conda-forge
ncurses                   6.5                  he02047a_1    conda-forge
networkx                  3.1                pyhd8ed1ab_0    conda-forge
newick                    1.9.0                    pypi_0    pypi
nomkl                     1.0                  h5ca1d4c_0    conda-forge
nose                      1.3.7                   py_1006    conda-forge
nspr                      4.36                 h5888daf_0    conda-forge
nss                       3.100                hca3bf56_0    conda-forge
numba                     0.53.1           py38ha9443f7_0
numcodecs                 0.12.1           py38h854fd01_1    conda-forge
numexpr                   2.8.4           py38hb2af0cf_101    conda-forge
numpy                     1.23.5           py38h7042d01_0    conda-forge
openblas                  0.3.28          pthreads_h6ec200e_1    conda-forge
openjpeg                  2.5.2                h488ebb8_0    conda-forge
openssl                   3.4.0                hb9d3cd8_0    conda-forge
opt_einsum                3.4.0              pyhd8ed1ab_0    conda-forge
orc                       2.0.1                h17fec99_1    conda-forge
packaging                 24.2               pyhd8ed1ab_2    conda-forge
pandas                    2.0.3            py38h01efb38_1    conda-forge
partd                     1.4.1              pyhd8ed1ab_0    conda-forge
patsy                     0.5.6              pyhd8ed1ab_0    conda-forge
pcre2                     10.43                hcad00b1_0    conda-forge
pillow                    10.3.0           py38h9e66945_0    conda-forge
pip                       24.3.1             pyh8b19718_0    conda-forge
pixman                    0.44.2               h29eaf8c_0    conda-forge
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_1    conda-forge
platformdirs              4.3.6              pyhd8ed1ab_0    conda-forge
ply                       3.11               pyhd8ed1ab_2    conda-forge
pomegranate               1.0.0              pyhd8ed1ab_1    conda-forge
pooch                     1.8.2              pyhd8ed1ab_0    conda-forge
protobuf                  4.25.3           py38hb5c7596_0    conda-forge
psutil                    6.0.0            py38hfb59056_0    conda-forge
pthread-stubs             0.4               hb9d3cd8_1002    conda-forge
pulseaudio-client         17.0                 hb77b528_0    conda-forge
pyarrow                   16.1.0           py38hb563948_2    conda-forge
pyarrow-core              16.1.0          py38he753e70_2_cpu    conda-forge
pycparser                 2.22               pyhd8ed1ab_0    conda-forge
pyparsing                 3.1.4              pyhd8ed1ab_0    conda-forge
pyqt                      5.15.9           py38hffdaa6c_5    conda-forge
pyqt5-sip                 12.12.2          py38h17151c0_5    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.8.19          hd12c33a_0_cpython    conda-forge
python-dateutil           2.9.0              pyhd8ed1ab_0    conda-forge
python-tzdata             2024.2             pyhd8ed1ab_0    conda-forge
python_abi                3.8                      5_cp38    conda-forge
pytorch                   2.4.0           cpu_generic_py38hbd07d99_1    conda-forge
pytz                      2024.2             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0.2            py38h2019614_0    conda-forge
qt-main                   5.15.8              hc9dc06e_21    conda-forge
re2                       2023.09.01           h7f4b329_2    conda-forge
readline                  8.2                  h8228510_1    conda-forge
referencing               0.35.1             pyhd8ed1ab_0    conda-forge
relernn                   0.2                      pypi_0    pypi
requests                  2.32.3             pyhd8ed1ab_0    conda-forge
rpds-py                   0.20.0           py38h4005ec7_0    conda-forge
ruamel.yaml               0.18.6           py38h01eb140_0    conda-forge
ruamel.yaml.clib          0.2.8            py38h01eb140_0    conda-forge
s2n                       1.4.15               he19d79f_0    conda-forge
scikit-allel              1.3.7            py38h53bb729_1    conda-forge
scikit-learn              1.3.2            py38ha25d942_2    conda-forge
scipy                     1.10.1           py38h32ae08f_1
seaborn                   0.13.2               hd8ed1ab_2    conda-forge
seaborn-base              0.13.2             pyhd8ed1ab_2    conda-forge
setuptools                75.3.0             pyhd8ed1ab_0    conda-forge
sip                       6.7.12           py38h17151c0_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sleef                     3.7                  h1b44611_2    conda-forge
snappy                    1.2.1                h8bd8927_1    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
statsmodels               0.14.1           py38h7f0c24c_0    conda-forge
svgwrite                  1.4.3              pyhd8ed1ab_0    conda-forge
sympy                     1.13.3          pypyh2585a3b_103    conda-forge
tbb                       2020.3               hfd86e86_0
tblib                     3.0.0              pyhd8ed1ab_0    conda-forge
tensorboard               2.17.1             pyhd8ed1ab_0    conda-forge
tensorboard-data-server   0.7.0            py38hcdda232_1    conda-forge
tensorflow                2.2.0           mkl_py38h6d3daf0_0
tensorflow-base           2.2.0           mkl_py38h5059a2d_0
tensorflow-estimator      2.6.0            py38h709712a_0    conda-forge
termcolor                 2.4.0              pyhd8ed1ab_0    conda-forge
threadpoolctl             3.5.0              pyhc1e730c_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.0.2              pyhd8ed1ab_0    conda-forge
toolz                     1.0.0              pyhd8ed1ab_0    conda-forge
tornado                   6.4.1            py38hfb59056_0    conda-forge
tqdm                      4.67.1             pyhd8ed1ab_0    conda-forge
tskit                     0.5.6            py38he82f83a_2    conda-forge
typing_extensions         4.12.2             pyha770c72_0    conda-forge
tzdata                    2024b                hc8b5060_0    conda-forge
unicodedata2              15.1.0           py38h01eb140_0    conda-forge
urllib3                   2.2.3              pyhd8ed1ab_0    conda-forge
werkzeug                  3.0.6              pyhd8ed1ab_0    conda-forge
wheel                     0.45.1             pyhd8ed1ab_0    conda-forge
wrapt                     1.16.0           py38h01eb140_0    conda-forge
xcb-util                  0.4.0                hd590300_1    conda-forge
xcb-util-image            0.4.0                h8ee46fc_1    conda-forge
xcb-util-keysyms          0.4.0                h8ee46fc_1    conda-forge
xcb-util-renderutil       0.3.9                hd590300_1    conda-forge
xcb-util-wm               0.4.1                h8ee46fc_1    conda-forge
xkeyboard-config          2.42                 h4ab18f5_0    conda-forge
xorg-kbproto              1.0.7             hb9d3cd8_1003    conda-forge
xorg-libice               1.1.2                hb9d3cd8_0    conda-forge
xorg-libsm                1.2.5                he73a12e_0    conda-forge
xorg-libx11               1.8.9                h8ee46fc_0    conda-forge
xorg-libxau               1.0.12               hb9d3cd8_0    conda-forge
xorg-libxdmcp             1.1.5                hb9d3cd8_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.11               hd590300_0    conda-forge
xorg-renderproto          0.11.1            hb9d3cd8_1003    conda-forge
xorg-xextproto            7.3.0             hb9d3cd8_1004    conda-forge
xorg-xf86vidmodeproto     2.3.1             hb9d3cd8_1005    conda-forge
xorg-xproto               7.0.31            hb9d3cd8_1008    conda-forge
xyzservices               2024.9.0           pyhd8ed1ab_1    conda-forge
xz                        5.6.3                hbcc6ac9_1    conda-forge
xz-gpl-tools              5.6.3                hbcc6ac9_1    conda-forge
xz-tools                  5.6.3                hb9d3cd8_1    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zarr                      2.17.1             pyhd8ed1ab_0    conda-forge
zict                      3.0.0              pyhd8ed1ab_0    conda-forge
zipp                      3.21.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h4ab18f5_6    conda-forge
zstandard                 0.23.0           py38h62bed22_0    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

@andrewkern
Member

looks like this might be caused by the older version of tensorflow you are using. try these steps from within the ReLERNN directory

# 1. create a new conda env, activate it
conda create -n relernn_test python=3.10 --yes
conda activate relernn_test

# 2. confirm pip is pointing to this env
which pip

# 3. use that pip to install everything for this repo
pip install .

# 4. test this installation
cd examples
./example_pipeline.sh
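Step 2 relies on eyeballing the output of `which pip`. The same check can be scripted. The sketch below assumes the env was activated with `conda activate`, which sets `CONDA_PREFIX`, and tests whether the first `pip` on `PATH` lives inside that environment. `pip_matches_env` is a hypothetical helper, not part of ReLERNN or conda.

```python
import os
import shutil
import sys
from pathlib import Path


def pip_matches_env(pip_path, env_prefix):
    """Return True if pip_path is located inside env_prefix (e.g. <env>/bin/pip)."""
    if not pip_path or not env_prefix:
        return False
    pip = Path(pip_path).resolve()
    prefix = Path(env_prefix).resolve()
    # The env prefix must be an ancestor directory of the pip executable.
    return prefix in pip.parents


if __name__ == "__main__":
    # Equivalent of `which pip`: the first pip on PATH, or None if there is none.
    pip_path = shutil.which("pip")
    # `conda activate` sets CONDA_PREFIX; fall back to this interpreter's prefix.
    env_prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    if pip_matches_env(pip_path, env_prefix):
        print(f"OK: {pip_path} belongs to {env_prefix}")
    else:
        print(f"Mismatch: pip is {pip_path}, active env is {env_prefix}")
```

If it reports a mismatch, running `python -m pip install .` from the activated env uses the pip that belongs to that env's own interpreter.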

@NilaBlueshirt

NilaBlueshirt commented Jan 23, 2025

Hi Andrew,

Thanks for the help!

I tried your suggested steps, and here is the output:
ModuleNotFoundError: No module named 'h5py'

So I installed h5py into this test env and reran the test script:
ModuleNotFoundError: No module named 'tensorflow'

Then I installed tensorflow into this test env via conda/mamba and tested again (numpy was downgraded from 2.2.2 to 1.26.4). This time ReLERNN was able to begin running:

...
 Total params: 771,889 (2.94 MB)
 Trainable params: 771,889 (2.94 MB)
 Non-trainable params: 0 (0.00 B)
Traceback (most recent call last):
  File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_TRAIN", line 130, in <module>
    main()
  File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_TRAIN", line 109, in main
    runModels(ModelFuncPointer=GRU_TUNED84,
  File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/ReLERNN/helpers.py", line 370, in runModels
    history = model.fit(TrainGenerator,
  File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler
    return fn(*args, **kwargs)
TypeError: TensorFlowTrainer.fit() got an unexpected keyword argument 'use_multiprocessing'
2025-01-23 12:39:33.934900: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Importing HDF5: "./example_output/splitVCFs/example_2L:0-840000.hdf5"...
Traceback (most recent call last):
  File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_PREDICT", line 155, in <module>
    main()
  File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_PREDICT", line 122, in main
    load_and_predictVCF(VCFGenerator=vcf_gen,
  File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/ReLERNN/helpers.py", line 284, in load_and_predictVCF
    jsonFILE = open(network[0],"r")
FileNotFoundError: [Errno 2] No such file or directory: './example_output/networks/model.json'
2025-01-23 12:39:38.412544: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Error: no .PREDICT.txt file found. You must run ReLERNN_PREDICT.py prior to running ReLERNN_BSCORRECT.py

I'm not sure if you have seen this error before, but I appreciate any help!

Regards,
Nil
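The TypeError above comes from Keras 3, as the keras/src/... paths in the traceback indicate. In Keras 3, Model.fit() no longer accepts the workers, use_multiprocessing and max_queue_size arguments; those options moved to the keras.utils.PyDataset constructor. The following is a hedged sketch of a compatibility shim, not part of ReLERNN. It assumes the caller builds its fit() arguments in a dict, and the helper name fit_kwargs_for is hypothetical.

```python
# fit() arguments that Keras 2 accepted but Keras 3's fit() rejects.
LEGACY_FIT_ARGS = ("workers", "use_multiprocessing", "max_queue_size")


def keras_major(version):
    """Return the major version number from a string like '3.8.0'."""
    return int(version.split(".")[0])


def fit_kwargs_for(keras_version, kwargs):
    """Drop fit() arguments that Keras 3 rejects; keep everything else."""
    if keras_major(keras_version) >= 3:
        return {k: v for k, v in kwargs.items() if k not in LEGACY_FIT_ARGS}
    return dict(kwargs)


if __name__ == "__main__":
    requested = {"epochs": 10, "use_multiprocessing": True, "workers": 4}
    print(fit_kwargs_for("2.15.0", requested))  # unchanged
    print(fit_kwargs_for("3.8.0", requested))  # legacy keys removed
```

In practice one would pass keras.__version__ as the first argument. Dropping these options silently disables multi-worker data loading. Staying on tensorflow==2.15.0, which ships with Keras 2.15, is the less surprising fix.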

@andrewkern
Member

Hello @NilaBlueshirt, it sounds like you have a Python environment issue.

Assuming you are working on a Linux machine, I recommend the same steps as above. After you have cloned this repo, cd into the directory and then run:

# 1. create a new conda env, activate it
conda create -n relernn_test python=3.10 --yes
conda activate relernn_test

# 2. confirm pip is pointing to this env
which pip

# 3. use that pip to install everything for this repo
pip install .

# 4. test this installation
cd examples
./example_pipeline.sh

This will definitely install tensorflow and h5py, among other packages, and the example_pipeline.sh workflow should run.

@NilaBlueshirt

NilaBlueshirt commented Jan 27, 2025

Hi @andrewkern,
Thanks for getting back to me. The errors I described in my first reply were generated at your step 4. Here are more details:

$ mamba create -n relearnn-1.0.0 -c conda-forge python=3.10 -y
...

$ source activate relearnn-1.0.0

$ which pip
/packages/envs/relearnn-1.0.0/bin/pip

$ which python
/packages/envs/relearnn-1.0.0/bin/python

$ pip install .
... 
Building wheels for collected packages: ReLERNN
  Building wheel for ReLERNN (setup.py) ... done
  Created wheel for ReLERNN: filename=ReLERNN-0.2-py3-none-any.whl size=44449 sha256=9494e619a61fab00d86cc99320707b520aec017da8a7aaa95a9c685169c7e754
  Stored in directory: /tmp/pip-ephem-wheel-cache-hnqqww0b/wheels/5a/90/be/ab9f318b7c8a7e520edb7963bf25c02a03f3a447944c7aa6b7
Successfully built ReLERNN
Installing collected packages: zipp, typing-extensions, toolz, threadpoolctl, svgwrite, six, ruamel.yaml.clib, rpds-py, pyyaml, pyparsing, pillow, packaging, numpy, newick, locket, kiwisolver, joblib, fsspec, fonttools, cycler, cloudpickle, click, attrs, scipy, ruamel.yaml, referencing, python-dateutil, partd, importlib_metadata, contourpy, scikit-learn, matplotlib, jsonschema-specifications, demes, dask, jsonschema, tskit, scikit-allel, msprime, ReLERNN
Successfully installed ReLERNN-0.2 attrs-25.1.0 click-8.1.8 cloudpickle-3.1.1 contourpy-1.3.1 cycler-0.12.1 dask-2025.1.0 demes-0.2.3 fonttools-4.55.6 fsspec-2024.12.0 importlib_metadata-8.6.1 joblib-1.4.2 jsonschema-4.23.0 jsonschema-specifications-2024.10.1 kiwisolver-1.4.8 locket-1.0.0 matplotlib-3.10.0 msprime-1.3.3 newick-1.9.0 numpy-2.2.2 packaging-24.2 partd-1.4.2 pillow-11.1.0 pyparsing-3.2.1 python-dateutil-2.9.0.post0 pyyaml-6.0.2 referencing-0.36.2 rpds-py-0.22.3 ruamel.yaml-0.18.10 ruamel.yaml.clib-0.2.12 scikit-allel-1.3.13 scikit-learn-1.6.1 scipy-1.15.1 six-1.17.0 svgwrite-1.4.3 threadpoolctl-3.5.0 toolz-1.0.0 tskit-0.6.0 typing-extensions-4.12.2 zipp-3.21.0

$ cd examples

$ ./example_pipeline.sh
Traceback (most recent call last):
  File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_SIMULATE", line 7, in <module>
    from ReLERNN.imports import *
  File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/ReLERNN/__init__.py", line 3, in <module>
    from ReLERNN.imports import *
  File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/ReLERNN/imports.py", line 12, in <module>
    import h5py
ModuleNotFoundError: No module named 'h5py'

That's why I had to manually install 'h5py' and so on. My apologies for the confusion, and thanks again for helping us.

Regards,
Nil

@andrewkern
Member

andrewkern commented Jan 27, 2025

Something isn't going right here: your pip install . call doesn't seem to be reading all the correct requirements from setup.py. Do you have the newest version of ReLERNN from the repo?
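One way to see whether pip install . picked up all of a package's declared requirements is to read the installed distribution's metadata. You can then check each requirement against what is actually installed. The sketch below uses the standard-library importlib.metadata and assumes the distribution is installed under the name ReLERNN, as in the pip output above. The helper names are hypothetical.

```python
import re
from importlib import metadata


def requirement_name(req):
    """Extract the distribution name from a requirement like 'h5py>=3; ...'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    return match.group(1) if match else None


def missing_requirements(dist_name):
    """List declared requirements of dist_name that are not installed.

    Returns None if dist_name itself is not installed. Environment markers
    (e.g. python_version) are not evaluated, so a marker-restricted
    requirement may be reported as missing.
    """
    try:
        requires = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return None
    missing = []
    for req in requires:
        if "extra ==" in req:
            continue  # skip requirements that belong to optional extras
        name = requirement_name(req)
        if name is None:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing:", missing_requirements("ReLERNN"))
```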

@NilaBlueshirt

Thanks for the quick reply! Yes, I cloned the main branch of the repo last week. I noticed that there is a setup_fix branch; should I be using that one?

@andrewkern
Member

Ack, I think we had a commit that hadn't hit the main branch. Please re-clone the repo and try the same steps again.

@NilaBlueshirt

NilaBlueshirt commented Jan 28, 2025

The pip install output seems to be good now:

Building wheels for collected packages: ReLERNN
  Building wheel for ReLERNN (setup.py) ... done
  Created wheel for ReLERNN: filename=ReLERNN-0.2-py3-none-any.whl size=44461 sha256=53aab229bed27a9dc80af6abb3d095583ea33dadd532f4dc48fbb38449f15248
  Stored in directory: /tmp/pip-ephem-wheel-cache-albzfg84/wheels/5a/90/be/ab9f318b7c8a7e520edb7963bf25c02a03f3a447944c7aa6b7
Successfully built ReLERNN
Installing collected packages: libclang, flatbuffers, zipp, wrapt, urllib3, typing-extensions, toolz, threadpoolctl, termcolor, tensorflow-io-gcs-filesystem, tensorflow-estimator, tensorboard-data-server, svgwrite, six, ruamel.yaml.clib, rpds-py, pyyaml, pyparsing, pyasn1, protobuf, pillow, packaging, opt-einsum, oauthlib, numpy, newick, MarkupSafe, markdown, locket, kiwisolver, keras, joblib, idna, grpcio, gast, fsspec, fonttools, cycler, cloudpickle, click, charset-normalizer, certifi, cachetools, attrs, absl-py, werkzeug, scipy, ruamel.yaml, rsa, requests, referencing, python-dateutil, pyasn1-modules, partd, ml-dtypes, importlib_metadata, h5py, google-pasta, contourpy, astunparse, scikit-learn, requests-oauthlib, matplotlib, jsonschema-specifications, google-auth, demes, dask, jsonschema, google-auth-oauthlib, tskit, tensorboard, scikit-allel, tensorflow, msprime, ReLERNN
Successfully installed MarkupSafe-3.0.2 ReLERNN-0.2 absl-py-2.1.0 astunparse-1.6.3 attrs-25.1.0 cachetools-5.5.1 certifi-2024.12.14 charset-normalizer-3.4.1 click-8.1.8 cloudpickle-3.1.1 contourpy-1.3.1 cycler-0.12.1 dask-2025.1.0 demes-0.2.3 flatbuffers-25.1.24 fonttools-4.55.6 fsspec-2024.12.0 gast-0.6.0 google-auth-2.38.0 google-auth-oauthlib-1.2.1 google-pasta-0.2.0 grpcio-1.70.0 h5py-3.12.1 idna-3.10 importlib_metadata-8.6.1 joblib-1.4.2 jsonschema-4.23.0 jsonschema-specifications-2024.10.1 keras-2.15.0 kiwisolver-1.4.8 libclang-18.1.1 locket-1.0.0 markdown-3.7 matplotlib-3.10.0 ml-dtypes-0.2.0 msprime-1.3.3 newick-1.9.0 numpy-1.26.4 oauthlib-3.2.2 opt-einsum-3.4.0 packaging-24.2 partd-1.4.2 pillow-11.1.0 protobuf-4.25.6 pyasn1-0.6.1 pyasn1-modules-0.4.1 pyparsing-3.2.1 python-dateutil-2.9.0.post0 pyyaml-6.0.2 referencing-0.36.2 requests-2.32.3 requests-oauthlib-2.0.0 rpds-py-0.22.3 rsa-4.9 ruamel.yaml-0.18.10 ruamel.yaml.clib-0.2.12 scikit-allel-1.3.13 scikit-learn-1.6.1 scipy-1.15.1 six-1.17.0 svgwrite-1.4.3 tensorboard-2.15.2 tensorboard-data-server-0.7.2 tensorflow-2.15.0 tensorflow-estimator-2.15.0 tensorflow-io-gcs-filesystem-0.37.1 termcolor-2.5.0 threadpoolctl-3.5.0 toolz-1.0.0 tskit-0.6.0 typing-extensions-4.12.2 urllib3-2.3.0 werkzeug-3.1.3 wrapt-1.14.1 zipp-3.21.0

The machine I'm using has 4 A100s and CUDA 12.5 installed. However, when I ran example_pipeline.sh, the first few lines of the output were:

2025-01-27 17:08:55.511310: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-27 17:08:56.091919: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-27 17:08:56.092003: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-27 17:08:56.210692: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 17:08:56.456773: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-27 17:08:56.458393: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-27 17:08:58.746718: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Warning: no demographic history file found. All training data will be simulated under demographic equilibrium.
Split chromosome: 2L...
Split chromosome: 2R...
Split chromosome: 3L...
Split chromosome: 3R...
Split chromosome: X...

I'm worried that the TensorFlow installed here is the CPU version, not the GPU version, so I wrote this script to test the env:

import tensorflow as tf

# Check TensorFlow version
print("TensorFlow version:", tf.__version__)

# Check if a GPU is available
print("GPU available:", tf.config.list_physical_devices('GPU'))

When I run this script within the relearnn-1.0.0 env I just made, the output shows errors similar to those above:

2025-01-27 17:24:50.843216: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-27 17:24:50.871853: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-27 17:24:50.871891: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-27 17:24:50.872883: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 17:24:50.877749: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-27 17:24:50.877920: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-27 17:24:52.765075: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
TensorFlow version: 2.15.0
2025-01-27 17:24:56.090959: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2256] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
GPU available: []

Thanks for your efforts.
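The test script earlier in this comment can be extended to report whether the installed TensorFlow wheel was built against CUDA at all. That separates a CPU-only build from a CUDA build that simply cannot find the GPU libraries. This is a sketch assuming TensorFlow 2.x, where tf.test.is_built_with_cuda() and tf.sysconfig.get_build_info() are available. The CUDA/cuDNN keys may be absent on CPU-only builds, hence the .get() calls.

```python
def gpu_report():
    """Summarize the TensorFlow build and visible GPUs as a dict."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"tensorflow": None}
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```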

@andrewkern
Member

This looks like CUDA isn't installed on your system. Do you know if it is?

If it isn't, you can try the following in your relernn env:

python3 -m pip install 'tensorflow[and-cuda]'
# Verify the installation:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
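The and-cuda extra pulls in NVIDIA's CUDA libraries as ordinary pip wheels, the nvidia-*-cu12 packages in the install output below. This means only the NVIDIA driver has to come from the system. The following sketch uses only importlib.metadata to list which of those wheels are present in the active environment. The function name is hypothetical.

```python
from importlib import metadata


def installed_nvidia_wheels():
    """Return sorted (name, version) pairs for installed nvidia-* distributions."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"] or ""
        if name.lower().startswith("nvidia-"):
            found.add((name, dist.version))
    return sorted(found)


if __name__ == "__main__":
    for name, version in installed_nvidia_wheels():
        print(f"{name}=={version}")
```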

@NilaBlueshirt

NilaBlueshirt commented Jan 28, 2025

I thought my CUDA 12.5 was working, since it shows up in the nvidia-smi output. But here is what I got from running the commands you suggested (in the same relearnn-1.0.0 env):

$ python3 -m pip install 'tensorflow[and-cuda]'

Installing collected packages: namex, pygments, optree, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-nvcc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, ml-dtypes, mdurl, tensorboard, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, markdown-it-py, rich, nvidia-cusolver-cu12, keras, tensorflow
  Attempting uninstall: ml-dtypes
    Found existing installation: ml-dtypes 0.2.0
    Uninstalling ml-dtypes-0.2.0:
      Successfully uninstalled ml-dtypes-0.2.0
  Attempting uninstall: tensorboard
    Found existing installation: tensorboard 2.15.2
    Uninstalling tensorboard-2.15.2:
      Successfully uninstalled tensorboard-2.15.2
  Attempting uninstall: keras
    Found existing installation: keras 2.15.0
    Uninstalling keras-2.15.0:
      Successfully uninstalled keras-2.15.0
  Attempting uninstall: tensorflow
    Found existing installation: tensorflow 2.15.0
    Uninstalling tensorflow-2.15.0:
      Successfully uninstalled tensorflow-2.15.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
relernn 0.2 requires tensorflow==2.15.0, but you have tensorflow 2.18.0 which is incompatible.
Successfully installed keras-3.8.0 markdown-it-py-3.0.0 mdurl-0.1.2 ml-dtypes-0.4.1 namex-0.0.8 nvidia-cublas-cu12-12.5.3.2 nvidia-cuda-cupti-cu12-12.5.82 nvidia-cuda-nvcc-cu12-12.5.82 nvidia-cuda-nvrtc-cu12-12.5.82 nvidia-cuda-runtime-cu12-12.5.82 nvidia-cudnn-cu12-9.3.0.75 nvidia-cufft-cu12-11.2.3.61 nvidia-curand-cu12-10.3.6.82 nvidia-cusolver-cu12-11.6.3.83 nvidia-cusparse-cu12-12.5.1.3 nvidia-nccl-cu12-2.21.5 nvidia-nvjitlink-cu12-12.5.82 optree-0.14.0 pygments-2.19.1 rich-13.9.4 tensorboard-2.18.0 tensorflow-2.18.0

And then the test says:

$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2025-01-27 18:34:31.317555: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738028071.486937   75281 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738028071.534654   75281 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 18:34:31.956131: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]

Would it be OK to change the line "tensorflow==2.15.0" in setup.py to tensorflow[and-cuda]?
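On the setup.py question: a PEP 508 requirement can carry an extra and a version pin at the same time. For example, tensorflow[and-cuda]==2.15.0 would request the CUDA wheels while keeping TensorFlow 2.15.0, and with it Keras 2.15. Without the pin, pip upgrades to TensorFlow 2.18.0 and Keras 3.8.0, as it did above. Whether the and-cuda extra resolves cleanly for a given release is an assumption worth checking. The hypothetical helper below only splits such a string into its parts to show the structure. For real use, the packaging library's Requirement class parses requirements properly.

```python
import re

# name, optional [extra,extra], optional version specifier; a trailing
# "; marker" is matched but discarded (markers are not evaluated).
_REQ = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)"
    r"\s*(?:\[(?P<extras>[^\]]*)\])?"
    r"\s*(?P<spec>[^;]*?)\s*(?:;.*)?$"
)


def split_requirement(req):
    """Split 'name[extra1,extra2]==1.0' into (name, [extras], specifier)."""
    match = _REQ.match(req)
    if match is None:
        raise ValueError(f"unrecognized requirement: {req!r}")
    extras = [e.strip() for e in (match.group("extras") or "").split(",") if e.strip()]
    return match.group("name"), extras, match.group("spec")


if __name__ == "__main__":
    print(split_requirement("tensorflow[and-cuda]==2.15.0"))
```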

Then I re-ran example_pipeline.sh; here is the full output:

2025-01-27 18:46:39.125539: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738028799.139504   75619 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738028799.143695   75619 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 18:46:39.159773: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Warning: no demographic history file found. All training data will be simulated under demographic equilibrium.
Split chromosome: 2R...
Split chromosome: 2L...
Split chromosome: 3L...
Split chromosome: 3R...
Split chromosome: X...
Converting ./example_output/splitVCFs/example_2L:0-840000.vcf to HDF5...
Converting ./example_output/splitVCFs/example_2R:0-1669000.vcf to HDF5...
Converting ./example_output/splitVCFs/example_3R:0-1963000.vcf to HDF5...
Converting ./example_output/splitVCFs/example_X:0-1250000.vcf to HDF5...
Converting ./example_output/splitVCFs/example_3L:0-742000.vcf to HDF5...
Reading HDF5: "./example_output/splitVCFs/example_2L:0-840000.hdf5"...
Reading HDF5: "./example_output/splitVCFs/example_2R:0-1669000.hdf5"...
Reading HDF5: "./example_output/splitVCFs/example_3L:0-742000.hdf5"...
Reading HDF5: "./example_output/splitVCFs/example_3R:0-1963000.hdf5"...
Reading HDF5: "./example_output/splitVCFs/example_X:0-1250000.hdf5"...

Accessibility mask found: calculating the proportion of the genome that is masked...
1.3% of genome inaccessible

Simulating with window size = 211000 bp.
Training set:
Simulate...
Validation set:
Simulate...
Test set:
Simulate...

SIMULATIONS FINISHED!

SANITY CHECK
====================
numSegSites                     Min     Mean    Max
Simulated:                      145     998     2498
InputVCF 2L:0-840000:           238     909     1741
InputVCF 2R:0-1669000:          411     1000    1754
InputVCF 3L:0-742000:           143     909     1777
InputVCF 3R:0-1963000:          358     1000    1759
InputVCF X:0-1250000:           127     1000    1720


***ReLERNN_SIMULATE.py FINISHED!***

2025-01-27 18:47:58.679078: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738028878.693167   76377 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738028878.697448   76377 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 18:47:58.713708: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
I0000 00:00:1738028882.734427   76377 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 79197 MB memory:  -> device: 0, name: NVIDIA A100-SXM4-80GB, pci bus id: 0000:01:00.0, compute capability: 8.0
I0000 00:00:1738028882.897592   76377 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 79197 MB memory:  -> device: 0, name: NVIDIA A100-SXM4-80GB, pci bus id: 0000:01:00.0, compute capability: 8.0
Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                  ┃ Output Shape              ┃         Param # ┃ Connected to               ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer)      │ (None, 2508, 20)          │               0 │ -                          │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ bidirectional (Bidirectional) │ (None, 168)               │          53,424 │ input_layer[0][0]          │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense (Dense)                 │ (None, 256)               │          43,264 │ bidirectional[0][0]        │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ input_layer_1 (InputLayer)    │ (None, 2508)              │               0 │ -                          │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout (Dropout)             │ (None, 256)               │               0 │ dense[0][0]                │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense_1 (Dense)               │ (None, 256)               │         642,304 │ input_layer_1[0][0]        │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ concatenate (Concatenate)     │ (None, 512)               │               0 │ dropout[0][0],             │
│                               │                           │                 │ dense_1[0][0]              │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense_2 (Dense)               │ (None, 64)                │          32,832 │ concatenate[0][0]          │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dropout_1 (Dropout)           │ (None, 64)                │               0 │ dense_2[0][0]              │
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤
│ dense_3 (Dense)               │ (None, 1)                 │              65 │ dropout_1[0][0]            │
└───────────────────────────────┴───────────────────────────┴─────────────────┴────────────────────────────┘
 Total params: 771,889 (2.94 MB)
 Trainable params: 771,889 (2.94 MB)
 Non-trainable params: 0 (0.00 B)
Traceback (most recent call last):
  File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_TRAIN", line 130, in <module>
    main()
  File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_TRAIN", line 109, in main
    runModels(ModelFuncPointer=GRU_TUNED84,
  File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/ReLERNN/helpers.py", line 370, in runModels
    history = model.fit(TrainGenerator,
  File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler
    return fn(*args, **kwargs)
TypeError: TensorFlowTrainer.fit() got an unexpected keyword argument 'use_multiprocessing'
2025-01-27 18:48:07.889282: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738028887.902228   76674 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738028887.906192   76674 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 18:48:07.920394: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Importing HDF5: "./example_output/splitVCFs/example_2L:0-840000.hdf5"...
I0000 00:00:1738028890.473773   76674 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 79197 MB memory:  -> device: 0, name: NVIDIA A100-SXM4-80GB, pci bus id: 0000:01:00.0, compute capability: 8.0
Traceback (most recent call last):
  File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_PREDICT", line 155, in <module>
    main()
  File "/packages/envs/relearnn-1.0.0/bin/ReLERNN_PREDICT", line 122, in main
    load_and_predictVCF(VCFGenerator=vcf_gen,
  File "/packages/envs/relearnn-1.0.0/lib/python3.10/site-packages/ReLERNN/helpers.py", line 284, in load_and_predictVCF
    jsonFILE = open(network[0],"r")
FileNotFoundError: [Errno 2] No such file or directory: './example_output/networks/model.json'
2025-01-27 18:48:13.197334: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1738028893.211527   76877 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1738028893.215825   76877 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-27 18:48:13.232331: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Error: no .PREDICT.txt file found. You must run ReLERNN_PREDICT.py prior to running ReLERNN_BSCORRECT.py

Thanks.

@andrewkern
Member

So I'm betting this is because you've now installed a different version of TensorFlow when you installed with CUDA. What version does it say you have? Basically, what's happening is that the pipeline is looking for a model named './example_output/networks/model.json'. What files do you see in that directory?
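In the log above, ReLERNN_PREDICT most likely fails because ReLERNN_TRAIN crashed first with the use_multiprocessing TypeError, so no network was written. Here is a quick way to answer "what files do you see in that directory?" from Python. It assumes the example's ./example_output/networks path, and the helper name is hypothetical.

```python
from pathlib import Path


def inspect_networks_dir(path):
    """Return (exists, sorted file names, whether model.json is present)."""
    directory = Path(path)
    if not directory.is_dir():
        return False, [], False
    names = sorted(p.name for p in directory.iterdir())
    return True, names, "model.json" in names


if __name__ == "__main__":
    exists, names, has_model = inspect_networks_dir("./example_output/networks")
    print("directory exists:", exists)
    print("files:", names)
    print("model.json present:", has_model)
```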

@NilaBlueshirt

NilaBlueshirt commented Feb 4, 2025

Hi Andrew,

You are right: it was TensorFlow and CUDA not playing nicely on my side. I resolved the CUDA version error, and despite the CUDA warning messages, example_pipeline.sh runs perfectly fine on a GPU. Thanks again for all your efforts!

Here is what I did in case someone else wants to manage the dependencies with mamba instead of pip:

$ git clone https://github.com/kr-colab/ReLERNN.git
$ cd ReLERNN
$ mamba create -n relearnn-1.0.0 -c conda-forge -c nvidia python=3.10 tensorflow=2.15.0 cuda-toolkit h5py -y
$ source activate relearnn-1.0.0
$ pip install .
$ cd examples
$ ./example_pipeline.sh

The harmless warnings and errors:

$ ./example_pipeline.sh 
2025-02-03 16:55:28.942972: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-02-03 16:55:28.943023: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-02-03 16:55:28.944064: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-03 16:55:28.949335: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Warning: no demographic history file found. All training data will be simulated under demographic equilibrium.
Split chromosome: 2R...
...

Nice GPU utilization rate when running epochs:

[Image: screenshot of GPU utilization during training]


3 participants