Error converting symbolic tensor object to numpy while running RELERNN_TRAIN #60
Comments
hi there-- to help debug this can you give me a full list of the versions in your python environment? assuming you use
also are you getting this error trying to run our example input? |
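For readers following along, here is a minimal sketch of one way to collect such a version list from inside the environment. It is not part of ReLERNN; the package names in `PACKAGES` are an assumption based on the packages mentioned in this thread, and `collect_versions` is a hypothetical helper name.

```python
import platform
import sys
from importlib import metadata

# Assumption: these are the packages of interest in this thread; edit as needed.
PACKAGES = ["tensorflow", "numpy", "h5py"]


def collect_versions(packages=PACKAGES):
    """Return a dict mapping each package name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the interpreter running this script.
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", platform.python_version(), "at", sys.executable)
    for name, version in collect_versions().items():
        print(name, version if version is not None else "not installed")
```

Running it with the environment's own `python` matters here, since the report only reflects the interpreter that executes it.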
Hello, thanks for getting back to me. Yes I am getting the same error when I try to run the sample input. Here are my conda environment details:
|
looks like this might be caused by the older version of
# 1. create a new conda env, activate it
conda create -n relernn_test python=3.10 --yes
conda activate relernn_test
# 2. confirm pip is pointing to this env
which pip
# 3. use that pip to install everything for this repo
pip install .
# 4. test this installation
cd examples
./example_pipeline.sh
|
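Step 2 above (confirming that `pip` points to the new env) can also be checked from Python. The following is a rough sketch under the assumption that the active environment is the one whose interpreter runs the script; `is_inside` and `pip_matches_env` are hypothetical helper names, not part of this repo.

```python
import shutil
import sys
from pathlib import Path


def is_inside(path, prefix):
    """Return True if `path` is located under the directory `prefix`."""
    try:
        Path(path).resolve().relative_to(Path(prefix).resolve())
        return True
    except ValueError:
        # relative_to raises ValueError when `path` is not under `prefix`.
        return False


def pip_matches_env():
    """Return True/False if the first `pip` on PATH is/isn't under sys.prefix,
    or None when no `pip` is found on PATH at all."""
    pip_path = shutil.which("pip")
    if pip_path is None:
        return None
    return is_inside(pip_path, sys.prefix)


if __name__ == "__main__":
    print("interpreter prefix:", sys.prefix)
    print("pip on PATH:", shutil.which("pip"))
    print("pip belongs to this env:", pip_matches_env())
```

If this prints `False`, then `pip install .` would likely install into a different environment than the one used to run the pipeline.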
Hi Andrew, Thanks for the help! I have tried your suggested lines and here is the output:
So I installed h5py in this testing env and reran the test script:
Then I installed tensorflow via conda/mamba in this test env and tested it again (numpy was downgraded from 2.2.2 to 1.26.4). This time ReLERNN was able to begin running:
I'm not sure if you have seen this error before, but I appreciate any help! Regards, |
hello @NilaBlueshirt - it sounds like you have a python environment issue. assuming you are working on a linux machine, I recommend the same steps as above. After you have cloned this repo,
# 1. create a new conda env, activate it
conda create -n relernn_test python=3.10 --yes
conda activate relernn_test
# 2. confirm pip is pointing to this env
which pip
# 3. use that pip to install everything for this repo
pip install .
# 4. test this installation
cd examples
./example_pipeline.sh
this will definitely install |
Hi @andrewkern ,
That's why I needed to manually install 'h5py' and so on. My apologies for the confusion, and thanks again for helping us. Regards, |
something isn't going right here, your |
Thanks for the quick reply! Yes, I cloned the main branch of the repo last week. I noticed that there is a setup_fix branch, should I be using that one? |
ack i think we had a commit that hadn't hit the main branch. please clone the repo and try these same steps again. |
The pip install output seems to be good now:
The machine I'm using has 4 A100s and has cuda 12.5 installed. However when I ran the
I'm worried that the tensorflow installed here is the CPU version, not the GPU version. So I made this testing script to test the env:
And when I run this script within the relearnn-1.0.0 env I just made, the output shows errors similar to those above:
Thanks for your efforts. |
this looks like CUDA isn't installed on your system. do you know if it is? if it isn't you can try to do the following in your relernn env:
python3 -m pip install 'tensorflow[and-cuda]'
# Verify the installation:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
|
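The one-line check above can be expanded into a slightly more informative report. This is only a sketch: `gpu_report` is a hypothetical helper, and it assumes a TensorFlow 2.x install where `tf.config.list_physical_devices` and `tf.test.is_built_with_cuda` are available. It returns empty values rather than failing when TensorFlow cannot be imported.

```python
def gpu_report():
    """Return a dict describing the TensorFlow build and the GPUs it can see."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not importable from this interpreter.
        return {"tensorflow": None, "built_with_cuda": None, "gpus": []}
    return {
        "tensorflow": tf.__version__,
        # False here suggests a CPU-only build, regardless of the system's CUDA.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # An empty list means TensorFlow found no usable GPU at runtime.
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(key, value)
```

A CUDA-enabled build that still reports no GPUs usually points at a mismatch between the CUDA/cuDNN libraries TensorFlow expects and those it can find, which is a separate question from whether `nvidia-smi` sees the cards.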
I thought my cuda 12.5 was working, since it shows up in the nvidia-smi output. But here is what I got from running the commands you suggested (in the same relearnn-1.0.0 env):
And then the test says:
Would it be ok if I change the line
And then I re-ran the example_pipeline.py, here is the full output:
Thanks. |
so i'm betting this is because you've now installed a different version of tensorflow when you installed with cuda. what version does it say you have? basically what's happening is the pipeline is looking for a model named |
Hi Andrew, You are right, it's the tensorflow and cuda not playing nice on my side. I resolved the cuda version error and despite the cuda warning messages, the example_pipeline.py runs perfectly fine on a GPU. Thanks again for all your efforts! Here is what I did in case someone else wants to manage the dependencies with mamba instead of pip:
The harmless warnings and errors:
Nice GPU utilization rate when running epochs: |
While running RELERNN_TRAIN, I ran into the following error which appears to be the result of a failure to convert a tensor object to a numpy array. I ran into this after a fresh install of a conda environment following the same versions of dependencies specified in the documentation (tensorflow/2.2.0, cudatoolkit/10.1.243, and cudnn/7.6.5). Any help would be greatly appreciated!