RuntimeError: Error building extension 'segment_cumsum_cuda' #7

Open
IaroslavS opened this issue Jul 1, 2023 · 2 comments

Comments

@IaroslavS

IaroslavS commented Jul 1, 2023

Hi!
I'm trying to run Block-NeRF and ran into this error:

Using /root/.cache/torch_extensions/py310_cu116 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu116/segment_cumsum_cuda/build.ninja...
Building extension module segment_cumsum_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ segment_cumsum.o segment_cumsum_kernel.cuda.o -shared -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/opt/conda/lib64 -lcudart -o segment_cumsum_cuda.so
FAILED: segment_cumsum_cuda.so 
c++ segment_cumsum.o segment_cumsum_kernel.cuda.o -shared -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/opt/conda/lib64 -lcudart -o segment_cumsum_cuda.so
/usr/bin/ld: cannot find -lcudart
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
  0%|          | 0/100000 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/docker_block_nerf/Block_NeRF/run_FourierGrid.py", line 115, in <module>
    run_train(args, cfg, data_dict, export_cam=True, export_geometry=True)
  File "/home/docker_block_nerf/Block_NeRF/FourierGrid/run_train.py", line 382, in run_train
    psnr = scene_rep_reconstruction(
  File "/home/docker_block_nerf/Block_NeRF/FourierGrid/run_train.py", line 274, in scene_rep_reconstruction
    loss_distortion = flatten_eff_distloss(w, s, 1/n_max, ray_id)
  File "/opt/conda/lib/python3.10/site-packages/torch_efficient_distloss/eff_distloss.py", line 93, in forward
    segment_cumsum_cuda = load(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'segment_cumsum_cuda'
(base) root@user:/home/docker_block_nerf/Block_NeRF# 

So the build fails with RuntimeError: Error building extension 'segment_cumsum_cuda'. How can I resolve it?
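
For reference, the step that fails in the log above is the final link, where /usr/bin/ld reports that it cannot find -lcudart. Below is a minimal sketch for checking which -l libraries of a link command have no matching file in its -L directories. The helper name missing_libs is hypothetical (it is not part of PyTorch or ninja), and the sketch assumes only the -L directories matter, whereas the real linker also searches its built-in system directories.

```python
import os
import shlex


def missing_libs(link_cmd, exists=os.path.exists):
    """Return the -l names with no lib<name>.so or lib<name>.a in any -L dir.

    Assumption: only the -L directories given on the command line are checked.
    """
    tokens = shlex.split(link_cmd)
    dirs = [t[2:] for t in tokens if t.startswith("-L") and len(t) > 2]
    libs = [t[2:] for t in tokens if t.startswith("-l") and len(t) > 2]
    missing = []
    for name in libs:
        candidates = [
            os.path.join(d, f"lib{name}{ext}")
            for d in dirs
            for ext in (".so", ".a")
        ]
        if not any(exists(c) for c in candidates):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Shortened copy of the link command from the log above.
    cmd = (
        "c++ segment_cumsum.o segment_cumsum_kernel.cuda.o -shared "
        "-L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -ltorch "
        "-L/opt/conda/lib64 -lcudart -o segment_cumsum_cuda.so"
    )
    print(missing_libs(cmd))
```

If cudart shows up in the printed list on the affected machine, the CUDA runtime library is not in any directory the build passes to the linker, which would be consistent with the error message.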

@mertkaraoglu

Hi, I also had a similar issue; here is how I fixed it:

  1. Start with a clean Conda env (a plain Python env would probably also work; it wouldn't hurt to try).
  2. First, install all the CUDA Runtime API components you will need. Nvidia provides the links here; if you use a Python env, you could try the pip version.
    The important thing here is to pick the CUDA version your Torch build uses. For me it was CUDA 11.7, so I used: conda install cuda -c nvidia/label/cuda-11.7.0
  3. Install a compatible Torch build. For me: pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
  4. Install torch_efficient_distloss: pip install torch_efficient_distloss

Hope this helps.

Cheers,
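
The version matching in step 2 can be sketched as a small check. This is only an illustration: the helper names cuda_version_from_torch and same_cuda are hypothetical, and the parsing assumes the wheel tag has the form +cuXYZ with the last digit as the minor version, as in 2.0.0+cu117.

```python
import re


def cuda_version_from_torch(torch_version):
    """Map a wheel version such as '2.0.0+cu117' to '11.7'.

    Returns None when the version string carries no +cu tag.
    """
    m = re.search(r"\+cu(\d{2,})$", torch_version)
    if m is None:
        return None
    digits = m.group(1)
    # Assumption: the last digit is the minor version, the rest is the major.
    return f"{digits[:-1]}.{digits[-1]}"


def same_cuda(torch_version, toolkit_version):
    """True if the toolkit's major.minor matches the Torch wheel's CUDA tag."""
    expected = cuda_version_from_torch(torch_version)
    if expected is None:
        return False
    return toolkit_version.split(".")[:2] == expected.split(".")


if __name__ == "__main__":
    print(same_cuda("2.0.0+cu117", "11.7.0"))
```

On a real install, the inputs would come from torch.__version__ and from the toolkit that was actually installed. The extensions directory in the original log is named py310_cu116, which suggests that setup uses a CUDA 11.6 build of Torch, so the matching conda label there would presumably be the 11.6 one rather than 11.7.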

@daipengwa

I still have this problem after following these instructions. Did you add the conda-installed CUDA 11.7 to the library path? Or is there something else I can try?
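
On the library-path question, one thing that may be worth checking is where libcudart.so actually lives inside the environment. The link command in the original log passes -L/opt/conda/lib64, while conda packages may place the library under lib or targets/*/lib instead. The sketch below is an assumption-laden illustration, not a confirmed fix: the helper names are hypothetical, the list of candidate directories is a guess, and it relies on LIBRARY_PATH, an environment variable that GCC consults for extra link-time search directories.

```python
import glob
import os


def find_cudart_dirs(prefix):
    """Return directories under prefix that contain a file named libcudart.so.

    A versioned file alone (for example libcudart.so.11.0) is not enough for
    -lcudart, so only the exact unversioned name is matched.
    """
    patterns = [
        os.path.join(prefix, "lib", "libcudart.so"),
        os.path.join(prefix, "lib64", "libcudart.so"),
        os.path.join(prefix, "targets", "*", "lib", "libcudart.so"),
    ]
    found = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            directory = os.path.dirname(path)
            if directory not in found:
                found.append(directory)
    return found


def library_path_with(dirs, current=""):
    """Prepend dirs to an existing LIBRARY_PATH value, skipping duplicates."""
    rest = [p for p in current.split(os.pathsep) if p and p not in dirs]
    return os.pathsep.join(list(dirs) + rest)


if __name__ == "__main__":
    # Assumption: the environment root is CONDA_PREFIX, else /opt/conda as in the log.
    prefix = os.environ.get("CONDA_PREFIX", "/opt/conda")
    dirs = find_cudart_dirs(prefix)
    if dirs:
        value = library_path_with(dirs, os.environ.get("LIBRARY_PATH", ""))
        print("export LIBRARY_PATH=" + value)
    else:
        print("no libcudart.so found under", prefix)
```

If a directory is printed, exporting that LIBRARY_PATH in the shell before rerunning training lets the c++ link step search it. If nothing is found, the CUDA runtime development files are probably not installed in that environment at all.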
