Will accept Pull Requests on my fork / recent builds available on conda-forge #57

Open
carterbox opened this issue Dec 19, 2023 · 6 comments

carterbox commented Dec 19, 2023

If you're reading this, it's because you're wondering whether this package is still active. It is not. The last updates were merged in November 2020. However, I really like this package, so I will accept pull requests over on my fork. If Matteo comes back, I will happily help merge my changes upstream.

https://github.com/carterbox/torch-radon

I also publish pre-compiled releases of my fork for the conda package manager on the conda-forge channel.

https://anaconda.org/conda-forge/carterbox-torch-radon
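
For reference, installing from that channel is a single conda command (a minimal example; pin PyTorch/CUDA versions as needed for your machine):

conda install --channel conda-forge carterbox-torch-radon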


Masaaki-75 commented Jan 4, 2024

Is there any documentation describing the supported PyTorch/CUDA versions?

With python==3.8.18, I found the conda package can be installed alongside pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1, but I got the following messages (conda keeps trying to solve the environment) with pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7:

Collecting package metadata (current_repodata.json): / WARNING conda.models.version:get_matcher(542): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.7.1.*, but conda is ignoring the .* and treating it as 1.7.1
done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): | WARNING conda.models.version:get_matcher(542): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.8.0.*, but conda is ignoring the .* and treating it as 1.8.0
WARNING conda.models.version:get_matcher(542): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.9.0.*, but conda is ignoring the .* and treating it as 1.9.0
WARNING conda.models.version:get_matcher(542): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.6.0.*, but conda is ignoring the .* and treating it as 1.6.0
done
Solving environment: /


carterbox commented Jan 4, 2024

The conda packages are built from this repository: https://github.com/conda-forge/carterbox-torch-radon-feedstock

pytorch-cuda is not a package on the channel; conda autodetects which CUDA version is appropriate for the host machine. To constrain the CUDA version explicitly, use the cuda-version package. conda-forge is currently building with toolkits 11.2, 11.8, and 12.x.

(base) bash-5.1$ conda create -n test pytorch torchvision torchaudio pytorch-cuda
Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - pytorch-cuda
  - torchaudio

As the solver will tell you if you try to create that environment unconstrained, torchaudio is not available on the channel. You can submit a new conda recipe to build it on the channel, or assist with an existing PR in the conda-forge/staged-recipes repo. It seems a few attempts have been made to build torchaudio with conda, but I'm not sure why none have been merged.
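
If you want to check which PyTorch/CUDA combinations have been published for the fork, a standard conda query (nothing specific to this package, so treat it as a sketch) will list the available builds and their dependencies:

conda search --channel conda-forge --info carterbox-torch-radon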


Masaaki-75 commented Jan 14, 2024

Thank you for the hint! And sorry for my late reply.

I feel I didn't make myself clear. I had trouble installing your releases of torch-radon, probably because of a compatibility issue. I was trying to install your release on my RTX 3090 machine, which has an older CUDA version (11.5) that unfortunately cannot be updated for some time (so it does not support pytorch > 2.0, which requires CUDA > 11.7).

With conda install --channel conda-forge carterbox-torch-radon, the installed carterbox-torch-radon package seems to pull in pytorch-2.1.0-cuda120py38h1932296_301 and cuda-version-12.2-he2b69de_2. As a result, I got messages like "The NVIDIA driver on your system is too old" (my conda environment has torch==1.11.0+cu113, torchvision==0.12.0+cu113, and torchaudio==0.11.0 installed).

So I am wondering: is there any way to install carterbox-torch-radon so that it supports older versions of PyTorch and CUDA? 🤔


Masaaki-75 commented Jan 15, 2024

Update:

I found that building from source with python setup.py install works with python==3.9 and torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0.

However, I found another problem related to gradient computation. Concretely, I wrote a test script that trains a dual-domain dummy network, but grad_x arrives non-contiguous in the backward pass.

The error message is:

(carter) clma@my_server:~/projects/mar/RIL$ python radon_v2_example3.py
Batch: 0
>>> Forwarding sino net.
>>> Forwarding img net.
Traceback (most recent call last):
  File "/home/clma/projects/mar/RIL/radon_v2_example3.py", line 119, in <module>
    loss.backward()
  File "/home/clma/miniconda3/envs/carter/lib/python3.9/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/clma/miniconda3/envs/carter/lib/python3.9/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/clma/miniconda3/envs/carter/lib/python3.9/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/home/clma/miniconda3/envs/carter/lib/python3.9/site-packages/torch_radon-0.0.0-py3.9-linux-x86_64.egg/torch_radon/differentiable_functions.py", line 46, in backward
    grad = cuda_backend.forward(grad_x, angles, ctx.tex_cache, ctx.vol_cfg, ctx.proj_cfg, exec_cfg)
RuntimeError: x must be contiguous

And the test script radon_v2_example3.py is as follows:

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset
from torch_radon import FanBeam, Volume2D


class TestDataset(Dataset):
    def __init__(self):
        super().__init__()
        img = np.zeros((512, 512), dtype=np.float32)
        img[:, 255] = 1.
        img[255, :] = 1.
        
        imgs = [
            img,
            np.fliplr(img).copy(),
            np.rot90(img, k=1).copy(),
            np.rot90(img, k=2).copy(),
            np.rot90(img, k=3).copy(),
        ]
        self.imgs = imgs + [1.0 - img for img in imgs]
    
    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, index):
        img = self.imgs[index]
        x = torch.from_numpy(img).unsqueeze(0).float().contiguous()
        return x


class TestNet(nn.Module):
    def __init__(self, channels=3) -> None:
        super().__init__()
        self.sino_model = nn.Sequential(
            nn.Conv2d(1, channels, 5, padding=2),
            nn.Conv2d(channels, 1, 5, padding=2)
        )
        self.img_model = nn.Conv2d(2, 1, 3, padding=1)  # moved to the GPU via net.to(device) below
        self.det_count = 672
    
    def forward_sino(self, sino):
        print('>>> Forwarding sino net.')
        return self.sino_model(sino)
    
    def forward_img(self, img1, img2):
        print('>>> Forwarding img net.')
        img = torch.cat([img1, img2], dim=1)
        return self.img_model(img)
    
    def projection(self, img, angles=None, filter=True):
        if angles is None:
            angles = np.linspace(0, np.pi * 2, 360, endpoint=False)
        
        volume = Volume2D()
        volume.set_size(img.shape[-2], img.shape[-1])  # [B, C, H, W]
        radon = FanBeam(self.det_count, angles, volume=volume)
        sino = radon.forward(img)
        if filter:
            sino = radon.filter_sinogram(sino)
        return sino
    
    def backprojection(self, sino, img_shape, angles=None):
        if angles is None:
            angles = np.linspace(0, np.pi * 2, 360, endpoint=False)

        volume = Volume2D()
        volume.set_size(img_shape[-2], img_shape[-1])  # [H, W]
        radon = FanBeam(self.det_count, angles, volume=volume)
        img = radon.backward(sino)
        return img
    
    def forward(self, sino, img_shape, angles=None):
        img = self.backprojection(sino, img_shape, angles=angles)
        sino_pred = self.forward_sino(sino)
        img_sino = self.backprojection(sino_pred, img_shape, angles=angles)
        img_pred = self.forward_img(img, img_sino)
        return sino_pred, img_sino, img_pred
    
    # ==== The following forward (adding contiguous()) does not work ====
    # def forward(self, sino, img_shape, angles=None):
    #     img = self.backprojection(sino, img_shape, angles=angles)
    #     img = img.contiguous()
    #     sino_pred = self.forward_sino(sino)
    #     img_sino = self.backprojection(sino_pred, img_shape, angles=angles)
    #     img_sino = img_sino.contiguous()
    #     img_pred = self.forward_img(img, img_sino)
    #     return sino_pred, img_sino, img_pred


if __name__ == '__main__':
    import torch.optim as optim
    from torch.utils.data import DataLoader
    
    device = torch.device('cuda')
    dataset = TestDataset()
    loader = DataLoader(dataset, batch_size=2, num_workers=4, pin_memory=True, shuffle=True)
    net = TestNet().to(device)
    optimizer = optim.Adam(net.parameters(), lr=1e-4)
    angles = torch.linspace(0, np.pi * 2, 360, requires_grad=False).float()
    
    for i, data in enumerate(loader):
        print(f'Batch: {i}')
        img_shape = data.shape[2:]
        with torch.no_grad():
            sino = net.projection(data.to(device).detach(), angles=angles).detach().contiguous()
            img = net.backprojection(sino, img_shape=img_shape, angles=angles).detach().contiguous()
        
        sino_pred, _, img_pred = net(sino, img_shape=img_shape, angles=angles)
        
        loss1 = F.l1_loss(sino, sino_pred)
        loss2 = F.l1_loss(img, img_pred)
        loss = loss1 + loss2
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

@Masaaki-75

Update 2:

The above problem is solved by editing src/python/torch_radon/differentiable_functions.py, and I have opened a PR that addresses this and other minor issues: https://github.com/carterbox/torch-radon/pull/11
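
For anyone hitting the same RuntimeError, the gist of the change is to copy the incoming gradient into contiguous memory before it reaches the CUDA backend. Below is a minimal, self-contained toy example of that pattern; it is not the torch_radon code, and the actual edit in the PR may differ in details:

import torch

class ToyProjection(torch.autograd.Function):
    """Toy stand-in for an operator whose backend requires contiguous tensors."""

    @staticmethod
    def forward(ctx, x):
        return x * 2.0

    @staticmethod
    def backward(ctx, grad_x):
        # In torch_radon this is where cuda_backend.forward(grad_x, ...) is
        # called, and it raises "x must be contiguous" when autograd hands us
        # a non-contiguous gradient. Copying it first avoids the error.
        if not grad_x.is_contiguous():
            grad_x = grad_x.contiguous()
        assert grad_x.is_contiguous()  # stand-in for the backend call
        return grad_x * 2.0

x = torch.randn(2, 1, 8, 8, requires_grad=True)
y = ToyProjection.apply(x)
# A permute downstream makes the gradient arriving at backward() non-contiguous,
# reproducing the failure mode from the traceback above.
y.permute(0, 1, 3, 2).sum().backward()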


carterbox commented Jan 16, 2024

The minimum driver version is the same for all CUDA 11.x releases, so if you have a driver from the CUDA 11.5 era, you should be able to run a package built with CUDA 11.8 (or CUDA 11.2?).

cuda-version is a package that you can constrain. If conda cannot detect your driver version correctly, you can set the environment variable CONDA_OVERRIDE_CUDA=11.5 and/or constrain the package cuda-version=11.5. You will then get pytorch=2.1 and torch-radon=2 built against CUDA 11.2 in your environment.
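
For example, on a machine with a CUDA 11.5-era driver, the constrained install would look something like this (a sketch using the overrides mentioned above; adjust the version to your driver):

CONDA_OVERRIDE_CUDA=11.5 conda install --channel conda-forge carterbox-torch-radon "cuda-version=11.5"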

The pytorch version that I build against is chosen by the conda-forge channel, which is already transitioning from 2.0 to 2.1. pytorch 1.13.* is already 2 years old, so I'm not willing to publish pre-built releases for it.

Please keep this issue on topic by only discussing the pre-compiled releases or issues related to forking.
