Installing Root_trainer on an M1 Max Apple Silicon #86

Open
sck-at-ucy opened this issue Feb 16, 2023 · 22 comments
@sck-at-ucy

Hi Abraham,

Is there a way to install the server on a Mac using Apple Silicon (M1 Max) to take advantage of the GPU directly and not use Rosetta emulation? Any help to get started would be highly appreciated.

@Abe404
Owner

Abe404 commented Feb 16, 2023

Hi!

I like this idea, but I am not really sure exactly what is involved and I don't personally have an M1 Mac so it might be quite hard to develop/test it.

From this page: https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/ it looks like it might be as simple as updating your PyTorch version after installing the RootPainter server. Perhaps you could try this and let me know how it goes?

In general, I recommend connecting to a remote server if you don't have a suitable GPU. If you don't have your own remote server, you could try the Colab tutorial: https://colab.research.google.com/drive/104narYAvTBt-X4QEDrBSOZm_DRaAKHtA?usp=sharing The Colab setup is a bit slow because it uses Google Drive for sync and the GPU is not amazing, but it often gets the job done.

Kind regards,
Abraham

@Abe404
Owner

Abe404 commented Feb 16, 2023

From https://pytorch.org/get-started/locally/ and
https://developer.apple.com/metal/pytorch/

It looks like macOS 12.3 or later is required.
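
For anyone wanting to confirm that their setup meets this requirement, here is a minimal sketch of a device check. It assumes a PyTorch build that exposes torch.backends.mps (1.12 or later); the helper names pick_device_name and detect_device_name are hypothetical and not part of RootPainter.

```python
def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's Metal backend (MPS), then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device_name() -> str:
    """Query PyTorch for available backends; report 'cpu' if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds have no torch.backends.mps attribute at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device_name(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print("Selected device:", detect_device_name())
```

The resulting name can be passed to torch.device(...) and then to .to(device) on models and tensors.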

@rohanorton
Collaborator

rohanorton commented Feb 17, 2023

Made an attempt at implementing this. See the commit message in 935fd6a for a summary of what was done and the issues faced.

@sck-at-ucy
Author

This is very helpful. I was also going down the path of using ".to(device)" but got stuck on the torch.load() calls. I will try again with the new information from @rohanorton.
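
For reference, a common stumbling block with torch.load() is that a checkpoint saved on a CUDA machine tries to restore its tensors to CUDA unless map_location is given. The sketch below shows the conservative pattern of loading onto the CPU first and then moving the model. It is not taken from the RootPainter code: restore_model is a hypothetical helper, and it assumes the checkpoint file holds a plain state dict.

```python
def restore_model(model, checkpoint_path: str, device_name: str):
    """Load a state dict onto the CPU, then move the model to the target device."""
    import torch  # imported lazily so this file can be read without PyTorch

    # map_location="cpu" avoids failures when the checkpoint was saved from
    # a device this machine does not have (for example 'cuda:0').
    state = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(state)
    # Assumption: device_name is one this machine supports, e.g. "mps" or "cpu".
    return model.to(torch.device(device_name))
```

Passing "mps" as device_name would target the Apple GPU, while "cpu" works everywhere.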

@Abe404 I am using macOS 12.3x. I think we'll be able to get this working.

Abe404 self-assigned this Jun 12, 2023
@Abe404
Owner

Abe404 commented Jun 12, 2023

I have been working on this today using an Apple M2 Max. The branch '86-installing-root_trainer-on-an-m1-max-apple-silicon' now contains working code for performing training and inference on an Apple GPU using RootPainter.

This is still a work in progress and I have more testing to do. The implementation will likely change, but I wanted to update you here in case you would like to try it out :)

Testing it out will require installing/running from source.

I will perform some more in depth tests but training seems fairly quick compared to Colab. For users with a suitable (recent/modern) MacBook I believe this will be a superior solution to using Google Colab and will suffice in many cases.

@Abe404
Owner

Abe404 commented Jun 12, 2023

OK the segment folder function does not work yet. I am working on it. I had to upgrade to PyQt6 to build the client on an M2 Mac. This has caused a few breaking changes but they aren't very hard to fix.

@Abe404
Owner

Abe404 commented Jun 12, 2023

The segment_folder function is now working again after the changes in the following commit.
9340417

I'm trying to make a version of the client that builds on an M2 chip and the necessary changes (switching to PyQt6) have introduced these bugs. If you want to try this now it might be better just to use the server in this branch and download one of the Mac clients from the releases.

For now I expect everything on the extras menu and measurements menu to crash (in this WIP pyqt6 client). I'll update soon when I have more fixes.

@ethanbass

ethanbass commented Jul 17, 2023

Would love to get the server running on my M1 Mac. I installed the m1 branch from GitHub (86-installing-root_trainer-on-an-m1-max-apple-silicon) and tried to run the start-trainer script, but I'm getting an error:

Traceback (most recent call last):
  File "/Users/ethanbass/miniforge3/envs/rootpainter/bin/start-trainer", line 8, in <module>
    sys.exit(start())
  File "/Users/ethanbass/miniforge3/envs/rootpainter/lib/python3.9/site-packages/root_painter_trainer/__init__.py", line 30, in start
    from trainer import Trainer
  File "/Users/ethanbass/miniforge3/envs/rootpainter/lib/python3.9/site-packages/root_painter_trainer/trainer.py", line 241
    def stop_fn()
                 ^
SyntaxError: invalid syntax

I'm not very good at Python so I'm not sure what's causing the syntax error -- is there a missing colon after the function definition? (I also get the same error if I import the start function and try to run it from Python.) Would appreciate any tips you can offer! Thanks for all your great work on this software -- we've been finding it very useful in our research.

Ethan

Edit: I added the colon after def stop_fn() and it works now, so I guess this was the issue...

@ethanbass

I'm afraid I'm getting a different error now after I tried to start training with the local server. I am using the latest release of the client from the main branch (v 0.2.27). The new error reads:

Traceback (most recent call last):
  File "/Users/ethanbass/miniforge3/envs/rootpainter/bin/start-trainer", line 8, in <module>
    sys.exit(start())
  File "/Users/ethanbass/miniforge3/envs/rootpainter/lib/python3.9/site-packages/root_painter_trainer/__init__.py", line 36, in start
    trainer.main_loop()
  File "/Users/ethanbass/miniforge3/envs/rootpainter/lib/python3.9/site-packages/root_painter_trainer/trainer.py", line 125, in main_loop
    self.train_one_epoch()
  File "/Users/ethanbass/miniforge3/envs/rootpainter/lib/python3.9/site-packages/root_painter_trainer/trainer.py", line 247, in train_one_epoch
    train_result = model_utils.epoch(self.model, train_annot_dir, val_annot_dir,
TypeError: epoch() takes 6 positional arguments but 11 were given

@Abe404
Owner

Abe404 commented Jul 18, 2023

Hi Ethan,

Thanks for the kind words.

Thanks for reporting these issues. This branch is a little less stable compared to the master branch, but I'm happy to fix that.

I'll work on these issues today and will update you again soon.

Kind regards,
Abraham

@Abe404
Owner

Abe404 commented Jul 18, 2023

Syntax error fixed:
0e6e5aa

My bad. I should have run the tests/linter before committing this error.
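
A cheap guard against this class of error is to parse every source file before committing. The sketch below uses only the standard library; find_syntax_error is a hypothetical helper name, and the sample strings merely mimic the missing colon reported above.

```python
import ast
from typing import Optional


def find_syntax_error(source: str, filename: str = "<string>") -> Optional[str]:
    """Return a short description of the first syntax error, or None if the source parses."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as err:
        return f"{filename}:{err.lineno}: {err.msg}"
    return None


# The first sample lacks the colon after the function signature.
broken = "def stop_fn()\n    return True\n"
fixed = "def stop_fn():\n    return True\n"

print(find_syntax_error(broken, "trainer.py"))  # a message pointing at line 1
print(find_syntax_error(fixed, "trainer.py"))   # None
```

Running python -m py_compile on a file from the command line performs a similar check. The exact wording of the error message differs between Python versions.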

Abe404 added a commit that referenced this issue Jul 18, 2023
@Abe404
Owner

Abe404 commented Jul 18, 2023

@ethanbass, regarding the error you reported above (TypeError: epoch() takes 6 positional arguments but 11 were given):

I believe this is now fixed (fix in 182460a). Can you please pull the latest code and let me know how it goes and whether you run into any more problems? This branch is still a bit 'work in progress', as I have some more tests I'd like to do before merging it back into the master branch, so it is especially useful for me to hear how your testing goes and if you run into any issues.
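
This kind of mismatch arises when a call site and a function definition fall out of step. As an illustration only, the sketch below reproduces it with a hypothetical six-parameter epoch stand-in (the real model_utils.epoch signature is not shown in this thread) and uses inspect.signature(...).bind to test a call without executing it.

```python
import inspect


def epoch(model, train_dir, val_dir, batch_size, optimizer, device):
    """Hypothetical stand-in with six positional parameters."""
    return "ok"


def call_matches(func, *args, **kwargs) -> bool:
    """Return True if the arguments fit func's signature, without calling func."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError:
        return False
    return True


print(call_matches(epoch, *range(6)))   # True: six arguments fit
print(call_matches(epoch, *range(11)))  # False: eleven arguments do not
```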

Thanks again for your support and for reaching out with the detailed error message.

Kind regards,
Abraham

@ethanbass

Thank you very much for looking into this! I am getting a different error now after reinstalling with your new commits:

Traceback (most recent call last):
  File "/Users/ethanbass/miniforge3/envs/rootpainter/bin/start-trainer", line 8, in <module>
    sys.exit(start())
  File "/Users/ethanbass/miniforge3/envs/rootpainter/lib/python3.9/site-packages/root_painter_trainer/__init__.py", line 30, in start
    from trainer import Trainer
  File "/Users/ethanbass/miniforge3/envs/rootpainter/lib/python3.9/site-packages/root_painter_trainer/trainer.py", line 39, in <module>
    from multi_epoch.multi_epoch_loader import MultiEpochsDataLoader
ModuleNotFoundError: No module named 'multi_epoch'

Not sure where the multi_epoch module is supposed to come from?
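
One way to see whether Python can locate a module in the active environment is importlib.util.find_spec. This is a generic diagnostic sketch, not something from the RootPainter code; describe_module is a hypothetical helper.

```python
import importlib.util


def describe_module(name: str) -> str:
    """Report where a top-level module would be imported from, if anywhere."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found on sys.path"
    return f"{name}: found at {spec.origin}"


print(describe_module("json"))         # standard library, always present
print(describe_module("multi_epoch"))  # missing unless the folder was installed
```

If the second line reports "not found", the folder was not copied into site-packages and is not on the import path of the running script.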

@Abe404
Owner

Abe404 commented Jul 18, 2023

The multi_epoch module is found here:
https://github.com/Abe404/root_painter/tree/86-installing-root_trainer-on-an-m1-max-apple-silicon/trainer/multi_epoch

I think the missing multi_epoch module might be an issue related to the method of installation and running using start-trainer.
I should fix this when I make the official release with pip (once this branch gets merged into master) but for now could you try installing and running by doing the following:

# download source code for specific mac branch (work in progress)
git clone -b 86-installing-root_trainer-on-an-m1-max-apple-silicon --single-branch https://github.com/Abe404/root_painter.git

# Create an environment for the trainer (to isolate dependencies)
python3 -m venv root_painter/trainer/env

# Activate environment 
source root_painter/trainer/env/bin/activate

# Install environment dependencies for trainer
pip install -r root_painter/trainer/requirements.txt

# Start trainer
python3 root_painter/trainer/main.py

@ethanbass

Thanks. I think you're right that it was an installation issue. I was using pip to try to clone the m1 branch from GitHub but it seems like it wasn't installing the multi_epoch folder? It seems to be working now after following your instructions. Thank you very much! I will update you if I run into any more issues. (Also FYI, I can already see that it's running much much faster from the local server on my computer compared to Google Colab.)

@Abe404
Owner

Abe404 commented Jul 18, 2023

> seems like it wasn't installing the multi_epoch folder?

Seems like it. I will try to figure out why when I start working on the next pip release.

> It seems to be working now after following your instructions

Ace!

> I can already see that it's running much much faster from the local server on my computer compared to Google Colab.

Yes, I think this is a big deal. I have had a similar experience. My plan is to create a version of RootPainter for Mac where the server is integrated into the client: basically a stand-alone integrated version that does not require any command line usage. The first step is to get this version released (which still requires some command line usage to install the server with pip etc.), and then I will try to create some kind of UI widget to allow the trainer to be managed from within the client.

@ethanbass

ethanbass commented Jul 18, 2023

The integrated version sounds great. Would definitely make the barrier to entry a lot lower for people who don't want to fiddle around with the command line. Thanks again for all your help!!

@sporring

I had trouble installing root-painter-trainer on an M1 MacBook running macOS 15.1. The imagecodecs package seemed to be the root of it all, possibly in combination with the Python version. Here is what I did to install the server:

conda create -n rootPainter python=3.10
conda activate rootPainter
conda install imagecodecs==2021.8.26
pip install root-painter-trainer
start-trainer

at which point I get the "Please specify RootPainter sync directory" prompt.

@Abe404
Owner

Abe404 commented Nov 18, 2024

Hi Jon,

The current instructions for getting the RootPainter trainer installed and running on Mac are the following:
#86 (comment)

They seemed to work well for myself and @ethanbass so that's my recommendation for now until I can get these changes into the main release.

Kind regards,
Abraham

@sporring

Unfortunately, that does not work. It seems to rely on distutils, which since Python 3.12 is no longer included [https://stackoverflow.com/questions/77247893/modulenotfounderror-no-module-named-distutils-in-python-3-12]. Installing it via setuptools [https://pypi.org/project/setuptools/] gives new problems:

(env) (base) jrh630@SCI1012515 ~ % pip install -r root_painter/trainer/requirements.txt
Collecting scikit-image==0.21.0 (from -r root_painter/trainer/requirements.txt (line 1))
  Using cached scikit_image-0.21.0.tar.gz (22.7 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting numpy==1.24.3 (from -r root_painter/trainer/requirements.txt (line 2))
  Using cached numpy-1.24.3.tar.gz (10.9 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [33 lines of output]
      Traceback (most recent call last):
        File "/Users/jrh630/root_painter/trainer/env/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/Users/jrh630/root_painter/trainer/env/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/jrh630/root_painter/trainer/env/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 112, in get_requires_for_build_wheel
          backend = _build_backend()
                    ^^^^^^^^^^^^^^^^
        File "/Users/jrh630/root_painter/trainer/env/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 77, in _build_backend
          obj = import_module(mod_path)
                ^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/anaconda3/lib/python3.12/importlib/__init__.py", line 90, in import_module
          return _bootstrap._gcd_import(name[level:], package, level)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
        File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
        File "<frozen importlib._bootstrap>", line 1310, in _find_and_load_unlocked
        File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
        File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
        File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
        File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
        File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
        File "<frozen importlib._bootstrap_external>", line 995, in exec_module
        File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
        File "/private/var/folders/yz/sn2qc9qs7hv31hxnjc_4v9rr0000gn/T/pip-build-env-7ijch8m0/overlay/lib/python3.12/site-packages/setuptools/__init__.py", line 16, in <module>
          import setuptools.version
        File "/private/var/folders/yz/sn2qc9qs7hv31hxnjc_4v9rr0000gn/T/pip-build-env-7ijch8m0/overlay/lib/python3.12/site-packages/setuptools/version.py", line 1, in <module>
          import pkg_resources
        File "/private/var/folders/yz/sn2qc9qs7hv31hxnjc_4v9rr0000gn/T/pip-build-env-7ijch8m0/overlay/lib/python3.12/site-packages/pkg_resources/__init__.py", line 2172, in <module>
          register_finder(pkgutil.ImpImporter, find_on_path)
                          ^^^^^^^^^^^^^^^^^^^
      AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

ChatGPT thinks that this is also a Python version problem. Unfortunately, I'm not familiar with venv, and it seems resistant to making environments with different versions of Python.
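
On the venv point: python3 -m venv builds the environment around whichever interpreter runs the command, so a different version has to be selected by invoking that interpreter directly (for example python3.10 -m venv ..., if it is installed). A small guard like the following could make the mismatch explicit before pip install is attempted. The bounds are assumptions drawn from this thread (3.9 and 3.10 appear in working setups, 3.12 fails on the pinned numpy==1.24.3), not an official support range, and is_supported is a hypothetical name.

```python
import sys


def is_supported(version, minimum=(3, 9), below=(3, 12)) -> bool:
    """Return True if the (major, minor) version lies in [minimum, below)."""
    major_minor = tuple(version[:2])
    return minimum <= major_minor < below


if __name__ == "__main__":
    if not is_supported(sys.version_info):
        # Assumed range only; adjust the bounds if the requirements change.
        print(
            f"Python {sys.version_info.major}.{sys.version_info.minor} "
            "is outside the assumed working range"
        )
```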

@Abe404
Owner

Abe404 commented Nov 18, 2024

Thanks for sharing these details. I will install Python 3.13 on my Mac and see if I can reproduce the problem (and then fix it).

@Abe404
Owner

Abe404 commented Nov 30, 2024

fbb2846 is a commit that includes the fixes. Unfortunately this Mac (local training) version is still not as well tested and I am not yet fully confident in it.
