distributed training only on CPUs #5787
Comments
Hi @zmasih, can you tell me how you run the example and what kind of error you observe? I cannot rule out that the code itself is not adjusted to run only on the CPU; we added a CPU variant of the pipeline but not of the model itself.
I'm running
Hi @zmasih, indeed, the example was not written with more than one node in mind for the CPU.
@JanuszL Thank you for your answer. So you are saying that if more than one device is available, DALI will, without any explicit request, use multiple CPU cores to load, decode, and preprocess data concurrently? And as a quick check before starting RN50: you confirm that on my system, which has no GPU, I can use distributed training on multiple nodes for this use case?
Yes, the only thing you can adjust is the number of CPU threads that DALI uses (the num_threads argument of the pipeline).
I believe that the Horovod approach should work in general with DALI on the CPU; however, I cannot say whether the examples we have will work, especially EfficientDet, which uses the native TF distributed strategy.
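For illustration only, a CPU-only DALI pipeline sharded across Horovod workers could be sketched roughly as below; the data path, operator choices, and parameter values are assumptions, not a tested recipe from the repository:

```python
import horovod.tensorflow as hvd
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def

hvd.init()

# device_id=None requests a CPU-only pipeline, so no CUDA context is created.
@pipeline_def(batch_size=32, num_threads=4, device_id=None)
def sharded_cpu_pipeline(shard_id, num_shards):
    # Hypothetical dataset location; each worker reads its own shard of the files.
    jpegs, labels = fn.readers.file(
        file_root="/data/imagenet/train",  # placeholder path
        shard_id=shard_id,
        num_shards=num_shards,
        random_shuffle=True,
    )
    images = fn.decoders.image(jpegs, device="cpu")        # CPU JPEG decoding
    images = fn.resize(images, resize_x=224, resize_y=224)  # CPU resize
    return images, labels

# One pipeline per Horovod worker; sharding keeps the workers' data disjoint.
pipe = sharded_cpu_pipeline(shard_id=hvd.rank(), num_shards=hvd.size())
pipe.build()
images, labels = pipe.run()
```

Here scaling out over nodes is handled entirely by the reader's shard_id/num_shards arguments, so no GPU logic is involved on the data-loading side.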
I've tried, but I still get the following error:
Can you please guide me? It worked for me on a system with CUDA, even for --pipeline dali_cpu.
Hi @zmasih, as I mentioned, the example is not prepared to run without a GPU even if the pipeline can run on the CPU. Each DALI pipeline is assigned to a device (GPU) based on the TF distributed strategy, and once the device id is provided, DALI tries to initialize CUDA. What you can do is check if providing
Thank you @JanuszL
Hi @zmasih, you can start with this toy example:
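The snippet below is a minimal sketch along those lines; the operators, batch size, and num_threads values are illustrative, and the assumption is that device_id=None on both the pipeline and DALIDataset keeps everything off the GPU:

```python
import tensorflow as tf
import nvidia.dali.fn as fn
import nvidia.dali.plugin.tf as dali_tf
from nvidia.dali import pipeline_def

BATCH = 4

# device_id=None keeps the whole pipeline on the CPU; CUDA is never initialized.
@pipeline_def(batch_size=BATCH, num_threads=2, device_id=None)
def toy_pipeline():
    # Synthetic CPU data stands in for a real file reader + decoder.
    data = fn.random.uniform(range=[0.0, 1.0], shape=[2, 2])
    return data

pipe = toy_pipeline()

# Place DALIDataset on /cpu:0 and pass device_id=None so the TF side stays CPU-only too.
with tf.device("/cpu:0"):
    dataset = dali_tf.DALIDataset(
        pipeline=pipe,
        batch_size=BATCH,
        output_dtypes=tf.float32,
        device_id=None,
    )

for batch in dataset.take(2):
    print(batch.shape)  # expected: (4, 2, 2)
```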
to run DALI on the CPU (tested inside a Jupyter notebook).
Describe the question.
Hello.
I need to use DALI for distributed training only on CPUs. The system where I'm running my benchmark does not have any GPUs. I've tried the 'EfficientDet' example from the DALI repo, but it only runs either on GPUs with distributed strategies or on a single CPU. Would you guide me?