Currently, running Data-Juicer on multiple nodes in "ray" mode, which uses `map_batches` to process datasets, can cause problems that are not immediately visible.
The `map_batches` method takes two arguments, `num_gpus` and `concurrency`, which are effectively cluster-level arguments. However, Data-Juicer calculates them automatically from the hardware information of a single machine. As a result, OPs whose `_accelerator` is "cuda" may use resources poorly when running on multiple nodes.
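One possible direction, sketched below, is to derive these values from cluster-wide totals instead of the local machine. This is a minimal sketch, not Data-Juicer's implementation: the helper `plan_gpu_concurrency` and the sample resource dictionary are hypothetical, and it assumes `num_gpus` is the GPU share reserved per worker while `concurrency` is the number of workers scheduled across the whole cluster.

```python
def plan_gpu_concurrency(cluster_resources, gpus_per_worker=1.0):
    """Derive (num_gpus, concurrency) from cluster-wide resource totals.

    `cluster_resources` is a mapping such as {"CPU": 64.0, "GPU": 8.0}.
    Hypothetical helper for illustration; not part of Data-Juicer or Ray.
    """
    if gpus_per_worker <= 0:
        raise ValueError("gpus_per_worker must be positive")

    # Total GPUs across all nodes, not just the machine running the driver.
    total_gpus = float(cluster_resources.get("GPU", 0))

    # Number of workers that fit when each one reserves `gpus_per_worker` GPUs.
    concurrency = int(total_gpus // gpus_per_worker)
    if concurrency < 1:
        raise RuntimeError("cluster does not have enough GPU capacity")

    return gpus_per_worker, concurrency


# Hypothetical cluster: two nodes with 4 GPUs and 32 CPUs each.
cluster = {"CPU": 64.0, "GPU": 8.0}
num_gpus, concurrency = plan_gpu_concurrency(cluster, gpus_per_worker=1.0)
print(num_gpus, concurrency)
```

In a real run, the dictionary could come from `ray.cluster_resources()` after `ray.init()`, and the two values could then be passed as `ds.map_batches(fn, num_gpus=num_gpus, concurrency=concurrency)`. Sizing from a single machine's 4 GPUs would instead give a concurrency of 4 and leave the second node's GPUs unused.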