SageMaker Object2Vec training throughput #2372
adityagupta970 asked this question in Help (Unanswered)
I am using SageMaker Object2Vec to train on a 2 GB dataset.
An ml.p2.xlarge instance took 12 hours to train 4 epochs at 5,000 samples/sec.
I then moved to a larger instance, ml.p2.16xlarge, expecting it to train faster, but it only runs at 400 samples/sec, and this warning appears in the logs:

only 114 out of 240 GPU pairs are enabled direct access. It may affect the performance. You can set MXNET_ENABLE_GPU_P2P=0 to turn it off
Replies: 2 comments

- I've passed this along to the team that owns Object2Vec (reference: P38158350). Thanks for using SageMaker!
- I'm having this same issue; is there any other ticket or forum where this is being tracked?