Add pytorch/training/gpu/2.3.1/transformers/4.48.0/py311/Dockerfile #134

Merged
merged 15 commits into from
Jan 14, 2025

Member

@alvarobartt alvarobartt commented Dec 17, 2024

Description

This PR bumps the dependencies to release a new PyTorch DLC for training with improvements, support for newer model architectures, bug fixes and much more.

Besides the version bumps, this PR also includes the gcloud CLI and installs huggingface_hub with the hf-transfer utility to improve download/upload speed to the Hugging Face Hub.
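
As a rough illustration of how the hf-transfer utility is typically switched on at runtime, here is a minimal sketch. It assumes the package is already installed (e.g. via the `huggingface_hub[hf_transfer]` extra) and relies on the `HF_HUB_ENABLE_HF_TRANSFER` environment variable read by huggingface_hub; whether this PR's `Dockerfile` sets that variable is not stated here, and the helper name `hub_env` is hypothetical.

```python
import os


def hub_env(base=None):
    """Return a copy of an environment mapping with hf_transfer enabled.

    `hub_env` is a hypothetical helper, not part of huggingface_hub.
    """
    env = dict(os.environ if base is None else base)
    # huggingface_hub uses hf_transfer for downloads/uploads when this is "1"
    # (the hf_transfer package must be installed for it to take effect).
    env["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    return env


if __name__ == "__main__":
    # Build an environment that could be passed to a subprocess via `env=`.
    print(hub_env({"PATH": "/usr/bin"})["HF_HUB_ENABLE_HF_TRANSFER"])
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching a download or training script, leaving the parent process environment untouched.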

Note

This PR will enable the example on how to fine-tune PaliGemma 2 with TRL to be shipped within #133

@alvarobartt alvarobartt self-assigned this Dec 17, 2024
@alvarobartt alvarobartt added pytorch Pytorch related Issues container labels Dec 17, 2024
This commit also contains some formatting improvements that make the
`Dockerfile` easier to debug, such as indenting the continuation lines
of a multi-line command so it is clear they belong to the unindented
command above; it also sets bash as the default shell and fixes the
`gcloud` CLI installation.

Bump the `transformers` dependency to 4.48.0 to support the ModernBERT
architecture, and bump `diffusers` to include new video and image
generation pipelines, along with other features, improvements and bug
fixes. Additionally, the `Dockerfile` formatting has been fixed.
@alvarobartt alvarobartt changed the title Add pytorch/training/gpu/2.3.0/transformers/4.47.0/py311/Dockerfile Add pytorch/training/gpu/2.3.0/transformers/4.48.0/py311/Dockerfile Jan 3, 2025
@alvarobartt alvarobartt changed the title Add pytorch/training/gpu/2.3.0/transformers/4.48.0/py311/Dockerfile Add pytorch/training/gpu/2.3.0/transformers/4.47.1/py311/Dockerfile Jan 3, 2025
Member

@philschmid philschmid left a comment

What's the time benefit of uv? I'm not sure we should add another dependency just to make container builds a few seconds faster, especially with the alias: e.g. when adding custom dependencies in training jobs, what if they are not supported?

@alvarobartt
Member Author

It's mainly for both the Docker build time and for installing dependencies on e.g. Kubeflow Pipelines, as it will be faster. I'm not expecting much impact besides faster pip installs and a bigger image size, so we can roll back those changes if they're not desired at the current stage. Whatever you feel is better, I'm happy with either! 🤗

@alvarobartt
Member Author

Ok @philschmid, after checking, apparently kfp automatically installs the Python dependencies via `python3 -m pip install ...` as per https://github.com/kubeflow/pipelines/blob/2686e017ceca21671d47a6f8d5703ad94b7f0615/sdk/python/kfp/dsl/component_factory.py#L126, meaning that the current alias won't work in those scenarios, i.e. when installing the packages via a Kubeflow Component for Vertex Pipelines. The current Dockerfile can be updated to work even in that case but, as you mentioned, I'm not sure that's worth it. WDYT?
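
To illustrate why an alias would not catch the kfp install path, here is a small sketch that mimics shell alias expansion, which only rewrites the first word of a command. The alias table (`pip` mapped to `uv pip`) is an assumption for illustration, since the exact alias used in this PR is not shown in the conversation, and `expand_alias` is a hypothetical helper.

```python
import shlex

# Assumed alias for illustration only; the PR's actual alias is not shown here.
ALIASES = {"pip": "uv pip"}


def expand_alias(command, aliases=ALIASES):
    """Mimic shell alias expansion: only the first word is ever replaced."""
    words = shlex.split(command)
    if words and words[0] in aliases:
        return " ".join(shlex.split(aliases[words[0]]) + words[1:])
    # First word is not an alias, so the command is left untouched.
    return command


if __name__ == "__main__":
    print(expand_alias("pip install trl"))             # alias applies
    print(expand_alias("python3 -m pip install trl"))  # alias does not apply
```

In the second call the first word is `python3`, so nothing is rewritten and the regular pip module runs. On top of that, bash does not expand aliases in non-interactive shells by default, so an alias alone is fragile for automated installs.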

@philschmid
Member

Let's remove it and maybe revisit in a few months.

@alvarobartt alvarobartt changed the title Add pytorch/training/gpu/2.3.0/transformers/4.47.1/py311/Dockerfile Add pytorch/training/gpu/2.3.q/transformers/4.47.1/py311/Dockerfile Jan 9, 2025
@alvarobartt alvarobartt changed the title Add pytorch/training/gpu/2.3.q/transformers/4.47.1/py311/Dockerfile Add pytorch/training/gpu/2.3.1/transformers/4.47.1/py311/Dockerfile Jan 9, 2025
@alvarobartt alvarobartt changed the title Add pytorch/training/gpu/2.3.1/transformers/4.47.1/py311/Dockerfile Add pytorch/training/gpu/2.3.1/transformers/4.48.0/py311/Dockerfile Jan 13, 2025
@alvarobartt alvarobartt merged commit e570d07 into main Jan 14, 2025
1 check passed
@alvarobartt alvarobartt deleted the pytorch-training-release branch January 14, 2025 10:37
Labels
container pytorch Pytorch related Issues
2 participants