[CI] [GHA] HuggingFace cache #28481

Open
wants to merge 24 commits into
base: master

akashchi
Contributor

@akashchi akashchi commented Jan 16, 2025

A new HuggingFace share has been added to the self-hosted runners at the path mount/caches/huggingface. This PR adds that share to the workflows that use HF data.

Tickets:

  • 159012
  • 159010

@akashchi akashchi added the WIP work in progress label Jan 16, 2025
@akashchi akashchi added this to the 2025.1 milestone Jan 16, 2025
@github-actions github-actions bot added the category: CI (OpenVINO public CI) and github_actions (Pull requests that update GitHub Actions code) labels Jan 16, 2025
@github-actions github-actions bot added the category: TF FE (OpenVINO TensorFlow FrontEnd), category: PyTorch FE (OpenVINO PyTorch Frontend), and category: JAX FE (OpenVINO JAX FrontEnd) labels Jan 27, 2025
@akashchi akashchi marked this pull request as ready for review February 5, 2025 12:55
@akashchi akashchi requested review from a team as code owners February 5, 2025 12:55
@akashchi akashchi requested review from mvafin and slyalin February 5, 2025 12:55
@akashchi akashchi removed the WIP work in progress label Feb 5, 2025
@akashchi akashchi requested a review from mryzhov February 5, 2025 12:55
@rkazants rkazants self-requested a review February 5, 2025 13:01
Comment on lines 27 to 33
os.environ['HF_HUB_CACHE'] = hf_cache_dir

no_clean_cache_dir = False
hf_hub_cache_dir = tempfile.gettempdir()
hf_hub_cache_dir = hf_cache_dir
if os.environ.get('USE_SYSTEM_CACHE', 'True') == 'False':
    no_clean_cache_dir = True
os.environ['HUGGINGFACE_HUB_CACHE'] = hf_hub_cache_dir
Contributor Author

@akashchi akashchi Feb 5, 2025

I was not sure how it was supposed to work in the first place:

  • There are two env variables for the HF cache, HF_HUB_CACHE and HUGGINGFACE_HUB_CACHE; the latter is deprecated but may be needed for backwards compatibility
  • HF_HUB_CACHE is taken from the environment; if it is not present -> a temp directory is used instead
  • HUGGINGFACE_HUB_CACHE was always set to a temporary directory, without even looking for it in the env. What if we want to use a remote cache, like in CI?
  • The cleanup is controlled by another env variable, USE_SYSTEM_CACHE, but only for the deprecated HUGGINGFACE_HUB_CACHE

With the changes in this PR, HF_HUB_CACHE is the single source of truth, but I think it could and should be simplified further. Is HUGGINGFACE_HUB_CACHE even needed? I think it could be done like this:

  • Get only HF_HUB_CACHE from the env:
    • if present -> just use the value
    • if not present -> set it to the temp directory
  • Drop HUGGINGFACE_HUB_CACHE, or set it to the value of HF_HUB_CACHE
  • Rename USE_SYSTEM_CACHE to something like CLEAN_HF_CACHE/KEEP_HF_CACHE/...
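
A minimal sketch of the simplification listed above, not the code in this PR. The function name resolve_hf_cache is hypothetical, KEEP_HF_CACHE is just one of the candidate names for the renamed flag, and its default of 'False' (clean up after the run) is an assumption. The sketch takes the "set it to HF_HUB_CACHE" option for the deprecated variable rather than dropping it.

```python
import os
import tempfile


def resolve_hf_cache(env):
    """Return (cache_dir, keep_cache) for a mutable mapping such as os.environ.

    Hypothetical helper: the name, the KEEP_HF_CACHE flag and its default
    are assumptions made for illustration only.
    """
    # Single source of truth: use HF_HUB_CACHE if it is set (e.g. by CI),
    # otherwise fall back to the system temp directory.
    cache_dir = env.get('HF_HUB_CACHE') or tempfile.gettempdir()
    env['HF_HUB_CACHE'] = cache_dir

    # Mirror the value into the deprecated variable for backwards
    # compatibility instead of pointing it at a separate directory.
    env['HUGGINGFACE_HUB_CACHE'] = cache_dir

    # Assumed replacement for USE_SYSTEM_CACHE: keep the cache only when
    # the flag is explicitly set to 'True'.
    keep_cache = env.get('KEEP_HF_CACHE', 'False') == 'True'
    return cache_dir, keep_cache
```

It could be called once at import time, e.g. `hf_cache_dir, no_clean_cache_dir = resolve_hf_cache(os.environ)`, so that a cache path provided by the CI environment is never overwritten by a temp directory.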

Labels
category: CI, category: JAX FE, category: PyTorch FE, category: TF FE, github_actions