[CI] [GHA] HuggingFace cache #28481

Open
wants to merge 24 commits into
base: master
Commits
52424ae
use staging runner
akashchi Jan 16, 2025
82a03dd
start hf tests
akashchi Jan 16, 2025
0b31d9f
add missing staging
akashchi Jan 16, 2025
51235e9
Merge branch 'master' into ci/gha/hf-cache-test
akashchi Jan 17, 2025
1a125b3
Merge remote-tracking branch 'upstream/master' into ci/gha/hf-cache-test
akashchi Jan 22, 2025
2e1f773
check share on Win
akashchi Jan 24, 2025
0c4d7c0
Merge remote-tracking branch 'upstream/master' into ci/gha/hf-cache-test
akashchi Jan 24, 2025
4d0d0a7
Merge branch 'ci/gha/hf-cache-test' of github.com:akashchi/openvino i…
akashchi Jan 24, 2025
eb4b951
fix runner name
akashchi Jan 24, 2025
c90de13
set both HF cache vars to the same dir
akashchi Jan 27, 2025
b4b993f
Merge remote-tracking branch 'upstream/master' into ci/gha/hf-cache-test
akashchi Jan 27, 2025
92823ad
update paths
akashchi Jan 27, 2025
1c9f2a6
Merge remote-tracking branch 'upstream/master' into ci/gha/hf-cache-test
akashchi Jan 28, 2025
2156899
check share
akashchi Jan 28, 2025
fee163c
rm staging
akashchi Jan 28, 2025
f1c383c
add HF share to jobs
akashchi Jan 30, 2025
aab40fd
Merge remote-tracking branch 'upstream/master' into ci/gha/hf-cache-test
akashchi Jan 30, 2025
42d3572
set system_cache var per job
akashchi Jan 31, 2025
cccdb79
Merge remote-tracking branch 'upstream/master' into ci/gha/hf-cache-test
akashchi Jan 31, 2025
824d35f
add comment
akashchi Feb 4, 2025
358c68c
Merge remote-tracking branch 'upstream/master' into ci/gha/hf-cache-test
akashchi Feb 4, 2025
4fc43e7
setup cache dir only for self-hosted runners
akashchi Feb 4, 2025
3287f5d
fix import
akashchi Feb 5, 2025
4a864e8
Merge remote-tracking branch 'upstream/master' into ci/gha/hf-cache-test
akashchi Feb 5, 2025
11 changes: 8 additions & 3 deletions .github/workflows/job_jax_layer_tests.yml
@@ -43,6 +43,7 @@ jobs:
INSTALL_TEST_DIR: ${{ github.workspace }}/install/tests
INSTALL_WHEELS_DIR: ${{ github.workspace }}/install/wheels
LAYER_TESTS_INSTALL_DIR: ${{ github.workspace }}/install/tests/layer_tests
USE_SYSTEM_CACHE: False # Using remote HuggingFace cache
steps:
- name: Download OpenVINO artifacts (tarballs)
uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
@@ -67,7 +68,12 @@ jobs:
echo "INSTALL_TEST_DIR=$GITHUB_WORKSPACE/install/tests" >> "$GITHUB_ENV"
echo "INSTALL_WHEELS_DIR=$GITHUB_WORKSPACE/install/wheels" >> "$GITHUB_ENV"
echo "LAYER_TESTS_INSTALL_DIR=$GITHUB_WORKSPACE/install/tests/layer_tests" >> "$GITHUB_ENV"

echo "HF_HUB_CACHE=/mount/caches/huggingface" >> "$GITHUB_ENV"

- name: Setup HuggingFace Cache Directory (Windows)
if: runner.os == 'Windows'
run: Add-Content -Path $env:GITHUB_ENV -Value "HF_HUB_CACHE=C:\\mount\\caches\\huggingface"

- name: Install OpenVINO dependencies (mac)
if: runner.os == 'macOS'
run: brew install pigz
@@ -80,8 +86,7 @@ jobs:

- name: Extract OpenVINO artifacts (Windows)
if: runner.os == 'Windows'
run: |
Expand-Archive openvino_tests.zip -DestinationPath ${{ env.INSTALL_DIR }}
run: Expand-Archive openvino_tests.zip -DestinationPath ${{ env.INSTALL_DIR }}
working-directory: ${{ env.INSTALL_DIR }}

- name: Fetch setup_python and install wheels actions
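
As a rough illustration of what the cache setup steps in this workflow do, the sketch below appends a `NAME=value` line to a file standing in for `$GITHUB_ENV`, choosing the path by runner OS. `hf_cache_for` and `export_to_github_env` are hypothetical helpers, not part of the workflows, and the Windows path is written with single backslashes here.

```python
import os
import tempfile


def hf_cache_for(runner_os):
    """Return the shared cache path for a runner OS (paths taken from the steps above)."""
    if runner_os == "Windows":
        return r"C:\mount\caches\huggingface"
    return "/mount/caches/huggingface"


def export_to_github_env(name, value, env_file):
    """Append NAME=value to env_file, the file later steps read variables from."""
    with open(env_file, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


if __name__ == "__main__":
    # Use a temporary file in place of the real $GITHUB_ENV file.
    fd, env_file = tempfile.mkstemp()
    os.close(fd)
    export_to_github_env("HF_HUB_CACHE", hf_cache_for("Linux"), env_file)
    with open(env_file, encoding="utf-8") as fh:
        print(fh.read(), end="")
    os.remove(env_file)
```

Because the Windows step runs after the generic one, its value is appended later in the same file, which is presumably why a separate Windows-only step is enough to override the Linux-style path.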
8 changes: 7 additions & 1 deletion .github/workflows/job_jax_models_tests.yml
@@ -35,6 +35,7 @@ jobs:
INSTALL_TEST_DIR: ${{ github.workspace }}/install/tests
INSTALL_WHEELS_DIR: ${{ github.workspace }}/install/wheels
MODEL_HUB_TESTS_INSTALL_DIR: ${{ github.workspace }}/install/tests/model_hub_tests
USE_SYSTEM_CACHE: False # Using remote HuggingFace cache
steps:
- name: Download OpenVINO artifacts (tarballs)
uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
@@ -57,7 +58,12 @@ jobs:
echo "INSTALL_DIR=$GITHUB_WORKSPACE/install" >> "$GITHUB_ENV"
echo "INSTALL_TEST_DIR=$GITHUB_WORKSPACE/install/tests" >> "$GITHUB_ENV"
echo "MODEL_HUB_TESTS_INSTALL_DIR=$GITHUB_WORKSPACE/install/tests/model_hub_tests" >> "$GITHUB_ENV"

echo "HF_HUB_CACHE=/mount/caches/huggingface" >> "$GITHUB_ENV"

- name: Setup HuggingFace Cache Directory (Windows)
if: runner.os == 'Windows'
run: Add-Content -Path $env:GITHUB_ENV -Value "HF_HUB_CACHE=C:\\mount\\caches\\huggingface"

- name: Extract OpenVINO packages and tests
run: |
pigz -dc openvino_tests.tar.gz | tar -xf - -C ${INSTALL_DIR}
7 changes: 6 additions & 1 deletion .github/workflows/job_onnx_models_tests.yml
@@ -36,6 +36,7 @@ jobs:
# depending on GITHUB_RUN_NUMBER variable
NUMBER_OF_REPLICAS: 2
ONNX_MODEL_ZOO_SHA: "5faef4c33eba0395177850e1e31c4a6a9e634c82"
USE_SYSTEM_CACHE: False # Using remote HuggingFace cache
if: ${{ github.event_name != 'merge_group' }}
steps:
- name: Download OpenVINO artifacts (tests)
@@ -59,8 +60,12 @@ jobs:
echo "INSTALL_DIR=$GITHUB_WORKSPACE/install" >> "$GITHUB_ENV"
echo "INSTALL_TEST_DIR=$GITHUB_WORKSPACE/install/tests" >> "$GITHUB_ENV"
echo "MODELS_SHARE_PATH=/mount/testdata$((GITHUB_RUN_NUMBER % NUMBER_OF_REPLICAS))" >> "$GITHUB_ENV"
echo $MODELS_SHARE_PATH
echo "LOGS_FOLDER=$GITHUB_WORKSPACE/onnx_models_tests_logs" >> "$GITHUB_ENV"
echo "HF_HUB_CACHE=/mount/caches/huggingface" >> "$GITHUB_ENV"

- name: Setup HuggingFace Cache Directory (Windows)
if: runner.os == 'Windows'
run: Add-Content -Path $env:GITHUB_ENV -Value "HF_HUB_CACHE=C:\\mount\\caches\\huggingface"

- name: Extract OpenVINO packages and tests
run: |
6 changes: 6 additions & 0 deletions .github/workflows/job_pytorch_layer_tests.yml
@@ -43,6 +43,7 @@ jobs:
INSTALL_TEST_DIR: ${{ github.workspace }}/install/tests
INSTALL_WHEELS_DIR: ${{ github.workspace }}/install/wheels
LAYER_TESTS_INSTALL_DIR: ${{ github.workspace }}/install/tests/layer_tests
USE_SYSTEM_CACHE: False # Using remote HuggingFace cache
steps:
- name: Download OpenVINO artifacts (tarballs)
uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
@@ -67,6 +68,11 @@ jobs:
echo "INSTALL_TEST_DIR=$GITHUB_WORKSPACE/install/tests" >> "$GITHUB_ENV"
echo "INSTALL_WHEELS_DIR=$GITHUB_WORKSPACE/install/wheels" >> "$GITHUB_ENV"
echo "LAYER_TESTS_INSTALL_DIR=$GITHUB_WORKSPACE/install/tests/layer_tests" >> "$GITHUB_ENV"
echo "HF_HUB_CACHE=/mount/caches/huggingface" >> "$GITHUB_ENV"

- name: Setup HuggingFace Cache Directory (Windows)
if: runner.os == 'Windows'
run: Add-Content -Path $env:GITHUB_ENV -Value "HF_HUB_CACHE=C:\\mount\\caches\\huggingface"

- name: Install OpenVINO dependencies (mac)
if: runner.os == 'macOS'
12 changes: 7 additions & 5 deletions .github/workflows/job_pytorch_models_tests.yml
@@ -35,6 +35,7 @@ jobs:
INSTALL_TEST_DIR: ${{ github.workspace }}/install/tests
INSTALL_WHEELS_DIR: ${{ github.workspace }}/install/wheels
MODEL_HUB_TESTS_INSTALL_DIR: ${{ github.workspace }}/install/tests/model_hub_tests
USE_SYSTEM_CACHE: False # Using remote HuggingFace cache
steps:
- name: Check sudo
if: ${{ runner.os == 'Linux' }}
@@ -65,11 +66,17 @@ jobs:

# Needed as ${{ github.workspace }} is not working correctly when using Docker
- name: Setup Variables
if: ${{ contains(inputs.runner, 'aks') }} # Do not setup variables for GitHub-hosted runners
run: |
echo "OPENVINO_REPO=$GITHUB_WORKSPACE/openvino" >> "$GITHUB_ENV"
echo "INSTALL_DIR=$GITHUB_WORKSPACE/install" >> "$GITHUB_ENV"
echo "INSTALL_TEST_DIR=$GITHUB_WORKSPACE/install/tests" >> "$GITHUB_ENV"
echo "MODEL_HUB_TESTS_INSTALL_DIR=$GITHUB_WORKSPACE/install/tests/model_hub_tests" >> "$GITHUB_ENV"
echo "HF_HUB_CACHE=/mount/caches/huggingface" >> "$GITHUB_ENV"

- name: Setup HuggingFace Cache Directory (Windows)
if: runner.os == 'Windows'
run: Add-Content -Path $env:GITHUB_ENV -Value "HF_HUB_CACHE=C:\\mount\\caches\\huggingface"

- name: Extract OpenVINO artifacts
run: |
@@ -130,7 +137,6 @@ jobs:
env:
TYPE: ${{ inputs.model_scope == 'precommit' && 'precommit' || 'nightly' }}
TEST_DEVICE: CPU
USE_SYSTEM_CACHE: False
OP_REPORT_FILE: ${{ env.INSTALL_TEST_DIR }}/TEST-torch_unsupported_ops.log

- name: PagedAttention Test
@@ -140,7 +146,6 @@ jobs:
python3 -m pytest ${MODEL_HUB_TESTS_INSTALL_DIR}/transformation_tests/test_pa_transformation.py -m precommit --html=${INSTALL_TEST_DIR}/TEST-torch_pagedattention_tests.html --self-contained-html -vvv -s --tb=short -n 2
env:
TEST_DEVICE: CPU
USE_SYSTEM_CACHE: False

- name: RoPE Test
if: ${{ inputs.model_scope == 'precommit' }}
@@ -149,7 +154,6 @@ jobs:
python3 -m pytest ${MODEL_HUB_TESTS_INSTALL_DIR}/transformation_tests/test_transformations.py -m precommit --html=${INSTALL_TEST_DIR}/TEST-torch_rope_tests.html --self-contained-html -v --tb=short -n 2
env:
TEST_DEVICE: CPU
USE_SYSTEM_CACHE: False

- name: StatefulToStateless Test
if: ${{ inputs.model_scope == 'precommit' }}
@@ -158,7 +162,6 @@ jobs:
python3 -m pytest ${MODEL_HUB_TESTS_INSTALL_DIR}/transformation_tests/test_stateful_to_stateless_transformation.py -m precommit --html=${INSTALL_TEST_DIR}/TEST-torch_stateful_to_stateless_tests.html --self-contained-html -v --tb=short
env:
TEST_DEVICE: CPU
USE_SYSTEM_CACHE: False

- name: TorchFX GPTQ Pattern Test
if: ${{ inputs.model_scope == 'precommit' }}
@@ -169,7 +172,6 @@ jobs:
python3 -m pytest ${MODEL_HUB_TESTS_INSTALL_DIR}/transformation_tests/test_gptq_torchfx_transformations.py -m precommit --html=${INSTALL_TEST_DIR}/TEST-torch_gptqpattern_tests.html --self-contained-html -v --tb=short
env:
TEST_DEVICE: CPU
USE_SYSTEM_CACHE: False

- name: Reformat unsupported ops file
if: ${{ inputs.model_scope != 'precommit' && !cancelled()}}
6 changes: 6 additions & 0 deletions .github/workflows/job_tensorflow_layer_tests.yml
@@ -43,6 +43,7 @@ jobs:
INSTALL_TEST_DIR: ${{ github.workspace }}/install/tests
INSTALL_WHEELS_DIR: ${{ github.workspace }}/install/wheels
LAYER_TESTS_INSTALL_DIR: ${{ github.workspace }}/install/tests/layer_tests
USE_SYSTEM_CACHE: False # Using remote HuggingFace cache
steps:
- name: Download OpenVINO artifacts (tarballs)
uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
@@ -67,6 +68,11 @@ jobs:
echo "INSTALL_TEST_DIR=$GITHUB_WORKSPACE/install/tests" >> "$GITHUB_ENV"
echo "LAYER_TESTS_INSTALL_DIR=$GITHUB_WORKSPACE/install/tests/layer_tests" >> "$GITHUB_ENV"
echo "INSTALL_WHEELS_DIR=$GITHUB_WORKSPACE/install/wheels" >> "$GITHUB_ENV"
echo "HF_HUB_CACHE=/mount/caches/huggingface" >> "$GITHUB_ENV"

- name: Setup HuggingFace Cache Directory (Windows)
if: runner.os == 'Windows'
run: Add-Content -Path $env:GITHUB_ENV -Value "HF_HUB_CACHE=C:\\mount\\caches\\huggingface"

- name: Install OpenVINO dependencies (mac)
if: runner.os == 'macOS'
12 changes: 7 additions & 5 deletions .github/workflows/job_tensorflow_models_tests.yml
@@ -36,6 +36,7 @@ jobs:
INSTALL_WHEELS_DIR: ${{ github.workspace }}/install/wheels
MODEL_HUB_TESTS_INSTALL_DIR: ${{ github.workspace }}/install/tests/model_hub_tests
NUMBER_OF_REPLICAS: 2
USE_SYSTEM_CACHE: False # Using remote HuggingFace cache
steps:
- name: Download OpenVINO artifacts (tarballs)
uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
@@ -59,13 +60,14 @@ jobs:
echo "INSTALL_TEST_DIR=$GITHUB_WORKSPACE/install/tests" >> "$GITHUB_ENV"
echo "MODEL_HUB_TESTS_INSTALL_DIR=$GITHUB_WORKSPACE/install/tests/model_hub_tests" >> "$GITHUB_ENV"
echo "TFHUB_CACHE_DIR=/mount/testdata$((GITHUB_RUN_NUMBER % NUMBER_OF_REPLICAS))/tfhub_models" >> "$GITHUB_ENV"
echo $TFHUB_CACHE_DIR
echo "HF_HUB_CACHE=/mount/testdata$((GITHUB_RUN_NUMBER % NUMBER_OF_REPLICAS))/hugging_face" >> "$GITHUB_ENV"
echo $HF_HUB_CACHE
echo "HF_HUB_CACHE=/mount/caches/huggingface" >> "$GITHUB_ENV"

- name: Setup HuggingFace Cache Directory (Windows)
if: runner.os == 'Windows'
run: Add-Content -Path $env:GITHUB_ENV -Value "HF_HUB_CACHE=C:\\mount\\caches\\huggingface"

- name: Extract OpenVINO artifacts (Linux and macOS)
run: |
pigz -dc openvino_tests.tar.gz | tar -xf - -C ${INSTALL_DIR}
run: pigz -dc openvino_tests.tar.gz | tar -xf - -C ${INSTALL_DIR}
working-directory: ${{ env.INSTALL_DIR }}

- name: Fetch setup_python action
7 changes: 4 additions & 3 deletions tests/model_hub_tests/jax/test_hf_transformers.py
@@ -6,7 +6,7 @@
import pytest
import requests
from PIL import Image
from models_hub_common.constants import hf_hub_cache_dir
from models_hub_common.constants import hf_hub_cache_dir, no_clean_cache_dir
from models_hub_common.utils import cleanup_dir, get_models_list, retry
from transformers import (
AutoProcessor,
@@ -42,8 +42,9 @@ def load_model(self, model_name, _):
return model

def teardown_method(self):
# remove all downloaded files from cache
cleanup_dir(hf_hub_cache_dir)
if not no_clean_cache_dir:
# remove all downloaded files from cache
cleanup_dir(hf_hub_cache_dir)
super().teardown_method()

def infer_ov_model(self, ov_model, inputs, ie_device):
2 changes: 1 addition & 1 deletion tests/model_hub_tests/models_hub_common/constants.py
@@ -27,7 +27,7 @@
os.environ['HF_HUB_CACHE'] = hf_cache_dir

no_clean_cache_dir = False
hf_hub_cache_dir = tempfile.gettempdir()
hf_hub_cache_dir = hf_cache_dir
if os.environ.get('USE_SYSTEM_CACHE', 'True') == 'False':
no_clean_cache_dir = True
os.environ['HUGGINGFACE_HUB_CACHE'] = hf_hub_cache_dir
Comment on lines 27 to 33
akashchi (Contributor, Author) Feb 5, 2025

I was not sure how it was supposed to work in the first place:

  • There are two env variables for the HF cache: HF_HUB_CACHE and HUGGINGFACE_HUB_CACHE. The latter is deprecated but may be needed for backwards compatibility.
  • HF_HUB_CACHE is taken from the environment; if it is not present, a temp directory is used instead.
  • HUGGINGFACE_HUB_CACHE was always set to a temporary directory, without even looking for it in the env. What if we want to use a remote cache, like in CI?
  • The cleanup is controlled by another env variable, USE_SYSTEM_CACHE, but only for the deprecated HUGGINGFACE_HUB_CACHE.

With the changes in this PR, I made HF_HUB_CACHE the single source of truth, but I think it could and should be simplified further. Is HUGGINGFACE_HUB_CACHE even needed? I think it could work like this:

  • Get only HF_HUB_CACHE from the env:
    • If present, just use the value.
    • If not present, set it to the temp directory.
  • Drop HUGGINGFACE_HUB_CACHE, or set it to HF_HUB_CACHE.
  • Rename USE_SYSTEM_CACHE to something like CLEAN_HF_CACHE/KEEP_HF_CACHE/...
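
The simplification proposed in this comment could be sketched roughly as follows. This is only an illustration, not code from the PR: `resolve_hf_cache`, `apply_hf_cache`, and the `KEEP_HF_CACHE` variable are hypothetical names (the last one borrowed from the rename suggestions), and the case-insensitive parsing of the flag is an added assumption.

```python
import os
import tempfile


def resolve_hf_cache(env):
    """Return (cache_dir, keep_cache) for a mapping of environment variables.

    KEEP_HF_CACHE is a hypothetical name, not a variable the tests read today.
    """
    # HF_HUB_CACHE is the single source of truth; fall back to the temp directory.
    cache_dir = env.get("HF_HUB_CACHE") or tempfile.gettempdir()
    # Assumption: accept "1", "true", "yes" in any letter case as "keep the cache".
    flag = env.get("KEEP_HF_CACHE", "false").strip().lower()
    keep_cache = flag in ("1", "true", "yes")
    return cache_dir, keep_cache


def apply_hf_cache(env=os.environ):
    """Write the resolved directory back so both variable names agree."""
    cache_dir, keep_cache = resolve_hf_cache(env)
    env["HF_HUB_CACHE"] = cache_dir
    # Mirror the value into the older variable name instead of dropping it.
    env["HUGGINGFACE_HUB_CACHE"] = cache_dir
    return cache_dir, keep_cache
```

With this shape, a CI job would only need to export HF_HUB_CACHE and the keep flag, while a local run with neither variable set would fall back to the temp directory and clean up after itself.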

7 changes: 4 additions & 3 deletions tests/model_hub_tests/pytorch/test_edsr.py
@@ -6,7 +6,7 @@
import pytest
import random
import torch
from models_hub_common.constants import hf_hub_cache_dir
from models_hub_common.constants import hf_hub_cache_dir, no_clean_cache_dir
from models_hub_common.utils import cleanup_dir

from torch_utils import TestTorchConvertModel
@@ -52,8 +52,9 @@ def load_model(self, model_name, model_link):
return model

def teardown_method(self):
# remove all downloaded files from cache
cleanup_dir(hf_hub_cache_dir)
if not no_clean_cache_dir:
# remove all downloaded files from cache
cleanup_dir(hf_hub_cache_dir)
super().teardown_method()

@pytest.mark.parametrize("name", ["edsr"])
7 changes: 4 additions & 3 deletions tests/model_hub_tests/pytorch/test_hf_transformers.py
@@ -19,7 +19,7 @@
VivitImageProcessor, XCLIPVisionModel
)

from models_hub_common.constants import hf_hub_cache_dir
from models_hub_common.constants import hf_hub_cache_dir, no_clean_cache_dir
from models_hub_common.utils import cleanup_dir, get_models_list, retry
from torch_utils import TestTorchConvertModel

@@ -497,8 +497,9 @@ def load_model(self, name, type):
return model

def teardown_method(self):
# remove all downloaded files from cache
cleanup_dir(hf_hub_cache_dir)
if not no_clean_cache_dir:
# remove all downloaded files from cache
cleanup_dir(hf_hub_cache_dir)

super().teardown_method()
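
The guard added to each `teardown_method` amounts to "only wipe the cache when it is not a shared one". A minimal standalone sketch of that pattern, assuming a stand-in helper `cleanup_if_owned` (hypothetical; the real `cleanup_dir` lives in `models_hub_common.utils` and is not shown in this diff):

```python
import os
import shutil
import tempfile


def cleanup_if_owned(cache_dir, no_clean_cache_dir):
    """Empty cache_dir unless it is flagged as a shared cache.

    Returns True when the directory was cleaned, False when it was left alone.
    """
    if no_clean_cache_dir:
        # Shared (for example CI-mounted) cache: keep downloads for later jobs.
        return False
    for entry in os.listdir(cache_dir):
        path = os.path.join(cache_dir, entry)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    return True


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    open(os.path.join(scratch, "model.bin"), "w").close()
    print(cleanup_if_owned(scratch, no_clean_cache_dir=True))
    print(cleanup_if_owned(scratch, no_clean_cache_dir=False))
    shutil.rmtree(scratch)
```

Since the same three-line guard is repeated in every test class touched here, a shared helper of this kind might be one way to avoid the duplication, though that is outside the scope of this diff.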
