Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Use lockfiles to control cached venv #1238

Merged
merged 17 commits into from
Feb 3, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 8 additions & 1 deletion .github/dependabot.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,13 @@ updates:
- package-ecosystem: "pip"
directory: "/"
schedule:
interval: "monthly"
interval: "weekly"
commit-message:
prefix: "chore(dependencies): PIP"
groups:
test-dependencies:
patterns:
- "*"

# It uses `skore/ci/requirements/python*/.python-version` to specify the python
# version used to update `skore/ci/requirements/**/test-requirements.txt`
48 changes: 31 additions & 17 deletions .github/workflows/backend.yml
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,27 @@ jobs:
working-directory: skore/
run: pre-commit run --all-files ruff

backend-lockfiles:
runs-on: "ubuntu-latest"
if: ${{ github.event_name == 'pull_request' }}
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 2

- name: Check lockfiles are not obsolete
run: |
changes=$(git diff --name-only HEAD^1 HEAD)

if
(echo "${changes}" | grep -qE 'skore/(test-)?requirements.in') &&
(echo "${changes}" | (! grep -qE "skore/ci/requirements/.*/test-requirements.txt"))
then
echo '::error title=backend-lockfiles:: Lockfiles obsolete, please execute `$ cd skore/ci; bash pip-compile.sh`'
exit 1
fi

backend-test:
strategy:
fail-fast: false
Expand Down Expand Up @@ -77,7 +98,11 @@ jobs:
id: cache-python-venv
with:
path: 'skore/venv'
key: python-venv-${{ matrix.os }}-${{ matrix.python }}-${{ matrix.scikit-learn }}-${{ hashFiles('skore/pyproject.toml') }}
key: >-
python-venv
-${{ matrix.os }}
-${{ matrix.python }}
-${{ hashFiles(format('skore/ci/requirements/python-{0}/scikit-learn-{1}/test-requirements.txt', matrix.python, matrix.scikit-learn)) }}

- name: Setup python-venv
working-directory: "skore/"
Expand All @@ -97,20 +122,11 @@ jobs:
fi

- name: Install dependencies in python-venv
working-directory: "skore/"
working-directory: ${{ format('skore/ci/requirements/python-{0}/scikit-learn-{1}', matrix.python, matrix.scikit-learn) }}
if: steps.cache-python-venv.outputs.cache-hit != 'true'
run: |
python -m pip install --upgrade "pip"
python -m pip install --upgrade "build"
# adding `.*` to the version ensures that we install the latest version of
# scikit-learn that is compatible with the specified version
python -m pip install --upgrade "scikit-learn ==${{ matrix.scikit-learn }}.*"

# Install `skore` and its dependencies
python -m pip install --upgrade --upgrade-strategy=eager ".[test]"

# Uninstall the `skore` package itself
python -m pip uninstall -y "skore"
python -m pip install --upgrade pip build
python -m pip install --requirement test-requirements.txt

- name: Save python-venv
uses: actions/cache/save@v4
Expand All @@ -124,10 +140,8 @@ jobs:
run: python -m build

- name: Install
working-directory: skore/
run: |
# Install `skore` without its dependencies, which are present in the venv
wheel=(dist/*.whl); python -m pip install --force-reinstall --no-deps "${wheel}"
working-directory: skore/dist/
run: wheel=(*.whl); python -m pip install --force-reinstall --no-deps "${wheel}"

- name: Show dependencies versions
working-directory: skore/
Expand Down
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,7 @@ examples/plot_*.png

# Include excluded directories from github-actions
!/.github/**/build/
!/skore/ci/**/.python-version

# Exclude hatch artifacts
skore/LICENSE
48 changes: 48 additions & 0 deletions docs/design/0004-cache-python-dependencies-in-ci.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
---
date: 2025-01-09
decision-makers: ["@thomass-dev"]
---

# Cache python dependencies in CI

## Context and Problem Statement

The choice was made to support the 3 latests versions of `scikit-learn`, jointly with
the OS team.

To date, we already have a CI running `skore` tests with 2 OS, and 4 python versions.
The CI therefore runs the same job 2*4=8 times. To limit the installation time each time
we run the test, we use a native GH mechanism to persist the pip cache. That way, python
packages can be installed without the need to be re-download from PyPI. It's better than
nothing, but installation time is incompressibly slow.

Supporting 3 versions of `scikit-learn` means adding at least 2 additional test runs:
- the latest version is test on each OS/python versions (no additonal test run),
- the 2 older versions of scikit-learn are tested on ubuntu-latest python 3.12 (2
additional test runs)

The CI is becoming very long, we need to find a way to reduce its duration.

## Decision Outcome

Speed-up tests by installing python dependencies in virtual environments and caching it:

* pros: reduction of one minute per job when the cache is already built, otherwise
equivalent.
* cons: cache must be purged manually to install a new version of a dependency. It's not
a big deal for now, since the gain is higher than the constraint.

Dependencies no longer need to be installed at each step, as they are already present in
the cached venv.

We base the construction of the cache on the n-tuple OS, python, scikit-learn, and the
hash of the `pyproject.toml` file. This way, if the dependencies list changes, a new
cache is automatically built.

Each test run knows which cache to use, depending on its context.

Unused cache is automatically purged by GH after 90 days.

## More Information

Implementation in pull-request [#916](https://github.com/probabl-ai/skore/pull/916).
35 changes: 35 additions & 0 deletions docs/design/0005-use-python-lockfiles-in-ci.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
---
date: 2025-01-31
decision-makers: ["@thomass-dev"]
---

# Use python `test-requirements.txt` lockfiles in CI

## Context and Problem Statement

With the pull-request [#916](https://github.com/probabl-ai/skore/pull/916), we cache now
the python dependencies to speed-up CI. However, to update in the cache the version of a
dependency, for example, to take advantage of a bugfix, we have to manually purge the GH
cache from the GH cli. It's not scallable.

Moreover, in state of the CI, we can't control what version of a dependency is used. It
can lead to inconsistent tests, in particular when between two runs, a new version of a
dependency is released.

We therefore want to define the dependencies versions in the CI, while leaving the user
free to install what he wants.

## Decision Outcome

We can't fix dependencies version in the `pyproject.toml` file without impacting users.
Instead, we fix the versions in separate requirement files, intended to be used only by
the CI. These files must be rebuilt after each change in the dependencies list, to stay
in sync at all times with the `pyproject.toml` file.

These files are now used to construct the python cache in the CI.
These files are automatically managed by `dependabot`, to take account of new weekly
bugfixes. It produces pull-requests that must be accepted by maintainers.

## More Information

Implementation in pull-request [#1238](https://github.com/probabl-ai/skore/pull/1238).
71 changes: 71 additions & 0 deletions skore/ci/pip-compile.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
#!/bin/bash
#
# This script compiles all the `test-requirements.txt` files, based on combinations of
# `python` and `scikit-learn` versions. These combinations mirror those defined in the
# GitHub `backend` workflow.
#
# You can pass any piptools parameter:
#
# $ bash pip-compile.sh --upgrade
#

CWD="$PWD"
TMPDIR=$(mktemp -d)

# Make sure that `TMPDIR` is removed on exit, whatever the signal
trap 'rm -rf ${TMPDIR}' 0

# Declare the combinations of `python` and `scikit-learn` versions
declare -a COMBINATIONS

COMBINATIONS[0]='3.9;1.6'
COMBINATIONS[1]='3.10;1.6'
COMBINATIONS[2]='3.11;1.6'
COMBINATIONS[3]='3.12;1.4'
COMBINATIONS[4]='3.12;1.5'
COMBINATIONS[5]='3.12;1.6'

set -eu

(
# Copy everything necessary to compile requirements in `TMPDIR`
cp -r .. "${TMPDIR}/skore"
cp ../../LICENSE "${TMPDIR}/LICENSE"
cp ../../README.md "${TMPDIR}/README.md"

# Move to `TMPDIR` to avoid absolute paths in requirements file
cd "${TMPDIR}"

for combination in "${COMBINATIONS[@]}"
do
IFS=";" read -r -a combination <<< "${combination}"

python="${combination[0]}"
scikit_learn="${combination[1]}"
filepath="${CWD}/requirements/python-${python}/scikit-learn-${scikit_learn}/test-requirements.txt"

echo "Generating requirements: python ==${python} | scikit-learn ==${scikit_learn}"

# Force the `python` version by creating the appropriate virtual environment
pyenv local "${python}"
python -m venv "python-${python}"
source "python-${python}/bin/activate"

# Force the `scikit-learn` version by overloading test requirements
sed -i "s/scikit-learn.*/scikit-learn==${scikit_learn}.*/g" skore/test-requirements.in

# Create the requirements file tree
mkdir -p $(dirname "${filepath}")

# Create the requirements file
python -m pip install --upgrade pip pip-tools --quiet
python -m piptools compile \
--quiet \
--no-strip-extras \
--no-header \
--extra=test \
--output-file="${filepath}" \
"${@:2}" \
skore/pyproject.toml
done
)
1 change: 1 addition & 0 deletions skore/ci/requirements/python-3.10/.python-version
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
3.10
Loading