Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert ci-shark-ai.yml to pkgci_shark_ai.yml. #625

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

ScottTodd
Copy link
Member

@ScottTodd ScottTodd commented Nov 27, 2024

This builds on #589 to make progress on #584.

On the surface, this adds complexity right now. The benefits are:

  • Integration test workflows use dev packages, without needing to build them from source or use editable installs
  • Users and developers can access the same dev packages to reproduce CI results
  • We only need to build shortfin from source once for multiple jobs (potentially saving 2-3 minutes)
  • Only one runner needs the build requirements (potentially including clang, ninja, CMake, Rust, etc.), other runners only need Python

This also switches to using Python venvs, tightening configuration control in workflows and ensuring that only the expected packages are installed and used for tests, without leaking installed packages across workflow runs. That does increase the workflow time though.


Metrics using pip

Logs before: https://github.com/nod-ai/shark-ai/actions/runs/12058325322

  • Total: 7m6s
  • 3m5s on Install pip deps (no venv, reusing already installed packages and getting the wrong IREE versions)
  • 3m14s on Run LLM Integration Tests

Logs after (cold cache): https://github.com/nod-ai/shark-ai/actions/runs/12059301876/attempts/1?pr=625

  • Total: 9m57s
  • 2m42s on Build shortfin (cold cache)
  • 2m37s on Setup venv (from scratch, no deps reused)
  • 16s on Install nightly IREE packages
  • 3m25s on Run LLM Integration Tests

Logs after (warm cache): https://github.com/nod-ai/shark-ai/actions/runs/12059301876/attempts/2?pr=625

  • Total: 8m20s
  • 1m24s on Build shortfin (warm cache, 46% cache hit rate)
  • 2m23s on Setup venv (from scratch, no deps reused)
  • 17s on Install nightly IREE packages
  • 3m23s on Run LLM Integration Tests

Metrics using uv

uv (https://docs.astral.sh/uv/) can be used as an alternative to pip

Logs (cold cache): https://github.com/nod-ai/shark-ai/actions/runs/12147344797?pr=625

  • Total: 6m31s
  • 2m48s on Build shortfin (cold cache)
  • 1m58s on Setup venv (from scratch, no deps reused)
  • 24s on Install nightly IREE packages
  • 24s on Run LLM Integration Tests (failed)

Logs (warm cache): https://github.com/nod-ai/shark-ai/actions/runs/12147452119?pr=625

  • Total: 4m30s
  • 1m30s on Build shortfin (cold cache)
  • 1m3s on Setup venv (from scratch, no deps reused)
  • 10s on Install nightly IREE packages
  • 23s on Run LLM Integration Tests (failed)

renxida added a commit that referenced this pull request Jan 10, 2025
… build packages once (#780)

This builds on #625, #589 to make progress on issue #584.

This adds a pkgci.yml to run multiple package-based CI tasks after
building package using Scott's changes in #667. This gives us the
following benefits:

* Integration test workflows are faster because they now use dev
packages, without needing to build them from source or use editable
installs. Also, if more integration tests are added, they can reuse the
built packages.
* Users and developers can access the same dev packages to reproduce CI
results
* Only one runner needs the build requirements (potentially including
clang, ninja, CMake, Rust, etc.), other runners only need Python.

This also switches to using uv to create venvs, which is faster.

This PR brings shortfin CPU LLM CI time to roughly half an hour on the
mi250 runner to a few seconds of package build (fast due to caching) and
around 5 minutes of testing.

---------

Co-authored-by: Scott Todd <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant