Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch presubmit CI workflows to use pinned IREE versions. #774

Merged
merged 6 commits into from
Jan 7, 2025

Conversation

ScottTodd
Copy link
Member

Progress on #760.

The idea here is that we will test with only pinned versions in all workflows that run on pull_request and push triggers, then we will create pull requests (ideally via automation like dependabot) that attempt to update the pinned versions. This will give us confidence that test regressions are only due to the code changes in the pull request and not due to a dependency changing. Workflows will also be more reproducible as the versions they fetch will come from source code and not an external, time-dependent source.

@ScottTodd ScottTodd marked this pull request as ready for review January 7, 2025 18:34

# TODO(#760): include iree-turbine in this requirements file too?
# iree-turbine==3.1.0rc20241205
iree-turbine==3.1.0rc20241205
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All Llama sharktank/shortfin pre-submits pass on latest IREE release, so shouldn't be an issue updating to latest release. (Not sure about sdxl)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They pass on the latest IREE release today. They might not in the future.

We'll catch issues as part of version pin update PRs, like #773.

Copy link
Collaborator

@archana-ramalingam archana-ramalingam Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know we have a PR to bump to latest version.
shark-ai CI is passing in main but failing here. @stbaione @renxida Did we see any regression on this IREE version? Not an issue, if we are bumping the version soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear with this PR: this isn't a temporary stabilizing measure, this is rolling out a methodical way of managing versions. At the moment we only have manually created PRs to update the versions. I'm looking into tools like dependabot to automate that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Via, conversation in previous PR, this version should be fine!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe not actually... shark-ai is currently passing in main, but seems to be failing at the compilation step in this PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch. I can update the pin in this PR too then 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, looks good now

@ScottTodd
Copy link
Member Author

Oh this is weird... shortfin tests are hanging here now too, just like on #773, despite keeping the runtime version pin fixed. Sample logs: https://github.com/nod-ai/shark-ai/actions/runs/12658739223/job/35276342245?pr=774

@stbaione
Copy link
Contributor

stbaione commented Jan 7, 2025

Oh this is weird... shortfin tests are hanging here now too, just like on #773, despite keeping the runtime version pin fixed. Sample logs: https://github.com/nod-ai/shark-ai/actions/runs/12658739223/job/35276342245?pr=774

It looks like ci-libshortfin.yml uses requirements-iree-pinned.txt, which was updated to the same version as #773 (20250107) in this PR.

Functionally, maybe the lower-level error isn't properly handled, causing the main thread to hang?

@ScottTodd
Copy link
Member Author

Oh this is weird... shortfin tests are hanging here now too, just like on #773, despite keeping the runtime version pin fixed. Sample logs: https://github.com/nod-ai/shark-ai/actions/runs/12658739223/job/35276342245?pr=774

It looks like ci-libshortfin.yml uses requirements-iree-pinned.txt, which was updated to the same version as #773 (20250107) in this PR.

Functionally, maybe the lower-level error isn't properly handled, causing the main thread to hang?

Found a few issues and sent fixes:

Comment on lines +6 to +8
iree-base-compiler==3.1.0rc20250107
iree-base-runtime==3.1.0rc20250107
iree-turbine==3.1.0rc20250107
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the pin to our release candidate (last night's IREE build). I would have made that change incrementally, but there were compilation errors at https://github.com/nod-ai/shark-ai/actions/runs/12656984889/job/35270801217. I can't tell from the logs what the issue was though.

Copy link
Contributor

@stbaione stbaione left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, once shark-ai passes

@ScottTodd ScottTodd merged commit 2c699fd into nod-ai:main Jan 7, 2025
22 of 24 checks passed
@ScottTodd ScottTodd deleted the ci-pinned branch January 7, 2025 23:30
monorimet pushed a commit that referenced this pull request Jan 8, 2025
Progress on #760.

The idea here is that we will test with only pinned versions in all
workflows that run on `pull_request` and `push` triggers, then we will
create pull requests (ideally via automation like dependabot) that
attempt to update the pinned versions. This will give us confidence that
test regressions are _only_ due to the code changes in the pull request
and not due to a dependency changing. Workflows will also be more
reproducible as the versions they fetch will come from source code and
not an external, time-dependent source.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants