Rebalance native CI jobs so that Misc4 is not twice the duration of other jobs #45750

Merged: 1 commit, Jan 23, 2025


@holly-cummins (Contributor) commented Jan 21, 2025

This PR aims to address these problems:

  • The Misc4 job takes much longer than the other native jobs, meaning feedback on that job is slower
  • The slow feedback matters because the Gradle project is in this job, and it seems to be one of the projects most likely to fail
  • The Misc grouping name doesn’t give much insight into what’s in the bucket
  • The Native Devtools job is so short it's a bit of a waste of a machine

To address these, I moved the Gradle tests to the DevTools job, since it's one of the fastest. Then I noticed that almost everything left in Misc4 was observability-related, so I renamed the job and distributed the non-observability projects across the other jobs. I also moved Maven in with Gradle, since they seem like a logical grouping.

I think there are a few risks with this kind of change:

  • Syntax errors causing build failures, because the PR will not exercise the new YAML. To mitigate, I tested on my fork: https://github.com/holly-cummins/quarkus/actions/runs/12884686079
  • Jobs getting accidentally removed. I think we can only mitigate this by careful inspection (and maybe by keeping rebalances small and incremental so they're easier to inspect).
  • Name changes causing confusion for people looking at jobs, or for Develocity analysis. I don't think this is too big a problem.
  • Making the balance worse. To check this, I made a change in core, which I hoped would invalidate the caches, and ran my test build.

@brunobat could you check that the contents of the observability bucket look like a complete-ish, consistent set?

Here are the timings I observed on my test runs. The codebases differ a bit between the two runs, but the timings still show the ratios.

Old

Misc 1: 31m
Misc 2: 22m
Misc 3: 30m
Misc 4: 54m (often it’s over an hour for me)
DevTools: 4m (!)

New

Misc 1: 26m
Misc 2: 29m
Misc 3: 28m
Observability: 35m
Build tools and DevTools: 29m
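As a rough check of how much the rebalance evens things out, the following snippet is a hypothetical helper and is not part of the Quarkus CI scripts. It takes the timings above and reports three things for each run: the slowest job, how many times longer that job takes than the fastest one, and the total machine time. Assuming the jobs run in parallel, the slowest job roughly bounds how long native feedback takes. The two runs used slightly different codebases, so treat the numbers as indicative only.

```python
# Timings in minutes, copied from the test runs reported above.
# The two runs used slightly different codebases, so these are indicative only.
old = {"Misc 1": 31, "Misc 2": 22, "Misc 3": 30, "Misc 4": 54, "DevTools": 4}
new = {
    "Misc 1": 26,
    "Misc 2": 29,
    "Misc 3": 28,
    "Observability": 35,
    "Build tools and DevTools": 29,
}


def summarize(timings):
    """Return (slowest job, its minutes, slowest/fastest ratio, total minutes)."""
    slowest = max(timings, key=timings.get)
    fastest_minutes = min(timings.values())
    ratio = timings[slowest] / fastest_minutes
    return slowest, timings[slowest], ratio, sum(timings.values())


if __name__ == "__main__":
    for label, timings in (("Old", old), ("New", new)):
        name, minutes, ratio, total = summarize(timings)
        print(f"{label}: slowest is {name} at {minutes}m "
              f"({ratio:.1f}x the fastest), {total}m in total")
```

With these numbers, the slowest-to-fastest ratio drops from 13.5x to about 1.3x. The total machine time stays roughly the same: 141m before and 147m after.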

@quarkus-bot added the area/infra-automation label Jan 21, 2025
@brunobat (Contributor) left a comment

Thanks for this @holly-cummins!

@geoand (Contributor) commented Jan 21, 2025

@brunobat you might want to rebase #43831 once this is in


@holly-cummins (Contributor, Author)

io.quarkus.observability.test.LgtmResourcesTest.testTracing is failing in both native and JVM jobs, so I'm pretty confident it's not related to these changes.

@geoand (Contributor) commented Jan 21, 2025

Yeah, that's been pretty flaky

@holly-cummins (Contributor, Author)

Looks like it started failing on January 17, in fact.

@brunobat (Contributor)

Will take a look at that test

@holly-cummins (Contributor, Author)

Hi @gsmet, are you able to look at this?

@gsmet (Member) commented Jan 23, 2025

@holly-cummins it's on my list for today. I wanted to get @brunobat's PR in first, as the two were conflicting and this one is the easy one.

@gsmet (Member) commented Jan 23, 2025

(I'm currently having a look at the LGTM timeouts)

@holly-cummins (Contributor, Author)

I'll do the rebase against #43831.

@gsmet (Member) left a comment

Makes a lot of sense, thanks!

@gsmet merged commit d860909 into quarkusio:main Jan 23, 2025
46 of 50 checks passed
@quarkus-bot added this to the 3.19 - main milestone Jan 23, 2025
@gsmet (Member) commented Jan 23, 2025

I merged as the native tests passed.

Labels: area/infra-automation (anything related to CI, bots, etc. that are used to automate our infrastructure), triage/flaky-test

4 participants