Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incomplete config on re-used LGTM containers #45689

Merged

Conversation

holly-cummins
Copy link
Contributor

I have created a problem where containers in the LGTM integration tests are not cleaned up properly. Obviously, I need to fix this, but it exposed another problem.

When the LGTM dev service is started fresh, the config is something like this:

Dev Service Lgtm started, config: {grafana.endpoint=http://localhost:35865, quarkus.otel.exporter.otlp.endpoint=http://localhost:37417, otel-collector.url=localhost:37417, quarkus.micrometer.export.otlp.url=http://localhost:37417/v1/metrics, quarkus.otel.exporter.otlp.protocol=http/protobuf}

When it’s re-used, it’s

Dev Service Lgtm re-used, config: {grafana.endpoint=http://localhost:35725, otel-collector.url=localhost:41111}

Three pieces of config are missing. This causes the test to fail trying to connect to the default, non-mapped, OTLP endpoint and metrics endpoint. On investigation, there’s a //FIXME at the problem line of code. So I fixed it. :)

I didn’t achieve perfect consolidation of the two methods, but it’s more consolidated (and the config is complete in the re-use case). The consolidated code is also longer than the non-consolidated code, but we won't mention that.

Please do check it carefully, because it’s not an area I know well and it would be easy for me to swap constants or muddle theintended switch logic. I left the (now unused) method on LGTMContainer to get the OTLP port, since it looked potentially useful, but we could consider deleting it.

This comment has been minimized.

@holly-cummins
Copy link
Contributor Author

https://github.com/quarkusio/quarkus/actions/runs/12836055062/job/35797224159?pr=45689 was clean and ran the LGTM tests. It's just the build report that failed with Error: [ERROR] Could not resolve dependencies: The following artifacts could not be resolved: io.quarkus.bot:action-build-reporter:jar:999-SNAPSHOT (absent): Could not find artifact io.quarkus.bot:action-build-reporter:jar:999-SNAPSHOT in quarkus-github-action (https://maven.pkg.github.com/quarkusio/action-build-reporter/), and I think it must be unrelated (I've seen similar failures on other builds).

@holly-cummins holly-cummins requested a review from alesj January 17, 2025 20:31
@alesj
Copy link
Contributor

alesj commented Jan 17, 2025

https://github.com/quarkusio/quarkus/actions/runs/12836055062/job/35797224159?pr=45689 was clean and ran the LGTM tests. It's just the build report that failed with Error: [ERROR] Could not resolve dependencies: The following artifacts could not be resolved: io.quarkus.bot:action-build-reporter:jar:999-SNAPSHOT (absent): Could not find artifact io.quarkus.bot:action-build-reporter:jar:999-SNAPSHOT in quarkus-github-action (https://maven.pkg.github.com/quarkusio/action-build-reporter/), and I think it must be unrelated (I've seen similar failures on other builds).

Yeah, this is imho not related to your changes.
Or I would be very surprised if it would be.

@alesj
Copy link
Contributor

alesj commented Jan 17, 2025

Ah, dunno why I thought Dev Resources won't survive the re-start.
But I guess they do, since DevResources.resources() keeps a static ref to them.
And that's why I complicated things a bit too much -- since you can only guess as much from ContainerLocator, based on existing ports, etc

But since they "survive", I guess your code makes sense.

I'm now wondering how / why just newly added LgtmReloadTest works w/o problems, even before your fix.
Any idea? :-)

@alesj
Copy link
Contributor

alesj commented Jan 17, 2025

I'm now wondering how / why just newly added LgtmReloadTest works w/o problems, even before your fix. Any idea? :-)

Since it should be missing quarkus.otel.exporter.otlp.endpoint property - to know where to send the traces.
Or are you saying this why -- I have created a problem where containers in the LGTM integration tests are not cleaned up properly ?

@alesj
Copy link
Contributor

alesj commented Jan 18, 2025

Ah ... Equal config, re-using existing Lgtm container ...
Move on, nothing to see here. :-) ... wrt to me asking why the LgtmReloadTest works ..

OK, we need to change the test to also modify the props just so that the config is not the same, somehow ...
Otherwise we would see your problem before.

@alesj
Copy link
Contributor

alesj commented Jan 18, 2025

OK, added this

Only then did I realised that of course I was mixing what your PR does, doh!
re-load != re-use

But, it did help add this new re-load part of the test.
And to realise that the Dev Resources do "remember" previous config, etc.
Which I guess can help with setting up the other props -- as per this PR (finally getting it :-))

@holly-cummins
Copy link
Contributor Author

Ah ... Equal config, re-using existing Lgtm container ... Move on, nothing to see here. :-) ... wrt to me asking why the LgtmReloadTest works ..

OK, we need to change the test to also modify the props just so that the config is not the same, somehow ... Otherwise we would see your problem before.

Yes, it's a good point; I was kind of surprised that it took me totally breaking container shutdown to expose the problem, since it felt like a path customer applications might have gone down. And so you're right there should be tests to cover that case.

@holly-cummins holly-cummins force-pushed the better-support-lgtm-container-reuse branch from ebc4d57 to 8450e15 Compare January 20, 2025 11:14
Copy link

quarkus-bot bot commented Jan 20, 2025

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit 8450e15.

Failing Jobs

Status Name Step Failures Logs Raw logs Build scan
Native Tests - Misc4 Build Failures Logs Raw logs 🔍

Full information is available in the Build summary check run.
You can consult the Develocity build scans.

Failures

⚙️ Native Tests - Misc4 #

- Failing: integration-tests/observability-lgtm 

📦 integration-tests/observability-lgtm

io.quarkus.observability.test.LgtmResourcesIT.testTracing - History - More details - Source on GitHub

java.lang.RuntimeException: 
java.lang.RuntimeException: io.quarkus.builder.BuildException: Build failure: Build failed due to errors
	[error]: Build step io.quarkus.observability.deployment.ObservabilityDevServiceProcessor#startContainers threw an exception: java.lang.RuntimeException: org.testcontainers.containers.ContainerLaunchException: Container startup failed for image docker.io/grafana/otel-lgtm:0.8.2
	at io.quarkus.observability.deployment.ObservabilityDevServiceProcessor.lambda$startContainers$3(ObservabilityDevServiceProcessor.java:170)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
	at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
	at io.quarkus.observability.deployment.ObservabilityDevServiceProcessor.startContainers(ObservabilityDevServiceProcessor.java:113)
	at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:732)

Copy link
Contributor

@brunobat brunobat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alesj will be doing changes to the ExtensionsCatalog that touch this same code.
Do you want to merge this 1st or have @holly-cummins reimplementing this after you commit?

@alesj
Copy link
Contributor

alesj commented Jan 20, 2025 via email

@brunobat
Copy link
Contributor

brunobat commented Jan 21, 2025

@holly-cummins, we still get the container timeout error from time to time:
https://github.com/quarkusio/quarkus/actions/runs/12866804676/job/35870717832?pr=45689#step:16:628
This time happened only on the misc4 tests...
Not sure if the timeout can be extended...

@holly-cummins
Copy link
Contributor Author

@holly-cummins, we still get the container timeout error from time to time: https://github.com/quarkusio/quarkus/actions/runs/12866804676/job/35870717832?pr=45689#step:16:628 This time happened only on the misc4 tests... Not sure if the timeout can be extended...

Do you think that's related to these changes, @brunobat? I guess it's hard to see how it's not, since I changed that code, but it's also hard to see how adding the extra config properties would cause that.

@brunobat
Copy link
Contributor

I agree, @holly-cummins and I've seen this on other PRs.
Let's move on and see what happens.

@brunobat brunobat merged commit 174290c into quarkusio:main Jan 21, 2025
20 of 21 checks passed
@quarkus-bot quarkus-bot bot added this to the 3.19 - main milestone Jan 21, 2025
@holly-cummins
Copy link
Contributor Author

I agree, @holly-cummins and I've seen this on other PRs. Let's move on and see what happens.

I think, based on casual inspection, it's the same failure as this one, which started on January 17?

image

@alesj
Copy link
Contributor

alesj commented Jan 21, 2025

As I wrote somewhere already -- we were definitely missing Testcontainers' LGTM wait strategy.
This was recently added, and I think it should help ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants