Failed job, incorrectly updated JIRA tickets with passed test results. #159

Open
vi-patel opened this issue Feb 19, 2024 · 2 comments

@vi-patel

Failed job, incorrectly updated JIRA tickets with passed test results.

The following job fails on pod creation and never runs its set of tests:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-knative-serverless-operator-main-ocp4.15-lp-interop-operator-e2e-interop-aws-ocp415/1759589706720350208/artifacts/operator-e2e-interop-aws-ocp415/firewatch-report-issues/build-log.txt

Prow correctly marks the job as failed. However, firewatch incorrectly reports this job as a success (https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-knative-serverless-operator-main-ocp4.15-lp-interop-operator-e2e-interop-aws-ocp415/1759589706720350208/artifacts/operator-e2e-interop-aws-ocp415/firewatch-report-issues/build-log.txt). As a result, it adds passing-job notifications and job labels to the linked Jira tickets.

@calebevans
Collaborator

After some investigation, I have found the issue. It seems the pod for the failing operator-e2e step never came up, but the finished.json file was not updated with the failure until after firewatch had already run. This seems to be how OpenShift CI or Prow operates. Unfortunately, in its current state I'm not sure firewatch can catch this sort of error, and because it is an edge case it is hard to test. The order of operations seems to be something like this:

  1. Container fails to come up
  2. Prow writes the finished.json file as a success in the operator-e2e step
  3. Prow starts the "post" steps, which contain firewatch-report-issues (the ref that executes firewatch)
  4. Prow updates the finished.json file to reflect a failure after firewatch has already run
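
The sequence above can be modeled with a small sketch. This is not firewatch's actual code: `step_passed` and `simulate` are hypothetical helpers, and the `passed` and `result` fields are an assumption about what a step's finished.json contains.

```python
import json


def step_passed(finished_json: str) -> bool:
    """Return True if a finished.json payload reports success.

    Assumption: the file carries a boolean "passed" and a string "result".
    """
    data = json.loads(finished_json)
    return bool(data.get("passed")) and data.get("result") == "SUCCESS"


def simulate() -> dict:
    """Replay the four steps and record what a reader sees at each point."""
    # Steps 1-2: the container fails to come up, yet the file records success.
    finished = json.dumps({"passed": True, "result": "SUCCESS"})

    # Step 3: the post step (firewatch-report-issues) reads the file now.
    seen_by_firewatch = step_passed(finished)

    # Step 4: the file is rewritten to reflect the failure, too late.
    finished = json.dumps({"passed": False, "result": "FAILURE"})
    seen_after_update = step_passed(finished)

    return {
        "seen_by_firewatch": seen_by_firewatch,
        "seen_after_update": seen_after_update,
    }


if __name__ == "__main__":
    print(simulate())
```

Any reader that runs between steps 2 and 4 sees a success, no matter how carefully it parses the file.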

firewatch executed and finished at 16:56:32

operator-e2e files (finished.json) updated at 17:22:23
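
For reference, the gap between the two timestamps can be computed as below. This is a minimal sketch that assumes both times fall on the same day; `update_lag` is a hypothetical helper, not part of firewatch.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%H:%M:%S"


def update_lag(reporter_finished: str, artifact_updated: str) -> timedelta:
    """Time between the reporter finishing and the artifact being updated.

    Assumption: both timestamps are on the same day.
    """
    finished = datetime.strptime(reporter_finished, TIME_FORMAT)
    updated = datetime.strptime(artifact_updated, TIME_FORMAT)
    return updated - finished


lag = update_lag("16:56:32", "17:22:23")
print(lag)  # 0:25:51
```

The finished.json update landed roughly 26 minutes after firewatch had finished.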

With the files updated, I can run firewatch and create the correct bug in stage: https://issues.stage.redhat.com/browse/LPTOCPCI-1145

My current thinking is that we can't resolve this without running firewatch as a service outside of OpenShift CI. I would appreciate any ideas for resolving this behavior.

@vi-patel
Author

vi-patel commented Mar 1, 2024

Bug filed against DPTP, adding for reference: https://issues.redhat.com/browse/DPTP-3902
