Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: properly handle skip on assessments #473

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

phisco
Copy link
Contributor

@phisco phisco commented Jan 3, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

t.Skip with --fail-fast was resulting in the parent test being failed, this patch fixes that, however I'm not entirely sure about the sematic of --fail-fast, so I'd appreciate if someone could clarify that.

For sure, independently of --fail-fast a call to t.Skip* in an Assessment should only skip the current Assessment. But in case of calls to t.Fail or t.FailNow, what should be the behaviour in the two cases?

Which issue(s) this PR fixes:

Fixes #472

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix bug failing tests being skipped when `--fail-fast` is set.

Additional documentation e.g., Usage docs, etc.:


Signed-off-by: Philippe Scorsolini <[email protected]>
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jan 3, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: phisco
Once this PR has been reviewed and has the lgtm label, please assign cpanato for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

Hi @phisco. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 3, 2025
Copy link
Contributor

@Fricounet Fricounet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching and fixing the bug.

however I'm not entirely sure about the sematic of --fail-fast, so I'd appreciate if someone could clarify that.

I don't have the full context but from my understanding, --fail-fast is an attempt to adapt the go test -failfast flag to the e2e-framework, ie have a way to stop assessments and features executions early.

For sure, independently of --fail-fast a call to t.Skip* in an Assessment should only skip the current Assessment. But in case of calls to t.Fail or t.FailNow, what should be the behaviour in the two cases?

In my understanding of the framework, Assessments can be kinda linked together as part of the same 'Test' while features are some kind of distinct 'Test'.
Based on the semantics of Fail (marks the function as having failed but continues execution.) and FailNow (marks the function as having failed and stops its execution by calling runtime.Goexit. Execution will continue at the next test or benchmark.), my interpretation is that:

  • a call to Fail should not attempt to skip any remaining assessment or feature
  • a call to FailNow should skip all the remaining assessment but not skip the next features

I guess the behavior for FailNow could be argued for or against skipping the next features too. But this is more a question of whether features should be considered independent or not, which, as far as I know is not really clarified anywhere.

// invoked to make sure we leave the traces of the failed test behind to enable better debugging for the
// test developers
if e.cfg.FailFast() && failed {
newT.FailNow()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will the failure be propagated properly to the parent test without this?

// The test passed, nothing to do
return
}
if !finished && e.cfg.FailFast() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !finished && e.cfg.FailFast() {
if !finished || e.cfg.FailFast() {

shouldn't this be an or instead? Based on the comment below

Comment on lines +523 to +524
// The test didn't finish, this means t.FailNow was called
// or we are in fail fast mode, we should skip the remaining assessments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The test didn't finish, this means t.FailNow was called
// or we are in fail fast mode, we should skip the remaining assessments
// The test failed and either:
// didn't finish, this means t.FailNow was called
// OR we are in fail fast mode.
// We should skip the remaining assessments

maybe a bit more explicit?

// test developers
if e.cfg.FailFast() && failed {
newT.FailNow()
if failFast {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we should also skip teardowns when t.FailNow() is called. In particular, I'm not sure it is very logical to skip teardowns for t.FailNow() but not for t.Fail().

In which case, scoping to failed tests with FailFast configured would be better I think. What do you think?

@phisco
Copy link
Contributor Author

phisco commented Jan 8, 2025

  • a call to Fail should not attempt to skip any remaining assessment or feature
  • a call to FailNow should skip all the remaining assessment but not skip the next features

Makes sense, @Fricounet, but what about --fail-fast? Should we skip the remaining assessments independently of which one was called, Fail or FailNow when --fail-fast is set? I'd say so from:

There are times in your test infra that you want the rest of your feature assessments to fail in
case if one of the current executing assessments fail.
This can aid in getting a better turn around time especially in cases where your assessments are
inter-related and a failed assessment in step 1 can mean that running the rest of the assessment
is guaranteed to fail. This can be done with the help of a `--fail-fast` flag provided at the
framework level.
This works similar to how the `-failfast` mode of the `go test` works but provides the same
feature at the context of the `e2e-framework`.

@Fricounet
Copy link
Contributor

Makes sense, @Fricounet, but what about --fail-fast? Should we skip the remaining assessments independently of which one was called, Fail or FailNow when --fail-fast is set? I'd say so from:

I'd say so too @phisco. Though, I don't agree with everything written in the README. In particular:

This can aid in getting a better turn around time especially in cases where your assessments are
inter-related and a failed assessment in step 1 can mean that running the rest of the assessment
is guaranteed to fail. This can be done with the help of a --fail-fast flag provided at the
framework level.

To me, this is the job of t.FailNow()

Basically:

  • as the test writer, I know when it makes sense for my test to keep going (use t.Fail()) or stop early because there is no point going further (use t.FailNow())
  • as the test executor, I know if I want my whole test suite to exit early at the first sign of failure (use --fail-fast). Hence --fail-fast should skip the remaining assessments regardless of Fail/FailNow

Hope that makes sense 😄

@vladimirvivien
Copy link
Contributor

@phisco @Fricounet Thanks for this issue.
The intent of --fail-fast is properly characterized here.
It was a way to abruptly stop all assessments running.
This was an early feature of the framework and if you guys
think it could work better, let's keep this discussion going.

cc @harshanarayana

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Skipping when fail-fast is set results in failure
4 participants