
Move assert statements out of run_smoke_test and into the actual test (for graceful shutdown in case of failure) #318

Merged: 31 commits merged into main from the nerdai/smoke-tests-explicitly-assert branch on Jan 24, 2025

Conversation

@nerdai (Collaborator) commented Jan 23, 2025

PR Type

Fix

Short Description

THIS PR BROUGHT THE SMOKE 💨 💨 💨

This PR addresses an issue with our smoke tests where, if one fails (the usual culprit being test_client_level_dp_breast_cancer), it torpedoes all of the remaining tests that follow it. Moreover, the error logs of the job looked super scary!

To address this issue, this PR:

  1. Ensures that each test explicitly fails if an Exception is encountered during the test.
  2. Aims to ensure that any lingering cancelled tasks left behind by an unexpected test failure (due to a Timeout or another reason) are cleared. The reason we saw the domino effect is that if these cancelled tasks are not cleared, they will still be pending the next time an event loop starts running. In our case, the next test would run the event loop, but the hangover cancelled tasks would take precedence and "stop" the loop, causing that next test to crash.
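As a rough illustration of point 2 (a hedged sketch, not the exact code in this PR; the helper name and placement are assumptions), lingering tasks can be cancelled and drained so their cancellations are consumed now rather than the next time an event loop runs:

```python
import asyncio


async def _drain_lingering_tasks() -> None:
    # Hypothetical cleanup helper for illustration only.
    current = asyncio.current_task()
    pending = [t for t in asyncio.all_tasks() if t is not current and not t.done()]
    for task in pending:
        task.cancel()
    # Awaiting the cancelled tasks here consumes their CancelledErrors so they
    # don't "hang over" into the next test's event loop run.
    await asyncio.gather(*pending, return_exceptions=True)
```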

In addition to the above:

  • We also move the asserts used in run_smoke_test to the actual testing module test_smoke_tests.py, "within" the tests themselves directly (a sketch of the resulting test shape follows this list). This is mostly for hygiene, following typical pytest conventions, which makes it easier to see why a test might fail and to explicitly fail it if an exception is raised.
  • Also, the progress of these smoke tests can now actually be viewed in the logs of the job as it's running.
    [screenshot: smoke test progress shown in the job logs]
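As an illustration of the first bullet, here is a minimal sketch of what a test now looks like with the asserts living inside it. The keyword arguments passed to run_smoke_test below are placeholders; only the returned (server_errors, client_errors) tuple is described in this PR:

```python
import asyncio

import pytest

from tests.smoke_tests.run_smoke_test import run_smoke_test  # import path assumed


def test_basic_server_client_cifar() -> None:
    try:
        server_errors, client_errors = asyncio.run(
            run_smoke_test(
                # Illustrative placeholder arguments only.
                server_module_path="examples.basic_example.server",
                client_module_path="examples.basic_example.client",
                config_path="examples/basic_example/config.yaml",
            )
        )
    except Exception as e:
        # Any unexpected exception now explicitly fails this test.
        pytest.fail(f"Smoke test raised an unexpected exception: {e}")

    # The asserts live in the test itself, so pytest reports exactly why it failed.
    assert not server_errors, f"Server errors: {server_errors}"
    assert not client_errors, f"Client errors: {client_errors}"
```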

NOTE:
If test_client_level_dp_breast_cancer fails, it will still fail the overall smoke test job, which means we'll need to re-run all of the tests, even the passing ones. In another PR we could mark this test with something like "flaky" and then create a separate job for flaky tests, so re-running these would be less time-consuming (a rough sketch of that idea follows).
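A rough sketch of that idea, using a hypothetical custom pytest marker (the marker name and CI split are assumptions, not part of this PR):

```python
import pytest


# The "flaky" marker would need to be registered (e.g. under [tool.pytest.ini_options] markers)
# so pytest doesn't warn about an unknown mark.
@pytest.mark.flaky
def test_client_level_dp_breast_cancer() -> None:
    ...
```

CI could then run pytest -m "not flaky" in the main smoke test job and pytest -m flaky in a separate, cheaper-to-retry job.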

Tests Added

N/A

@nerdai nerdai changed the title [WIP] Make every smoke tests have explicitly assert (to raise AssertionError) [WIP] Make every smoke tests explicitly assert fail in case of failure (i.e. to raise AssertionError) Jan 23, 2025
@nerdai nerdai changed the title [WIP] Make every smoke tests explicitly assert fail in case of failure (i.e. to raise AssertionError) [WIP] Move assert statements out of run_smoke_test and in the actual test (for graceful shutdown in case of failure) Jan 23, 2025
@nerdai nerdai changed the title [WIP] Move assert statements out of run_smoke_test and in the actual test (for graceful shutdown in case of failure) [WIP] Move assert statements out of run_smoke_test and into the actual test (for graceful shutdown in case of failure) Jan 23, 2025
@nerdai nerdai changed the title [WIP] Move assert statements out of run_smoke_test and into the actual test (for graceful shutdown in case of failure) Move assert statements out of run_smoke_test and into the actual test (for graceful shutdown in case of failure) Jan 23, 2025
@@ -50,7 +59,7 @@ async def run_smoke_test(
client_metrics: dict[str, Any] | None = None,
# assertion params
tolerance: float = DEFAULT_TOLERANCE,
-) -> None:
+) -> tuple[list[str], list[str]]:
Collaborator Author (@nerdai):
This now returns server_errors and client_errors to the caller.

@@ -201,16 +215,16 @@ async def run_smoke_test(
break

return_code = server_process.returncode
assert return_code is None or (return_code is not None and return_code == 0), (
@nerdai (Collaborator Author) commented Jan 23, 2025:
In order to remove these assert statements in a way that has me not doing any dangerous Boolean algebra, I employ the following pattern. We can change this if we want...

```python
# old
assert <cond>, <fail_msg>

# new
if not <cond>:
    raise SmokeTestError(<fail_msg>)
```
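For example, applied verbatim to the return-code assert shown in the hunk above (SmokeTestError is assumed here to be a simple Exception subclass defined alongside the smoke test helpers):

```python
class SmokeTestError(Exception):
    """Raised when a smoke test helper detects a failure condition."""


return_code = server_process.returncode
if not (return_code is None or (return_code is not None and return_code == 0)):
    raise SmokeTestError(f"Server process exited with non-zero return code {return_code}.")
```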

@nerdai nerdai requested a review from emersodb January 23, 2025 20:10
@nerdai nerdai force-pushed the nerdai/smoke-tests-explicitly-assert branch from 933c7e7 to 157e18b on January 23, 2025 20:16
)
task = asyncio.create_task(coro)
await task
except SmokeTestTimeoutError as e:
Collaborator Author (@nerdai):
The culprit test will only attempt a retry if SmokeTestTimeoutError is raised. Otherwise, the failure is due to another, potentially real, error, and we fail the test in that case.
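A minimal sketch of that retry behaviour (MAX_RETRIES and make_coro are hypothetical names; only the "retry on SmokeTestTimeoutError, fail otherwise" logic is taken from this PR):

```python
for attempt in range(MAX_RETRIES):
    try:
        # A coroutine object can only be awaited once, so rebuild it on every attempt.
        await asyncio.create_task(make_coro())
        break
    except SmokeTestTimeoutError:
        # Only timeouts are retried; any other exception propagates and fails the test.
        if attempt == MAX_RETRIES - 1:
            raise
```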

f"Full client output:\n{full_client_output}\n"
f"[ASSERT ERROR] 'Client Evaluation Local Model Metrics' message not found for client {i}."
)
raise SmokeTestAssertError(msg)
Collaborator Author (@nerdai):
Tbh, I am not sure if I like the name SmokeTestAssertError; it makes me feel like it's asserting a good output against an expected value.

I think these are probably better as SmokeTestExecutionError? I was just following our naming with [ASSERT ERROR], but I think this convention is confusing.

Collaborator (@emersodb):
I'm good with either. I'll leave it to you to decide 😂

Collaborator Author (@nerdai):
I'm lazy now. I'm going to leave it as is lol.

while True:
# giving a smaller timeout here just in case it hangs for a long time waiting for a single log line
@nerdai (Collaborator Author) commented Jan 24, 2025:
I cleaned this up, but note that we weren't actually giving this "inner task" a smaller timeout.

Instead, I outsource this logic to a contained method get_output_from_stdout(), which reads the stream until completion. This whole process of reading from the stream is what I assign a timeout to. (No more need for the manual computation elapsed_time = datetime.datetime.now() - start_time to check whether the timeout was reached.)
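Roughly, the shape of that approach (a sketch; the real get_output_from_stdout() may differ in signature and return type, and READ_TIMEOUT is an assumed constant):

```python
import asyncio


async def get_output_from_stdout(stdout: asyncio.StreamReader) -> list[str]:
    """Read the stream line by line until EOF and return the decoded lines."""
    lines: list[str] = []
    while True:
        line = await stdout.readline()
        if not line:  # EOF: the process has closed its stdout
            break
        lines.append(line.decode())
    return lines


# The timeout now wraps the entire read, so there is no manual elapsed-time bookkeeping:
# full_output = await asyncio.wait_for(get_output_from_stdout(process.stdout), timeout=READ_TIMEOUT)
```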

@emersodb (Collaborator) left a comment:
Some really minor comments. Otherwise, this looks awesome.

tests/smoke_tests/run_smoke_test.py (review thread resolved)
.github/workflows/smoke_tests.yaml (review thread resolved; outdated)
f"Full client output:\n{full_client_output}\n"
f"[ASSERT ERROR] 'Client Evaluation Local Model Metrics' message not found for client {i}."
)
raise SmokeTestAssertError(msg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with either. I'll leave it to you to decide 😂

tests/smoke_tests/test_smoke_tests.py (two review threads resolved; outdated)
@emersodb (Collaborator) left a comment:

Good to go!

@nerdai (Collaborator Author) commented Jan 24, 2025:

collecting ... collected 350 items / 326 deselected / 24 selected
tests/smoke_tests/test_smoke_tests.py::test_basic_server_client_cifar PASSED [  4%]
tests/smoke_tests/test_smoke_tests.py::test_nnunet_config_2d PASSED      [  8%]
tests/smoke_tests/test_smoke_tests.py::test_nnunet_config_3d PASSED      [ 12%]
tests/smoke_tests/test_smoke_tests.py::test_scaffold PASSED              [ 16%]
tests/smoke_tests/test_smoke_tests.py::test_apfl PASSED                  [ 20%]
tests/smoke_tests/test_smoke_tests.py::test_feddg_ga PASSED              [ 25%]
tests/smoke_tests/test_smoke_tests.py::test_basic PASSED                 [ 29%]
tests/smoke_tests/test_smoke_tests.py::test_client_level_dp_cifar PASSED [ 33%]
tests/smoke_tests/test_smoke_tests.py::test_client_level_dp_breast_cancer FAILED [ 37%]
tests/smoke_tests/test_smoke_tests.py::test_instance_level_dp_cifar PASSED [ 41%]
tests/smoke_tests/test_smoke_tests.py::test_dp_scaffold PASSED           [ 45%]
tests/smoke_tests/test_smoke_tests.py::test_fedbn PASSED                 [ 50%]
tests/smoke_tests/test_smoke_tests.py::test_fed_eval PASSED              [ 54%]
tests/smoke_tests/test_smoke_tests.py::test_fedper_mnist PASSED          [ 58%]
tests/smoke_tests/test_smoke_tests.py::test_fedper_cifar PASSED          [ 62%]
tests/smoke_tests/test_smoke_tests.py::test_ditto_mnist PASSED           [ 66%]
tests/smoke_tests/test_smoke_tests.py::test_mr_mtl_mnist PASSED          [ 70%]
tests/smoke_tests/test_smoke_tests.py::test_fenda PASSED                 [ 75%]
tests/smoke_tests/test_smoke_tests.py::test_fenda_ditto PASSED           [ 79%]
tests/smoke_tests/test_smoke_tests.py::test_perfcl PASSED                [ 83%]
tests/smoke_tests/test_smoke_tests.py::test_fl_plus_local PASSED         [ 87%]
tests/smoke_tests/test_smoke_tests.py::test_moon PASSED                  [ 91%]
tests/smoke_tests/test_smoke_tests.py::test_ensemble PASSED              [ 95%]
tests/smoke_tests/test_smoke_tests.py::test_flash PASSED                 [100%]
=================================== FAILURES ===================================
______________________ test_client_level_dp_breast_cancer ______________________

Smoke tests are no longer being torpedoed, thanks to graceful shutdown and cleanup of lingering tasks. In another PR we may want to address this flaky test, or at least give it a different mark/CI job, so that we don't have to re-run all of the other smoke tests. The retry logic doesn't seem to be working for some reason -- could look into that after...

@nerdai nerdai merged commit 8dcf29b into main Jan 24, 2025
6 checks passed
@nerdai nerdai deleted the nerdai/smoke-tests-explicitly-assert branch January 24, 2025 22:33