
Move assert statements out of run_smoke_test and into the actual test (for graceful shutdown in case of failure) #318

Merged: 31 commits merged into main from the nerdai/smoke-tests-explicitly-assert branch on Jan 24, 2025

Conversation

@nerdai (Collaborator) commented Jan 23, 2025

PR Type

Fix

Short Description

THIS PR BROUGHT THE SMOKE 💨 💨 💨

This PR addresses an issue with our smoke tests where, if one fails (the usual culprit being test_client_level_dp_breast_cancer), it torpedoes all of the remaining tests that follow it. Moreover, the error logs of the job looked super scary!

To address this issue, this PR:

  1. Ensures that each test explicitly fails if an Exception is encountered during the test.
  2. Aims to ensure that any lingering cancelled tasks left behind by an unexpected test failure (due to a Timeout or another reason) are cleared. The reason we saw the domino effect is that if these cancelled tasks are not cleared, they will still be pending the next time an event loop starts running. In our case, the next test would run the event loop, but the hangover cancelled tasks would take precedence and "stop" the loop, causing that next test to crash.
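As a rough illustration of point 2 (a hedged sketch, not the exact code in this PR; the helper name and placement are assumptions), lingering tasks can be cancelled and drained so their cancellations are consumed now rather than the next time an event loop runs:

```python
import asyncio


async def _drain_lingering_tasks() -> None:
    # Hypothetical cleanup helper for illustration only.
    current = asyncio.current_task()
    pending = [t for t in asyncio.all_tasks() if t is not current and not t.done()]
    for task in pending:
        task.cancel()
    # Awaiting the cancelled tasks here consumes their CancelledErrors so they
    # don't "hang over" into the next test's event loop run.
    await asyncio.gather(*pending, return_exceptions=True)
```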

In addition to the above:

  • We also move the asserts used in run_smoke_test to the actual testing module test_smoke_tests.py, "within" the tests themselves directly (a sketch of the resulting test shape follows this list). This is mostly for hygiene, following typical pytest conventions, which makes it easier to see why a test might fail and to explicitly fail it if an exception is raised.
  • Also, the progress of these smoke tests can now actually be viewed in the logs of the job as it's running.
    [screenshot: smoke test progress shown in the job logs]
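As an illustration of the first bullet, here is a minimal sketch of what a test now looks like with the asserts living inside it. The keyword arguments passed to run_smoke_test below are placeholders; only the returned (server_errors, client_errors) tuple is described in this PR:

```python
import asyncio

import pytest

from tests.smoke_tests.run_smoke_test import run_smoke_test  # import path assumed


def test_basic_server_client_cifar() -> None:
    try:
        server_errors, client_errors = asyncio.run(
            run_smoke_test(
                # Illustrative placeholder arguments only.
                server_module_path="examples.basic_example.server",
                client_module_path="examples.basic_example.client",
                config_path="examples/basic_example/config.yaml",
            )
        )
    except Exception as e:
        # Any unexpected exception now explicitly fails this test.
        pytest.fail(f"Smoke test raised an unexpected exception: {e}")

    # The asserts live in the test itself, so pytest reports exactly why it failed.
    assert not server_errors, f"Server errors: {server_errors}"
    assert not client_errors, f"Client errors: {client_errors}"
```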

NOTE:
If test_client_level_dp_breast_cancer fails, it will still fail the overall smoke test job, which means we'll need to re-run all of the tests, even the passing ones. In another PR we could mark this test with something like "flaky" and then create a separate job for flaky tests, so re-running these would be less time-consuming (a rough sketch of that idea follows).
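A rough sketch of that idea, using a hypothetical custom pytest marker (the marker name and CI split are assumptions, not part of this PR):

```python
import pytest


# The "flaky" marker would need to be registered (e.g. under [tool.pytest.ini_options] markers)
# so pytest doesn't warn about an unknown mark.
@pytest.mark.flaky
def test_client_level_dp_breast_cancer() -> None:
    ...
```

CI could then run pytest -m "not flaky" in the main smoke test job and pytest -m flaky in a separate, cheaper-to-retry job.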

Tests Added

N/A

@nerdai nerdai changed the title [WIP] Make every smoke tests have explicitly assert (to raise AssertionError) [WIP] Make every smoke tests explicitly assert fail in case of failure (i.e. to raise AssertionError) Jan 23, 2025
@nerdai nerdai changed the title [WIP] Make every smoke tests explicitly assert fail in case of failure (i.e. to raise AssertionError) [WIP] Move assert statements out of run_smoke_test and in the actual test (for graceful shutdown in case of failure) Jan 23, 2025
@nerdai nerdai changed the title [WIP] Move assert statements out of run_smoke_test and in the actual test (for graceful shutdown in case of failure) [WIP] Move assert statements out of run_smoke_test and into the actual test (for graceful shutdown in case of failure) Jan 23, 2025
@nerdai nerdai changed the title [WIP] Move assert statements out of run_smoke_test and into the actual test (for graceful shutdown in case of failure) Move assert statements out of run_smoke_test and into the actual test (for graceful shutdown in case of failure) Jan 23, 2025
@@ -50,7 +59,7 @@ async def run_smoke_test(
client_metrics: dict[str, Any] | None = None,
# assertion params
tolerance: float = DEFAULT_TOLERANCE,
-) -> None:
+) -> tuple[list[str], list[str]]:
Collaborator Author (@nerdai):
This now returns server_errors and client_errors to the caller.

@@ -201,16 +215,16 @@ async def run_smoke_test(
break

return_code = server_process.returncode
assert return_code is None or (return_code is not None and return_code == 0), (
@nerdai (Collaborator Author) commented Jan 23, 2025:
In order to remove these assert statements in a way that has me not doing any dangerous Boolean algebra, I employ the following pattern. We can change this if we want...

```python
# old
assert <cond>, <fail_msg>

# new
if not <cond>:
    raise SmokeTestError(<fail_msg>)
```
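For example, applied verbatim to the return-code assert shown in the hunk above (SmokeTestError is assumed here to be a simple Exception subclass defined alongside the smoke test helpers):

```python
class SmokeTestError(Exception):
    """Raised when a smoke test helper detects a failure condition."""


return_code = server_process.returncode
if not (return_code is None or (return_code is not None and return_code == 0)):
    raise SmokeTestError(f"Server process exited with non-zero return code {return_code}.")
```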

@nerdai nerdai requested a review from emersodb January 23, 2025 20:10
@nerdai nerdai force-pushed the nerdai/smoke-tests-explicitly-assert branch from 933c7e7 to 157e18b on January 23, 2025 20:16
)
task = asyncio.create_task(coro)
await task
except SmokeTestTimeoutError as e:
Collaborator Author (@nerdai):
The culprit test will only attempt a retry if SmokeTestTimeoutError is raised. Otherwise, the failure is due to another, potentially real, error, and we fail the test in that case.
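A minimal sketch of that retry behaviour (MAX_RETRIES and make_coro are hypothetical names; only the "retry on SmokeTestTimeoutError, fail otherwise" logic is taken from this PR):

```python
for attempt in range(MAX_RETRIES):
    try:
        # A coroutine object can only be awaited once, so rebuild it on every attempt.
        await asyncio.create_task(make_coro())
        break
    except SmokeTestTimeoutError:
        # Only timeouts are retried; any other exception propagates and fails the test.
        if attempt == MAX_RETRIES - 1:
            raise
```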

f"Full client output:\n{full_client_output}\n"
f"[ASSERT ERROR] 'Client Evaluation Local Model Metrics' message not found for client {i}."
)
raise SmokeTestAssertError(msg)
Collaborator Author (@nerdai):
Tbh, I am not sure if I like the name SmokeTestAssertError; it makes me feel like it's asserting a good output against an expected value.

I think these are probably better as SmokeTestExecutionError? I was just following our naming with [ASSERT ERROR], but I think this convention is confusing.

Collaborator (@emersodb):
I'm good with either. I'll leave it to you to decide 😂

Collaborator Author (@nerdai):
I'm lazy now. I'm going to leave it as is lol.

while True:
# giving a smaller timeout here just in case it hangs for a long time waiting for a single log line
@nerdai (Collaborator Author) commented Jan 24, 2025:
I cleaned this up, but note that we weren't actually giving this "inner task" a smaller timeout.

Instead, I outsource this logic to a contained method get_output_from_stdout(), which reads the stream until completion. This whole process of reading from the stream is what I assign a timeout to. (No more need for the manual computation elapsed_time = datetime.datetime.now() - start_time to check whether the timeout was reached.)
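Roughly, the shape of that approach (a sketch; the real get_output_from_stdout() may differ in signature and return type, and READ_TIMEOUT is an assumed constant):

```python
import asyncio


async def get_output_from_stdout(stdout: asyncio.StreamReader) -> list[str]:
    """Read the stream line by line until EOF and return the decoded lines."""
    lines: list[str] = []
    while True:
        line = await stdout.readline()
        if not line:  # EOF: the process has closed its stdout
            break
        lines.append(line.decode())
    return lines


# The timeout now wraps the entire read, so there is no manual elapsed-time bookkeeping:
# full_output = await asyncio.wait_for(get_output_from_stdout(process.stdout), timeout=READ_TIMEOUT)
```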

@emersodb (Collaborator) left a comment:
Some really minor comments. Otherwise, this looks awesome.

tests/smoke_tests/run_smoke_test.py (review thread resolved)
.github/workflows/smoke_tests.yaml (review thread resolved; outdated)
f"Full client output:\n{full_client_output}\n"
f"[ASSERT ERROR] 'Client Evaluation Local Model Metrics' message not found for client {i}."
)
raise SmokeTestAssertError(msg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with either. I'll leave it to you to decide 😂

tests/smoke_tests/test_smoke_tests.py (two review threads resolved; outdated)
@emersodb (Collaborator) left a comment:

Good to go!

@nerdai (Collaborator Author) commented Jan 24, 2025:

collecting ... collected 350 items / 326 deselected / 24 selected
tests/smoke_tests/test_smoke_tests.py::test_basic_server_client_cifar PASSED [  4%]
tests/smoke_tests/test_smoke_tests.py::test_nnunet_config_2d PASSED      [  8%]
tests/smoke_tests/test_smoke_tests.py::test_nnunet_config_3d PASSED      [ 12%]
tests/smoke_tests/test_smoke_tests.py::test_scaffold PASSED              [ 16%]
tests/smoke_tests/test_smoke_tests.py::test_apfl PASSED                  [ 20%]
tests/smoke_tests/test_smoke_tests.py::test_feddg_ga PASSED              [ 25%]
tests/smoke_tests/test_smoke_tests.py::test_basic PASSED                 [ 29%]
tests/smoke_tests/test_smoke_tests.py::test_client_level_dp_cifar PASSED [ 33%]
tests/smoke_tests/test_smoke_tests.py::test_client_level_dp_breast_cancer FAILED [ 37%]
tests/smoke_tests/test_smoke_tests.py::test_instance_level_dp_cifar PASSED [ 41%]
tests/smoke_tests/test_smoke_tests.py::test_dp_scaffold PASSED           [ 45%]
tests/smoke_tests/test_smoke_tests.py::test_fedbn PASSED                 [ 50%]
tests/smoke_tests/test_smoke_tests.py::test_fed_eval PASSED              [ 54%]
tests/smoke_tests/test_smoke_tests.py::test_fedper_mnist PASSED          [ 58%]
tests/smoke_tests/test_smoke_tests.py::test_fedper_cifar PASSED          [ 62%]
tests/smoke_tests/test_smoke_tests.py::test_ditto_mnist PASSED           [ 66%]
tests/smoke_tests/test_smoke_tests.py::test_mr_mtl_mnist PASSED          [ 70%]
tests/smoke_tests/test_smoke_tests.py::test_fenda PASSED                 [ 75%]
tests/smoke_tests/test_smoke_tests.py::test_fenda_ditto PASSED           [ 79%]
tests/smoke_tests/test_smoke_tests.py::test_perfcl PASSED                [ 83%]
tests/smoke_tests/test_smoke_tests.py::test_fl_plus_local PASSED         [ 87%]
tests/smoke_tests/test_smoke_tests.py::test_moon PASSED                  [ 91%]
tests/smoke_tests/test_smoke_tests.py::test_ensemble PASSED              [ 95%]
tests/smoke_tests/test_smoke_tests.py::test_flash PASSED                 [100%]
=================================== FAILURES ===================================
______________________ test_client_level_dp_breast_cancer ______________________

Smoke tests are no longer being torpedoed, thanks to graceful shutdown and cleanup of lingering tasks. In another PR we may want to address this flaky test, or at least give it a different mark/CI job, so that we don't have to re-run all of the other smoke tests. The retry logic doesn't seem to be working for some reason -- could look into that after...

@nerdai nerdai merged commit 8dcf29b into main Jan 24, 2025
6 checks passed
@nerdai nerdai deleted the nerdai/smoke-tests-explicitly-assert branch January 24, 2025 22:33