Move assert statements out of run_smoke_test and into the actual test (for graceful shutdown in case of failure) #318
Conversation
@@ -50,7 +59,7 @@ async def run_smoke_test(
     client_metrics: dict[str, Any] | None = None,
     # assertion params
     tolerance: float = DEFAULT_TOLERANCE,
-) -> None:
+) -> tuple[list[str], list[str]]:
This now returns server_errors and client_errors to the caller.
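For illustration, the calling test might consume these lists roughly as sketched below; the test name, the argument placeholder, and the commented-out import path are hypothetical, and only run_smoke_test and its returned tuple come from this PR (async test support via pytest-asyncio is also an assumption).

```python
import pytest

# Hypothetical import; the real module path for run_smoke_test lives in this repo.
# from smoke_tests.run_smoke_test import run_smoke_test


@pytest.mark.asyncio  # assumes pytest-asyncio (or equivalent) is in use
async def test_some_smoke_scenario() -> None:  # hypothetical test name
    server_errors, client_errors = await run_smoke_test(
        # scenario-specific arguments would go here
    )
    # Assertions now live in the test itself, so run_smoke_test can always shut
    # its processes down gracefully before the pass/fail decision happens here.
    assert len(server_errors) == 0, f"Server errors found:\n{server_errors}"
    assert len(client_errors) == 0, f"Client errors found:\n{client_errors}"
```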
@@ -201,16 +215,16 @@ async def run_smoke_test(
             break

     return_code = server_process.returncode
     assert return_code is None or (return_code is not None and return_code == 0), (
In order to remove these assert statements in a way that doesn't have me doing any dangerous Boolean algebra, I employ the following pattern. We can change this if we want...

# old
assert <cond>, <fail_msg>

# new
if not <cond>:
    raise SmokeTestError(<fail_msg>)
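Applied to the return_code check from the hunk above, that pattern would look roughly like the sketch below; the wrapper function, the locally defined SmokeTestError, and the failure message are placeholders (the real message is truncated in the diff).

```python
class SmokeTestError(Exception):
    """Placeholder for the PR's smoke test error type."""


def check_server_return_code(return_code: int | None) -> None:
    # old: assert return_code is None or (return_code is not None and return_code == 0), (...)
    # new: negate the whole condition verbatim instead of doing any Boolean algebra on it
    if not (return_code is None or (return_code is not None and return_code == 0)):
        # Illustrative message only; the real one is cut off in the hunk above.
        raise SmokeTestError(f"Server process exited with unexpected return code {return_code}.")
```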
    )
    task = asyncio.create_task(coro)
    await task
except SmokeTestTimeoutError as e:
This culprit test will only attempt to retry if SmokeTestTimeoutError is raised. Otherwise, the failure is due to another, potentially real, error, and we fail the test in that case.
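A minimal sketch of that shape, assuming a retry count, a coroutine factory, and a locally defined SmokeTestTimeoutError that stand in for the PR's actual names:

```python
import asyncio
from collections.abc import Callable, Coroutine
from typing import Any


class SmokeTestTimeoutError(Exception):
    """Placeholder for the PR's timeout exception."""


MAX_ATTEMPTS = 3  # assumed retry budget


async def run_with_retries(make_coro: Callable[[], Coroutine[Any, Any, None]]) -> None:
    """Retry only on timeouts; any other exception propagates and fails the test immediately."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            task = asyncio.create_task(make_coro())
            await task
            return  # the attempt succeeded, nothing left to do
        except SmokeTestTimeoutError:
            if attempt == MAX_ATTEMPTS:
                raise  # out of attempts, surface the timeout
            # otherwise loop around and retry with a fresh coroutine
```

A factory is used here because a coroutine object can only be awaited once, so each attempt needs a fresh one.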
f"Full client output:\n{full_client_output}\n" | ||
f"[ASSERT ERROR] 'Client Evaluation Local Model Metrics' message not found for client {i}." | ||
) | ||
raise SmokeTestAssertError(msg) |
Tbh, I am not sure if I like the name SmokeTestAssertError; it makes me feel like it's asserting on a good output against an expected value. I think these are probably better as SmokeTestExecutionError? I was just following our naming with [ASSERT ERROR], but I think this convention is confusing.
I'm good with either. I'll leave it to you to decide 😂
I'm lazy now. I'm going to leave it as is lol.
while True:
    # giving a smaller timeout here just in case it hangs for a long time waiting for a single log line
I cleaned this up, but note that we weren't actually giving this "inner task" a smaller timeout. Instead, I outsource this logic to a contained method, get_output_from_stdout(), which reads the stream until completion. This whole process of reading from the stream is what I assign a timeout to (no more need for the manual elapsed_time = datetime.datetime.now() - start_time computation to check whether the timeout was reached).
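Something like the sketch below captures that shape: a single asyncio timeout wraps the whole read-until-EOF helper, so there is no manual elapsed-time bookkeeping (the signatures, the timeout plumbing, and the locally defined SmokeTestTimeoutError are assumptions, not the PR's exact code).

```python
import asyncio


class SmokeTestTimeoutError(Exception):
    """Placeholder for the PR's timeout exception."""


async def get_output_from_stdout(stream: asyncio.StreamReader) -> list[str]:
    """Read the stream until EOF, collecting decoded log lines."""
    lines: list[str] = []
    while True:
        raw_line = await stream.readline()
        if not raw_line:  # EOF: the process closed its stdout
            break
        lines.append(raw_line.decode().rstrip())
    return lines


async def read_all_output(stream: asyncio.StreamReader, timeout_s: float) -> list[str]:
    try:
        # The timeout covers the entire read, not each individual log line.
        return await asyncio.wait_for(get_output_from_stdout(stream), timeout=timeout_s)
    except asyncio.TimeoutError as e:
        raise SmokeTestTimeoutError("Timed out while reading process output.") from e
```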
Some really minor comments. Otherwise, this looks awesome.
f"Full client output:\n{full_client_output}\n" | ||
f"[ASSERT ERROR] 'Client Evaluation Local Model Metrics' message not found for client {i}." | ||
) | ||
raise SmokeTestAssertError(msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with either. I'll leave it to you to decide 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go!
Smoke tests are not being torpedoed anymore, thanks to graceful shutdown and cleanup of tasks. In another PR we may want to address this flaky test, or at least give it a different mark/CI job so that we don't have to re-run all of the other smoke tests. Retry logic doesn't seem to be working for some reason -- could look into that after...
PR Type
Fix
Short Description
THIS PR BROUGHT THE SMOKE 💨 💨 💨
This PR addresses an issue with our smoke tests where, if one fails (the usual culprit being test_client_level_dp_breast_cancer), it torpedoes all of the remaining tests that follow it. Moreover, the error logs of the job looked super scary!

To address this issue, this PR:

- Shuts the server and client processes down gracefully, and cleans up their tasks, if an Exception is encountered during the test (see the sketch after this list).

In addition to the above:

- Moves the assert statements out of run_smoke_test to the actual testing module test_smoke_test.py and "within" the tests themselves directly. This is mostly for hygiene, to follow typical pytest conventions, which makes it easier to see why a test might fail and to explicitly make it fail if an exception is raised.
NOTE: If test_client_level_dp_breast_cancer fails, it will still fail the overall smoke test job, which means we'll need to re-run all of the tests, even the passing ones. In another PR we could mark this test with something like "flaky" and then create a separate job for flaky tests, so re-running these would be less time-consuming.

Tests Added
N/A