
Fix early stop returning errcode -15 #322

Open
wants to merge 1 commit into
base: staging
from


@satyaog (Member) commented Nov 25, 2024

Current log:

benchio.0 [message] Terminating process because it ran for longer than 1 seconds.
benchio.0 [end (-15)] 'milabench/bin/voir' --config /tmp/extra/benchio/voirconf-benchio.0-384a97a8bb2d5d89323fc897d5a5d82e.json milabench/tests/yoshua-benchio/main.py --sleep 60 --start 1 --end 11 [at 2024-11-25 15:51:09.910958] benchio.0
=========
  * Error codes = -15
  * No traceback info about the error

instead of the expected:

benchio.0 [message] Terminating process because it ran for longer than 1 seconds.
benchio.0 [end] 'milabench/bin/voir' --config /tmp/extra/benchio/voirconf-benchio.0-384a97a8bb2d5d89323fc897d5a5d82e.json milabench/tests/yoshua-benchio/main.py --sleep 60 --start 1 --end 11 [at 2024-11-25 16:00:30.277804] benchio.0
=========
  * early stopped

@Delaunay (Collaborator)

"Early stopped" and -15 are different.

Early stopped means voir raised StopProgram to crash the benchmark because we don't need more samples.
That error is okay to ignore, because milabench causes it on purpose to stop the benchmark.

-15 means SIGTERM was sent to the benchmark because the program ran longer than expected.
This is not okay to ignore, because it means the benchmark might be hanging, which is unexpected behavior.
If the benchmark needs more time, its max_duration needs to be increased to reflect its expected runtime.
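The distinction can be sketched in Python. This is a minimal, hypothetical illustration, not milabench's actual code. `classify_end` and `run_with_timeout` are invented names, and the `early_stopped` flag stands in for however milabench detects that voir raised StopProgram. The sketch relies on one documented fact: on POSIX systems, `subprocess` reports a child killed by signal N as `returncode == -N`. A SIGTERM sent after the time limit therefore shows up as -15, matching the log in the PR description.

```python
import signal
import subprocess
import sys


def classify_end(returncode: int, early_stopped: bool) -> str:
    """Hypothetical helper: turn a benchmark's exit into a verdict.

    `early_stopped` is an assumed flag meaning the run was stopped on purpose
    (e.g. voir raised StopProgram because enough samples were collected).
    """
    if early_stopped:
        # Deliberate stop caused by milabench itself: safe to ignore.
        return "early stopped (ok)"
    if returncode == -signal.SIGTERM:
        # On POSIX, subprocess reports death by signal N as returncode -N,
        # so -15 means the process was sent SIGTERM (it outlived its time budget).
        return "terminated by SIGTERM: ran past max_duration (error)"
    if returncode == 0:
        return "ok"
    return f"failed with code {returncode} (error)"


def run_with_timeout(cmd: list[str], max_duration: float) -> int:
    """Hypothetical runner: send SIGTERM if cmd exceeds max_duration seconds."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=max_duration)
    except subprocess.TimeoutExpired:
        proc.terminate()  # sends SIGTERM on POSIX
        return proc.wait()


if __name__ == "__main__":
    # A child that sleeps 60 s but is only allowed 1 s (mirrors --sleep 60 in the log).
    code = run_with_timeout([sys.executable, "-c", "import time; time.sleep(60)"], 1.0)
    print(code)  # -15 on POSIX systems
    print(classify_end(code, early_stopped=False))
```

This follows the reasoning in the comment. An early stop is a deliberate exit and is reported as success. A -15 says only that the process outlived its time budget, so it is reported as an error rather than mapped to "early stopped". On Windows, `terminate()` does not send a signal, so the -15 check applies to POSIX only.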
