Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace probe endpoint #13

Conversation

sebastian-luna-valero
Copy link
Member

Summary

I would like to replace the current check to https://<url>/services/status/ with https://<url>/hub/metrics

See: https://jupyterhub.readthedocs.io/en/3.1.1/reference/monitoring.html

It has been tested with:

egi-notebooks-probe --host jupyterhub.d4science.org
egi-notebooks-probe --host notebooks.egi.eu
egi-notebooks-probe --host replay.notebooks.egi.eu

Related issue :

@sebastian-luna-valero
Copy link
Member Author

Not sure why this happens with pylint after 298f9c4:

  nagios_plugins_egi_notebooks/status.py:5:0: E0401: Unable to import 'requests' (import-error)
  nagios_plugins_egi_notebooks/status.py:6:0: E0401: Unable to import 'requests.exceptions' (import-error)

@enolfc
Copy link
Collaborator

enolfc commented Jun 20, 2024

Not sure why this happens with pylint after 298f9c4:

  nagios_plugins_egi_notebooks/status.py:5:0: E0401: Unable to import 'requests' (import-error)
  nagios_plugins_egi_notebooks/status.py:6:0: E0401: Unable to import 'requests.exceptions' (import-error)

pylint is super picky about everything and I have disabled in other repos :(
or at least the import-error check could be disabled

@enolfc
Copy link
Collaborator

enolfc commented Jun 20, 2024

Summary

I would like to replace the current check to https://<url>/services/status/ with https://<url>/hub/metrics

See: https://jupyterhub.readthedocs.io/en/3.1.1/reference/monitoring.html

It has been tested with:

egi-notebooks-probe --host jupyterhub.d4science.org
egi-notebooks-probe --host notebooks.egi.eu
egi-notebooks-probe --host replay.notebooks.egi.eu

There are some things to consider here before moving towards this change:

  1. the /metrics is exposing info that could be considered sensitive and we may need to close it in the future (already thought about it before), so the probe would not work
  2. the probe does actually check the spawning of a notebook server, which may find problems that the /metrics endpoint may not

We may want to go for a more generic testing, avoiding the spawning of the server and checking its status and avoiding going for the internal metrics with /health that should return 200 if things are good.

Another option is to use the hub API to actually trigger the spawning of a notebook server which is even closer to the user behaviour.

@sebastian-luna-valero
Copy link
Member Author

Great feedback, many thanks!

I found the below examples to start/stop servers with the JupyterHub API:

First question, which user should we use for these tests?

@enolfc
Copy link
Collaborator

enolfc commented Jun 20, 2024

Great feedback, many thanks!

I found the below examples to start/stop servers with the JupyterHub API:

First question, which user should we use for these tests?

There is an implementation already of starting/stopping a server
https://github.com/EGI-Federation/egi-notebooks-monitoring/blob/master/worker/monitor.py

On the user, we should use the one that ARGO uses, although we are not yet fully ready to use Check-in token as authentication (but will soon, see EGI-Federation/egi-notebooks-hub#119). With that one in place we can move the logic to nagios and simplify the deployment of notebooks.

@enolfc
Copy link
Collaborator

enolfc commented Jun 20, 2024

I propose to close this one and keep track of the improvement with this issue #14.

@sebastian-luna-valero
Copy link
Member Author

Sure, many thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants