Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database: if all clients fail to connect, emit "disconnect" event #8091

Merged
merged 1 commit into from
Jan 3, 2025

Conversation

haraldschilly
Copy link
Contributor

@haraldschilly haraldschilly commented Jan 3, 2025

This is related to #5997 … i.e. this records the case where there is no connection to the database, to eventually cause the health check to fail. This is more general than the "forgot the database password"-problem, because it should cause a health check failure in all cases, when it is not possible to establish a connection to the database.

test: I haven't tested this, but I think it should be fine. In the end, the misc.retry_until_success will call this @_connect again and again and the health checks last disconnect timestamp will be set once and kept at the point in time when this failure happened for the first time.


This really needed more work. The weird part is, when there is a "disconnect" event, it should be emitted and received by that record-connect-error, but the second time around (connect-disconnect-connect-disconnect), it no longer receives the event. Somewhere, everyone who listens to the event emitter is removed, or just the "disconnect" event, and never re-establishes the connection. I saw there is code in the syncstrings part, which could be relevant – not sure.

In any case, my fix is to call the register connect/disconnect functions directly. A disconnect happens either when the database client is disconnected or if the attempt to connect fails.

My test in my local dev environment was to start/stop the database and check what the health check has to say, i.e.

$ curl http://localhost:5000/healthcheck
healthchecks:
concurrent 0 < undefined – OK
uptime 1m40s – draining in 24m49s – terminating in 25m49s – OK
DB problems for 0.14 < 5 mins – OK

vs.

$ curl http://localhost:5000/healthcheck
healthchecks:
concurrent 0 < undefined – OK
uptime 1m41s – draining in 24m48s – terminating in 25m48s – OK
no DB connection problems – OK

flipping back and forth when starting/stopping my local db.

@haraldschilly haraldschilly added PR-needs review PR-needs testing work done, only testing left to do labels Jan 3, 2025
@haraldschilly haraldschilly marked this pull request as ready for review January 3, 2025 17:16
@haraldschilly haraldschilly force-pushed the db-all-fail-emit-disconnect branch 2 times, most recently from bfb1103 to 2b84293 Compare January 3, 2025 18:32
@haraldschilly haraldschilly force-pushed the db-all-fail-emit-disconnect branch from 2b84293 to 3dd43fc Compare January 3, 2025 18:33
@haraldschilly haraldschilly added PR-needs review and removed PR-needs work PR-needs testing work done, only testing left to do labels Jan 3, 2025
@williamstein williamstein merged commit e8a605c into master Jan 3, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants