the best pastebin that ever was or will be #62
Comments
Application monitoring is kind of already addressed in the Nagios monitors issue we have open. Going to work on that on my next break probably.
We could set up a pair of haproxy instances, each in separate locations / with separate providers. Each instance can balance between two backends that share a DB. When we deploy new code we can do it to a pair at a time, test it, and then put it back in rotation. This gives us quite a bit of redundancy, and flexibility to break things without people noticing.
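A rough sketch of the "one pair at a time" deploy described above. The four hooks (`drain`, `deploy`, `smoke_test`, `enable`) are hypothetical placeholders for whatever the balancer and deploy tooling end up providing; nothing here talks to haproxy directly.

```python
def rolling_deploy(pairs, drain, deploy, smoke_test, enable):
    """Deploy to one backend pair at a time, stopping at the first failure.

    All four callables are hypothetical hooks that take a pair identifier.
    Returns the list of pairs that were updated and put back in rotation.
    """
    updated = []
    for pair in pairs:
        drain(pair)    # take the pair out of rotation
        deploy(pair)   # push the new code to both backends in the pair
        if not smoke_test(pair):
            # Leave the broken pair drained so traffic stays on the
            # pairs that are still serving known-good code.
            break
        enable(pair)   # put the pair back in rotation
        updated.append(pair)
    return updated
```

Stopping at the first failed test means at most one pair is ever out of rotation, which is the "break things without people noticing" property.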
I agree, Datacate has gone down several times. Between the two of us, we have access to two separate providers; I'm sure we can set something up that provides geographic redundancy and provider redundancy. Maybe have one balancer and its backends + DB on one provider, the second on another?
HAProxy 1.5 looks pretty cool--even does IPv6, despite some people telling me otherwise. However, we can do this for free with Varnish: https://www.varnish-cache.org/docs/4.0/reference/vcl.html#backend-definition Examples: https://www.varnish-cache.org/docs/4.0/users-guide/vcl-backends.html#directors Might as well have fewer stages in the request pipeline rather than more--I wouldn't be surprised if Varnish is faster anyway.
Sounds good, fire
A fair amount of this conversation ties in with @lericah's issue #101
One of the most critical things about a pastebin is that it must always work, even through the zombie apocalypse.
People want to put their shit up on a pastebin yesterday, not fuck around with 6 alternatives trying to find the one that happens to be working now. We need to deliver on that, and be the pastebin that everyone uses because it is the only pastebin that works when all of the others fail.
There are a few considerations:
application monitoring
Even if nothing else fails, we will inevitably run out of paste IDs or disk space or both. We should make sure that this never gets in the way of people being able to make new pastes--this should be actively monitored and acted upon proactively, and not only after someone reports actual breakage.
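As a starting point for that kind of check, here is a minimal sketch. The thresholds, the default ID length and alphabet size, and the assumption that IDs are fixed-length strings over a fixed alphabet are all illustrative guesses, not a description of how ptpb actually allocates IDs.

```python
import shutil

# Hypothetical thresholds; tune these for the real deployment.
DISK_WARN_FRACTION = 0.80  # warn when the disk is 80% full
ID_WARN_FRACTION = 0.50    # warn when half of the ID space is used


def disk_used_fraction(path="/"):
    """Return the fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def id_space_used_fraction(paste_count, id_length, alphabet_size):
    """Return the fraction of possible paste IDs already taken.

    Assumes fixed-length IDs over a fixed alphabet (an assumption).
    """
    return paste_count / (alphabet_size ** id_length)


def check(paste_count, id_length=4, alphabet_size=64, path="/"):
    """Return a list of warning strings; an empty list means all is well."""
    warnings = []
    disk = disk_used_fraction(path)
    if disk >= DISK_WARN_FRACTION:
        warnings.append("disk {:.0%} full".format(disk))
    ids = id_space_used_fraction(paste_count, id_length, alphabet_size)
    if ids >= ID_WARN_FRACTION:
        warnings.append("paste ID space {:.0%} used".format(ids))
    return warnings
```

Warning well before either resource is exhausted is the point: the list returned by `check` could feed whatever alerting we settle on.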
stack high availability
Even the brief time it takes to deploy code and restart gunicorn is too long, let alone any unplanned breakage; every second ptpb is unavailable should be treated as if people are dying. We should decide how we want to handle this: for code deployments, we should have an automated system that drains all incoming connections and shifts load elsewhere. We should decide the exact mechanisms we want to use to accomplish this.
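To make "drains all incoming connections" concrete, here is one possible shape for it: refuse new requests once draining starts, and wait for in-flight requests to finish before restarting. The `Drainer` class and its method names are hypothetical; this is an in-process illustration, not how gunicorn or any balancer implements draining.

```python
import threading


class Drainer:
    """Track in-flight requests so a restart can wait for them to finish."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def try_begin(self):
        """Register a new request; return False if we are draining."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end(self):
        """Mark one request as finished."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def drain(self, timeout=None):
        """Stop accepting requests and wait for in-flight ones to finish.

        Returns True once nothing is in flight, False if the timeout expired.
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

A request handler would call `try_begin()` first and reject (or let the balancer retry elsewhere) on `False`; the deploy script calls `drain()` before restarting the process.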
network uptime
Despite my mixed feelings about Datacate, it's a serious problem if our/their infrastructure goes down, which it has at least twice since September 2014, both times for at least an hour. We should have ptpb deployed in multiple geographically separate locations, and have automatic failover and monitoring mechanisms to prepare for the catastrophic failure of an entire datacenter.
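One way the failover decision could work, sketched under assumptions: sites are listed in priority order, something else performs the actual health probes, and a site only counts as down after several consecutive failed probes so a single dropped check does not trigger a failover. `FailoverMonitor` and the default threshold of 3 are made up for illustration.

```python
class FailoverMonitor:
    """Pick the highest-priority site that has not failed repeatedly."""

    def __init__(self, sites, failures_before_down=3):
        self.sites = list(sites)  # priority order, most preferred first
        self.failures_before_down = failures_before_down
        self._failures = {site: 0 for site in self.sites}

    def record(self, site, ok):
        """Record one probe result; a success resets the failure count."""
        self._failures[site] = 0 if ok else self._failures[site] + 1

    def is_up(self, site):
        """A site is up until it reaches the consecutive-failure threshold."""
        return self._failures[site] < self.failures_before_down

    def active(self):
        """Return the preferred site that is up, or None if all are down."""
        for site in self.sites:
            if self.is_up(site):
                return site
        return None
```

Whatever `active()` returns would then drive the real switch (DNS, balancer weights, or similar), which is the part we still need to decide on.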