Support for high availability redundancy #1015

lornasong · 2022-08-02T20:23:32Z

Description

Currently CTS does not natively support high availability. When CTS becomes unavailable, it relies on an external system or process to intervene and restart it.

We want to support redundancy in CTS by allowing multiple CTS instances to form a cluster of one leader and any number of followers. The leader instance would be responsible for executing tasks. The follower instances would be backups, ready to take over task execution when the leader becomes unavailable. This creates a reliable failover that minimizes the time the network infrastructure is out of date and no longer requires external intervention.
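To illustrate the leader/follower failover described above, here is a minimal sketch that assumes a lease-based lock with a time-to-live (TTL): whichever instance holds the lease is the leader, the leader keeps renewing it, and a follower takes over once the lease expires. The `LeaseLock`, `Instance`, and `simulate` names, the TTL value, and the instance IDs are all hypothetical. The issue does not specify how CTS elects a leader, so this is an in-memory simulation of the general technique, not the CTS implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaseLock:
    """Hypothetical shared lock with a TTL (stand-in for a real coordination store)."""
    ttl: float
    holder: Optional[str] = None
    expires_at: float = 0.0

    def try_acquire(self, node_id: str, now: float) -> bool:
        # Acquire (or renew) if the lease is free, expired, or already ours.
        if self.holder is None or now >= self.expires_at or self.holder == node_id:
            self.holder = node_id
            self.expires_at = now + self.ttl
            return True
        return False


@dataclass
class Instance:
    """A hypothetical CTS instance taking part in the leader election."""
    node_id: str
    lock: LeaseLock
    alive: bool = True

    def tick(self, now: float) -> str:
        """Run one loop iteration and return this instance's role."""
        if not self.alive:
            return "down"
        if self.lock.try_acquire(self.node_id, now):
            return "leader"  # only the lease holder executes tasks
        return "follower"  # backups wait for the lease to expire


def simulate(ttl: float = 5.0, crash_at: int = 3, steps: int = 15):
    """Crash the first leader at `crash_at` and record who leads at each step."""
    lock = LeaseLock(ttl=ttl)
    nodes = [Instance(n, lock) for n in ("cts-a", "cts-b", "cts-c")]
    history = []
    for now in range(steps):
        if now == crash_at:
            nodes[0].alive = False  # the leader stops renewing its lease
        roles = {n.node_id: n.tick(float(now)) for n in nodes}
        history.append((now, [k for k, v in roles.items() if v == "leader"]))
    return history


if __name__ == "__main__":
    for step, leaders in simulate():
        print(step, leaders)
```

In this simulation `cts-a` leads until it crashes at step 3, no instance leads while its last lease is still valid, and `cts-b` takes over at step 7. The gap is bounded by the TTL, which is the knob that trades failover speed against the risk of a follower taking over from a leader that is merely slow.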

Use Cases

Keeping network infrastructure up-to-date is mission critical for application delivery. Without high availability, CTS can become the single point of failure for network automation workflows that it is responsible for.

Alternative Solutions

Currently, there are workarounds that minimize CTS downtime, such as running CTS under an orchestrator like Nomad or Kubernetes. However, this puts the burden of making CTS highly available on users.

Additional context

Some highly available systems distribute load amongst cluster members. For example, CTS followers could potentially share task execution responsibilities. This would be a task distribution feature and is a separate enhancement from the redundancy (backup) feature described in this issue.

New Cluster Status Endpoint

To support high availability, a new API endpoint, GET /status/cluster, will be added. It will allow users to get information about the members of the CTS cluster, including their health and leadership status.
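As an illustration of how a client might consume the new endpoint, here is a minimal sketch. The issue does not define the response schema, so the `members`, `id`, `leader`, and `healthy` JSON keys are hypothetical, as are the `fetch_cluster_status` and `summarize` helpers and the `base_url` parameter. Only the `GET /status/cluster` path comes from the issue.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_cluster_status(base_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """Call GET /status/cluster and decode the JSON body.

    `base_url` (the address of a CTS API) is an assumption for illustration.
    """
    url = base_url.rstrip("/") + "/status/cluster"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(status: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the leader, followers, and unhealthy members.

    The "members", "id", "leader", and "healthy" keys are hypothetical;
    the real response shape is not specified in the issue.
    """
    members = status.get("members", [])
    leaders = [m["id"] for m in members if m.get("leader")]
    return {
        # Report a leader only when exactly one member claims leadership.
        "leader": leaders[0] if len(leaders) == 1 else None,
        "followers": [m["id"] for m in members if not m.get("leader")],
        "unhealthy": [m["id"] for m in members if not m.get("healthy", False)],
    }
```

Keeping the parsing in a separate `summarize` function lets monitoring scripts alert on "no leader" or "unhealthy follower" without depending on how the status was fetched.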


lornasong added enhancement New feature or request enterprise Feature will be delivered as a part of Enterprise binary labels Aug 2, 2022

lornasong added this to the v0.7.0 milestone Aug 2, 2022


lornasong commented Aug 2, 2022 • edited by wilkermichael
