feat(scale-agent): doc scale agent horizontal scaling feature
Adan Urban Reyes authored and Adan Urban Reyes committed Dec 6, 2023
1 parent 91dd0ac commit 1898611
Showing 3 changed files with 177 additions and 4 deletions.
122 changes: 122 additions & 0 deletions content/en/plugins/scale-agent/concepts/horizontal-scaling.md
@@ -0,0 +1,122 @@
---
title: Horizontal Scaling Architecture and Features
linkTitle: Horizontal Scaling
description: >
  Learn how the Horizontal Scaling feature distributes operations across Armory Scale Agent replicas in your Armory Continuous Deployment or Spinnaker environment.
aliases:
- /scale-agent/tasks/horizontal-scaling/
---

## Overview of horizontal scaling

Rather than sending each operation to the first Scale Agent instance that can handle it, Horizontal Scaling distributes operations across all the Scale Agent replicas that can handle them.

### How to enable and use horizontal scaling

First, familiarize yourself with the architecture and features in this guide. Then you can:

1. {{< linkWithTitle "plugins/scale-agent/tasks/horizontal-scaling/operations-enable.md" >}}

## Horizontal scaling glossary

- **K8s operation**: an abstraction of a Kubernetes operation, such as Get, List, Add, Delete, or Patch.
- **Dynamic account operation**: an abstraction of an operation on dynamic accounts: Add or Unregister.
- **Endpoint**: the URL segment after the Clouddriver root.
- **Request**: an instruction that isn’t fulfilled immediately and can have different outcomes. A request can be made over HTTP by an administrator or internally by one of the services.

## Architecture

First, it is important to understand the main differences between K8s operations and dynamic account operations.

|K8s operations                               |Dynamic account operations                             |
|---------------------------------------------|-------------------------------------------------------|
|Are handled by a single Scale Agent instance |Can be handled by more than one Scale Agent instance   |
|Are processed on every polling cycle         |Are processed on demand                                |


The Scale Agent stores K8s and dynamic account operation data in dedicated tables that act like a queue:

- `clouddriver.kubesvc_operation`: holds newly received operations
- `clouddriver.kubesvc_operation_single_assign`: holds K8s operations, each of which can be assigned to only a single Scale Agent instance
- `clouddriver.kubesvc_operation_multiple_assign`: holds dynamic account operations, which can be assigned to multiple Scale Agent instances
- `clouddriver.kubesvc_operation_history`: holds the responses to K8s and dynamic account operations
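
To make the queue-like roles of these tables more concrete, here is a minimal in-memory sketch in Python. The class, its methods, and the field shapes are hypothetical and only mirror the table names above; the plugin keeps this data in database tables, and their schema is not shown in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class OperationTables:
    """In-memory stand-ins for the four tables (illustrative only)."""

    operation: dict = field(default_factory=dict)        # kubesvc_operation: op_id -> kind
    single_assign: dict = field(default_factory=dict)    # kubesvc_operation_single_assign: op_id -> instance
    multiple_assign: dict = field(default_factory=dict)  # kubesvc_operation_multiple_assign: op_id -> instances
    history: list = field(default_factory=list)          # kubesvc_operation_history: recorded responses

    def receive(self, op_id, kind):
        # A newly received operation lands in kubesvc_operation first.
        self.operation[op_id] = kind

    def assign(self, op_id, instances):
        # K8s operations get exactly one instance; dynamic account
        # operations may be assigned to several.
        if self.operation[op_id] == "k8s":
            self.single_assign[op_id] = instances[0]
        else:
            self.multiple_assign[op_id] = set(instances)

    def record_response(self, op_id, instance, response):
        # Responses for both kinds of operation go to the history table.
        self.history.append((op_id, instance, response))


tables = OperationTables()
tables.receive("op-1", "k8s")
tables.receive("op-2", "dynamic-account")
tables.assign("op-1", ["agent-0", "agent-1"])
tables.assign("op-2", ["agent-0", "agent-1"])
tables.record_response("op-1", "agent-0", "ok")
```

Picking the first instance for a K8s operation is a simplification for the example; in the plugin, the per-registration jobs decide which instance takes an operation.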

### K8s operations

The Scale Agent plugin creates one job per Scale Agent instance registration. Each job is in charge of:

1. Fetching pending K8s operations from the `clouddriver.kubesvc_operation` table
2. Assigning pending K8s operations in the `clouddriver.kubesvc_operation_single_assign` table
3. Fetching assigned K8s operations from the `clouddriver.kubesvc_operation_single_assign` table and sending them to the Scale Agent
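
The three steps above can be sketched as a single job cycle. This is a simplified illustration, not the plugin's code: the function name, its parameters, and the in-memory list, dict, and set that stand in for the tables are all assumptions. The default batch size of 5 matches the documented default for `kubesvc.operations.database.scan.batchSize`.

```python
def run_job_cycle(instance, pending, single_assign, dispatched, send, batch_size=5):
    """Run one cycle of a hypothetical per-registration job.

    pending:       list of operation ids, standing in for kubesvc_operation
    single_assign: dict of operation id -> instance, standing in for
                   kubesvc_operation_single_assign
    dispatched:    set of operation ids already sent (an assumption, used
                   here so that a later cycle does not resend them)
    send:          callable(instance, op_id) that delivers one operation
    """
    # Steps 1 and 2: fetch pending operations and assign at most batch_size.
    batch = pending[:batch_size]
    del pending[:batch_size]
    for op_id in batch:
        single_assign[op_id] = instance

    # Step 3: fetch the operations assigned to this instance and send them.
    mine = [
        op_id
        for op_id, owner in single_assign.items()
        if owner == instance and op_id not in dispatched
    ]
    for op_id in mine:
        send(instance, op_id)
        dispatched.add(op_id)
    return mine


pending = [f"op-{i}" for i in range(7)]
single_assign, dispatched, sent = {}, set(), []


def record(instance, op_id):
    sent.append((instance, op_id))


first = run_job_cycle("agent-0", pending, single_assign, dispatched, record)
second = run_job_cycle("agent-1", pending, single_assign, dispatched, record)
```

With seven pending operations and a batch size of 5, the first job takes five operations and the second job takes the remaining two, which is how work ends up spread across replicas.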

When an operation returns a bad response and there is still time to retry (based on the `kubesvc.cache.operationWaitMs` property), the Scale Agent plugin does the following:

1. Stores the response in the `clouddriver.kubesvc_operation_history` table
2. Unassigns the operation from the `clouddriver.kubesvc_operation_single_assign` table, so that another Scale Agent instance, or the same one, can take it again
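
A minimal sketch of this retry handling follows. The function and parameter names are hypothetical, and measuring the remaining time from the moment the operation started is an assumption; the guide only says the decision is based on `kubesvc.cache.operationWaitMs`, whose documented default is 30000.

```python
def release_for_retry(op_id, instance, response, started_at_ms, now_ms,
                      history, single_assign, operation_wait_ms=30000):
    """Return True when a failed operation is released for another attempt."""
    if now_ms - started_at_ms >= operation_wait_ms:
        # No time left to retry; what happens next is not covered here.
        return False
    # Step 1: store the response (stand-in for kubesvc_operation_history).
    history.append((op_id, instance, response))
    # Step 2: unassign the operation (stand-in for
    # kubesvc_operation_single_assign) so any instance can take it again.
    single_assign.pop(op_id, None)
    return True


history = []
single_assign = {"op-1": "agent-0"}

# One second into the wait window: the operation is released.
released = release_for_retry("op-1", "agent-0", "error", 0, 1000,
                             history, single_assign)
assignment_after_release = dict(single_assign)

# Picked up again by another instance, but the window has now elapsed.
single_assign["op-1"] = "agent-1"
released_late = release_for_retry("op-1", "agent-1", "error", 0, 30000,
                                  history, single_assign)
```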

```mermaid
C4Deployment
title Scale Agent Horizontal Scaling Registration Jobs
Boundary(spin, "Armory Continuous Deployment or Spinnaker", "Instance", $borderColor="#0FC2C0") {
Boundary(cd, "Clouddriver", "Service", $borderColor="orange") {
System(sap, "Scale Agent Plugin<br/>", "For each registration creates a job to assign and send<br/>every N milliseconds the maximum number of K8s operations.<br/><br/>N = kubesvc.operations.database.scan.initialDelay | maxDelay<br/>maximum number = kubesvc.operations.database.scan.batchSize")
System(saj0, "Scale Agent Job 0", "")
System(saj1, "Scale Agent Job 1", "")
System(saj2, "Scale Agent Job 2", "")
UpdateElementStyle(saj0, $bgColor="#04AA6D", $borderColor="none")
UpdateElementStyle(saj1, $bgColor="#f44336", $borderColor="none")
UpdateElementStyle(saj2, $bgColor="#555555", $borderColor="none")
}
Boundary(sa, "Armory Scale Agent", "Service", $borderColor="purple") {
System(sar0, "Replica 0", "")
System(sar1, "Replica 1", "")
System(sar2, "Replica 2", "")
UpdateElementStyle(sar0, $bgColor="#04AA6D", $borderColor="none")
UpdateElementStyle(sar1, $bgColor="#f44336", $borderColor="none")
UpdateElementStyle(sar2, $bgColor="#555555", $borderColor="none")
}
Rel(sar0, sap, "Registration", "")
UpdateRelStyle(sar0, sap, $textColor="black", $lineColor="#04AA6D")
Rel(sar1, sap, "Registration", "")
UpdateRelStyle(sar1, sap, $textColor="black", $lineColor="#f44336")
Rel(sar2, sap, "Registration", "")
UpdateRelStyle(sar2, sap, $textColor="black", $lineColor="#555555")
Rel(sap, saj0, "Create")
UpdateRelStyle(sap, saj0, $textColor="black", $lineColor="#04AA6D")
Rel(sap, saj1, "Create")
UpdateRelStyle(sap, saj1, $textColor="black", $lineColor="#f44336", $offsetX="-30", $offsetY="55")
Rel(sap, saj2, "Create")
UpdateRelStyle(sap, saj2, $textColor="black", $lineColor="#555555", $offsetX="-60", $offsetY="155")
BiRel(sar0, saj0, "HandleOp", "request/response")
UpdateRelStyle(sar0, saj0, $textColor="black", $lineColor="#04AA6D", $offsetX="-100", $offsetY="30")
BiRel(sar1, saj1, "HandleOp", "request/response")
UpdateRelStyle(sar1, saj1, $textColor="black", $lineColor="#f44336")
BiRel(sar2, saj2, "HandleOp", "request/response")
UpdateRelStyle(sar2, saj2, $textColor="black", $lineColor="#555555")
}
UpdateLayoutConfig($c4ShapeInRow="1", $c4BoundaryInRow="2")
```

### Dynamic account operations

Because dynamic account operation requests are less frequent, the Scale Agent plugin handles them on demand as follows:

1. Receive and store the new dynamic account operation in the `clouddriver.kubesvc_operation` table
2. Assign the dynamic account operation in the `clouddriver.kubesvc_operation_multiple_assign` table; it can be assigned to all connected Scale Agent instances or only to instances with the received zoneId
3. Notify all instances to fetch pending dynamic account operations from the `clouddriver.kubesvc_operation_multiple_assign` table
4. Each instance reads its pending dynamic account operations and sends them to the Scale Agent
5. Wait for the response and send it back
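
The assignment in step 2 can be pictured as a filter over the connected instances. The sketch below is only an illustration: the function name and the `connected` mapping are hypothetical, and treating a missing zoneId as "assign to every connected instance" is one reading of the step, not confirmed behavior.

```python
def select_targets(connected, zone_id=None):
    """Pick the Scale Agent instances a dynamic account operation is assigned to.

    connected: dict of instance name -> zoneId (hypothetical shape)
    zone_id:   zoneId received with the request, or None if there is none
    """
    if zone_id is None:
        # Assumption: without a zoneId, every connected instance is a target.
        return sorted(connected)
    return sorted(name for name, zone in connected.items() if zone == zone_id)


connected = {"agent-0": "zone-a", "agent-1": "zone-b", "agent-2": "zone-a"}
all_targets = select_targets(connected)
zone_a_targets = select_targets(connected, "zone-a")
```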

```mermaid
sequenceDiagram
actor User
participant Plugin
participant Service
User->>Plugin: Send dynamic account operation
Plugin->>Plugin: Store in clouddriver.kubesvc_operation
Plugin->>Plugin: Assign on clouddriver.kubesvc_operation_multiple_assign
Plugin->>Plugin: Notify all to read and send pending operations
Plugin->>Service: gRPC HandleOp
Service-->>Plugin: return
Plugin->>Plugin: Store response in clouddriver.kubesvc_operation_history
Plugin-->>User: return
```
51 changes: 51 additions & 0 deletions content/en/plugins/scale-agent/tasks/horizontal-scaling/operations-enable.md
@@ -0,0 +1,51 @@
---
title: Enable and Configure Operations Horizontal Scaling in the Armory Scale Agent
linkTitle: Enable Operations Horizontal Scaling
description: >
  Learn how to enable and configure the Operations Horizontal Scaling feature in Armory Scale Agent for Spinnaker and Kubernetes.
---

## {{% heading "prereq" %}}

* You are familiar with {{< linkWithTitle "plugins/scale-agent/concepts/horizontal-scaling" >}}.

## Scale Agent plugin

> Operations Horizontal Scaling was introduced in plugin versions v0.13.20, v0.12.21, and v0.11.56.

Enable Operations Horizontal Scaling by setting `kubesvc.cluster: database` in your plugin configuration. For example:

{{< highlight yaml "linenos=table,hl_lines=19-20" >}}
spec:
  spinnakerConfig:
    profiles:
      clouddriver:
        spinnaker:
          extensibility:
            repositories:
              armory-agent-k8s-spinplug-releases:
                enabled: true
                url: https://raw.githubusercontent.com/armory-io/agent-k8s-spinplug-releases/master/repositories.json
            plugins:
              Armory.Kubesvc:
                enabled: true
                version: 0.13.20 # Replace with a version compatible with your Armory CD version
                extensions:
                  armory.kubesvc:
                    enabled: true
        # Plugin config
        kubesvc:
          cluster: database
          operations:
            database:
              scan:
                batchSize: <int> # (Optional) Requires kubesvc.cluster: database
                initialDelay: <int> # (Optional) Requires kubesvc.cluster: database
                maxDelay: <int> # (Optional) Requires kubesvc.cluster: database
{{< /highlight >}}

`operations.database.scan`:

* **batchSize**: (Optional) Default: 5. The maximum number of operations that can be assigned to a Scale Agent instance per cycle.
* **initialDelay**: (Optional) Default: 250. Milliseconds to wait per cycle when there are pending operations.
* **maxDelay**: (Optional) Default: 2000. Milliseconds to wait per cycle when there are no pending operations.
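
As a rough illustration of how these three settings interact, the sketch below picks the wait time for the next cycle and derives an upper bound on throughput. The function names are hypothetical, and the bound ignores the time spent assigning and sending operations, so treat it as a ceiling rather than an expected rate.

```python
def next_delay_ms(has_pending, initial_delay_ms=250, max_delay_ms=2000):
    """Wait initialDelay while work is pending, otherwise maxDelay."""
    return initial_delay_ms if has_pending else max_delay_ms


def max_ops_per_second(batch_size=5, initial_delay_ms=250):
    """Upper bound per Scale Agent instance, ignoring processing time."""
    cycles_per_second = 1000 / initial_delay_ms
    return batch_size * cycles_per_second


busy_delay = next_delay_ms(True)
idle_delay = next_delay_ms(False)
ceiling = max_ops_per_second()
```

With the defaults, a busy job runs at most four cycles per second (1000 / 250) and assigns up to 5 operations per cycle, so no more than about 20 operations per second per Scale Agent instance. Raising `batchSize` or lowering `initialDelay` raises that ceiling.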
8 changes: 4 additions & 4 deletions static/csv/agent/agent-plugin-config-options.csv
@@ -8,7 +8,7 @@ Setting|Type|Default|Description
<code>kubesvc.cache.namespaceExpiryMinutes</code>|integer|0|Disabled by default, set it to a value greater than 0 to enable. Specifies minutes to keep namespace definitions in memory to reduce calls to the database.
<code>kubesvc.cache.onDemandQuickWaitMs</code>|integer|10000|How long to wait for a recache operation.
<code>kubesvc.cache.operationWaitMs</code>|integer|30000|How long to wait for a Kubernetes operation like deploy, scale, delete, or others
<code>kubesvc.cluster</code>|string|none|Type of clustering.<br><code>local</code>: for development only; don’t try to coordinate with other Clouddriver instances<br><code>redis</code>: use Redis to coordinate via pubsub. Redis will be deprecated in a future release.<br><span class='badge badge-primary'>0.10.24+</span><span class='badge badge-primary'>0.9.40</span><span class='badge badge-primary'>0.8.48</span> <code>kubernetes</code>:(Recommended) Requires additional <code>cluster-kubernetes</connected> configuration.
<code>kubesvc.cluster</code>|string|none|Type of clustering.<br><code>local</code>: for development only; don’t try to coordinate with other Clouddriver instances<br><code>redis</code>: use Redis to coordinate via pubsub. Redis will be deprecated in a future release.<br><span class='badge badge-primary'>0.10.24+</span><span class='badge badge-primary'>0.9.40</span><span class='badge badge-primary'>0.8.48</span> <code>kubernetes</code>:(Recommended) Requires additional <code>cluster-kubernetes</code> configuration.<br><span class='badge badge-primary'>0.13.19+</span><span class='badge badge-primary'>0.12.20+</span><span class='badge badge-primary'>0.11.56+</span> <code>database</code>: Uses the database as a queue for coordination and improves operation distribution. Requires additional <code>operations.database.scan</code> configuration.
<code>kubesvc.cluster-kubernetes.kubeconfigFile</code><br><code>kubesvc.cluster-kubernetes.verifySsl</code><br><code>kubesvc.cluster-kubernetes.namespace</code><br><code>kubesvc.cluster-kubernetes.httpPortName</code><br><code>kubesvc.cluster-kubernetes.clouddriverServiceNamePrefix</code>|string<br>boolean<br>string<br>string<br>string<br>|null<br>true<br>null<br>http<br>spin-clouddriver|(Optional) If configured, the plugin uses this file to discover Endpoints. If not configured, it will use the service account mounted to the pod.<br>(Optional) Whether to verify the Kubernetes API cert or not.<br>(Optional) If configured, the plugin watches Endpoints in this namespace. If null, it watches endpoints in the namespace indicated in the file <code>/var/run/secrets/kubernetes.io/serviceaccount/namespace</code><br>(Optional) Name of the port configured in clouddriver Service that forwards traffic to clouddriver http port for REST requests.<br>(Optional) Name prefix of the Kubernetes Service pointing to the Clouddriver standard HTTP port.
<code>kubesvc.credentials.poller.reloadFrequencyMs</code>|long|30000|<span class='badge badge-primary'>2.23.0+</span> <span class='badge badge-primary'>1.23.0+</span> How often the plugin will refresh account credentials to clouddriver in case <code>credentials.poller.enabled</code> is disabled. Otherwise the standard properties of <code>credentials.poller.enabled</code> and <code>credentials.poller.types.kubernetes.reloadFrequencyMs</code> are respected
<code>kubesvc.disableV2Provider</code>|boolean|false|If you don’t need the V2 provider account, set that to true to speed up caching deserialization.
@@ -41,6 +41,6 @@ Setting|Type|Default|Description
<code>kubesvc.v2-cache-eviction.batch-size</code>|integer|5|<span class='badge badge-primary'>0.10.3+</span> How many Kubernetes kinds to evict for each eviction event.
<code>kubesvc.v2-cache-eviction.millis</code>|integer|200|<span class='badge badge-primary'>0.10.3+</span> The time between evictions in milliseconds. Using a low value can lead to a spike in resource usage when migration occurs.
<code>kubesvc.ops.processTime.metric.result.maxLength</code>|integer|255|How many characters as a maximum could have the <code>kubesvc.ops.processTime.result</code> attribute metric



<code>kubesvc.operations.database.scan.batchSize</code>|integer|5|The maximum number of operations that can be assigned to a Scale Agent instance per cycle
<code>kubesvc.operations.database.scan.initialDelay</code>|integer|250|Milliseconds to wait per cycle when there are pending operations
<code>kubesvc.operations.database.scan.maxDelay</code>|integer|2000|Milliseconds to wait per cycle when there are no pending operations
