Fix: Do not cache native resources created without CommonLabels #1818
+41
−24
I read a blog post on operator memory pitfalls mentioning `Owns()` being a footgun, which is used in the `grafana_reconciler` `SetupWithManager`.

TLDR: By declaring `Owns()` or using `Get`/`List`, you tell the controller-runtime to watch and cache all instances of the `client.Object`, which on large clusters could result in a lot of `ConfigMaps`, `Secrets` and `Deployments` in the Grafana-Operator's case.

I suspected this was a problem based on the pprof profiles uploaded in #1622, and verified it by following the steps outlined below.
The post linked to an Operator SDK trick for configuring the `client.Object` cache with labels.

I remembered that #1661 added common labels to resources created by the operator to reduce memory consumption.
Verifying cache issues:
kubectl port-forward -n grafana-operator-system deploy/grafana-operator-controller-manager-v5 8888 & go tool pprof -top -nodecount 20 http://localhost:8888/debug/pprof/heap
fallocate -l 393216 large_file
ConfigMaps
Current progress
Watching and caching has been limited to resources controlled by the operator of Kind:
if IsOpenShift
This is done with the existing `CommonLabels` selector introduced in #1661: `app.kubernetes.io/managed-by: "grafana-operator"`
Memory consumption in an empty kind cluster after ~1 minute [1]:
`ConfigMaps` and `Secrets`
TODO:
`ConfigMap`/`Secret` using label selectors, similar to `WATCH_NAMESPACE_SELECTOR`. Potentially a way to tune them individually.
Footnotes
Heap will increase over time as the operator stabilizes. ↩
The reduction is by no means representative of real deployments.
For clusters mixing the Grafana-Operator and other workloads in cluster scoped mode, the reduction is likely significantly higher.
Even if the Grafana-Operator was the only Deployment in a cluster, this should still reduce memory as it won't cache itself 😉 ↩