Operator calling the wrong pod endpoint when trying to scale down ingesters #125

Open
grecuionut opened this issue Jan 24, 2024 · 1 comment

We are trying to scale down ingesters using the MutatingAdmissionWebhook as described here.

The required labels and annotations were added to the objects as shown below:

# labels
grafana.com/prepare-downscale=true

# annotations
grafana.com/prepare-downscale-http-path=ingester/prepare-shutdown
grafana.com/prepare-downscale-http-port=8080

When trying to scale down statefulset/mimir-ingester-zone-a, the operator fails to resolve the pod when sending the HTTP POST request, because it constructs the FQDN as <pod_name>.<service_name>.<namespace>.svc.cluster.local:

mimir-ingester-zone-a-1.mimir-ingester-zone-a.mimir.svc.cluster.local

These are the existing services:

mimir-ingester-headless                  ClusterIP   None             <none>        8080/TCP,9095/TCP   19d
mimir-ingester-zone-a                    ClusterIP   <ip_address>     <none>        8080/TCP,9095/TCP   19d
mimir-ingester-zone-b                    ClusterIP   <ip_address>    <none>        8080/TCP,9095/TCP   19d
mimir-ingester-zone-c                    ClusterIP   <ip_address>    <none>        8080/TCP,9095/TCP   19d

To resolve the pod, the headless service (mimir-ingester-headless) should be used instead. More info
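
To make the difference concrete, here is a minimal sketch of the two names. It is not the operator's actual code; `pod_fqdn` is a hypothetical helper that only follows the <pod_name>.<service_name>.<namespace>.svc.cluster.local pattern quoted above, and the `cluster.local` default is assumed from the log line.

```python
def pod_fqdn(pod_name: str, service_name: str, namespace: str,
             cluster_domain: str = "cluster.local") -> str:
    """Compose <pod_name>.<service_name>.<namespace>.svc.<cluster_domain>.

    Hypothetical helper for illustration; not taken from the operator.
    """
    return f"{pod_name}.{service_name}.{namespace}.svc.{cluster_domain}"


# Name the operator tried, as seen in the error log of this issue.
attempted = pod_fqdn("mimir-ingester-zone-a-1", "mimir-ingester-zone-a", "mimir")

# Name built on the headless service, as proposed in this issue.
proposed = pod_fqdn("mimir-ingester-zone-a-1", "mimir-ingester-headless", "mimir")

print(attempted)
print(proposed)
```

Only the service segment differs between the two names; pod name, namespace, and cluster domain are unchanged.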

Operator logs

level=error ts=2024-01-16T13:29:51.838387358Z name=mimir-ingester-zone-a resource=statefulsets namespace=mimir request_gvk="autoscaling/v1, Kind=Scale" old_replicas=2 new_replicas=1 url=mimir-ingester-zone-a-1.mimir-ingester-zone-a.mimir.svc.cluster.local:443/ingester/prepare-shutdown index=1 msg="error sending HTTP post request" err="Post \"http://mimir-ingester-zone-a-1.mimir-ingester-zone-a.mimir.svc.cluster.local:443/ingester/prepare-shutdown\": dial tcp: lookup mimir-ingester-zone-a-1.mimir-ingester-zone-a.mimir.svc.cluster.local on <name_server_ip>:53: no such host"

jhychan commented Sep 11, 2024

If I'm not mistaken, the service name used in the generated address comes from the serviceName field of the StatefulSet.

Having looked over a number of the Loki/Mimir/Tempo Helm charts we currently use, many of them do not template serviceName so that it points at the appropriate headless Service.
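
A rough way to check a chart's rendered output for this is sketched below. It assumes the manifests are already loaded as plain dictionaries, and `governing_service_is_headless` is a hypothetical helper, not part of any chart or of the operator. The serviceName value in the example is an assumption inferred from the address in the error log, and the `<ip_address>` placeholder is kept as redacted in the issue.

```python
def governing_service_is_headless(statefulset: dict, services: list) -> bool:
    """Return True if spec.serviceName names a Service with clusterIP "None"."""
    service_name = statefulset.get("spec", {}).get("serviceName")
    for service in services:
        if service["metadata"]["name"] == service_name:
            # A headless Service is declared with clusterIP: None.
            return service.get("spec", {}).get("clusterIP") == "None"
    # serviceName is unset or does not match any known Service.
    return False


# Services as listed in this issue (cluster IPs were redacted there).
services = [
    {"metadata": {"name": "mimir-ingester-headless"}, "spec": {"clusterIP": "None"}},
    {"metadata": {"name": "mimir-ingester-zone-a"}, "spec": {"clusterIP": "<ip_address>"}},
]

# Assumed StatefulSet: serviceName inferred from the failing address.
as_deployed = {"spec": {"serviceName": "mimir-ingester-zone-a"}}
# Same StatefulSet with serviceName pointing at the headless Service.
corrected = {"spec": {"serviceName": "mimir-ingester-headless"}}

print(governing_service_is_headless(as_deployed, services))
print(governing_service_is_headless(corrected, services))
```

A check like this only inspects the manifests; it does not confirm that DNS records actually exist in the cluster.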
