Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[loki-distributed] Add Query Frontend to the memberlist and allow matchExpression on ServiceMonitor #2732

Merged
merged 13 commits into from
Nov 29, 2023

Conversation

verejoel
Copy link
Contributor

@verejoel verejoel commented Oct 24, 2023

Changes

Query Frontend joins the Memberlist

When using the query scheduler ring, the query frontend must join the memberlist. This is because the query frontend needs to know which members of the scheduler ring are healthy, and the scheduler will only send heartbeats on port 7946.

In the existing configuration, when the scheduler ring is enabled, the query frontend sees the schedulers get unhealthy periodically. Also, the regular heartbeat is not received every 15 seconds. This is because the query schedulers cannot write to port 7946 on the frontend:

level=warn ts=2023-10-24T09:53:48.014542227Z caller=tcp_transport.go:437 component="memberlist TCPTransport" msg="WriteTo failed" addr=10.32.38.5:7946 err="dial tcp 10.32.38.5:7946: i/o timeout"

Furthermore, the query schedulers mark the query frontends as unhealthy, as they receive no acks on the memberlist:

ts=2023-10-24T10:07:35.612955707Z caller=memberlist_logger.go:74 level=info msg="Suspect loki-query-frontend-7567ccd8f9-h2v8p-20cf87bb has failed, no acks received"

This PR introduces the required container port 7946 and memberlist selector label on the query frontend component. This configuration has been tested in a production environment, and confirmed to fix communication issues between the frontend, querier, and scheduler components.

ServiceMonitor matchExpression

Some ServiceMonitors had the following matchExpression stanza hardcoded:

  matchExpressions:
    - key: prometheus.io/service-monitor
      operator: NotIn
      values:
        - "false"

This has now been made a configurable value, which is empty by default. This ensures the ServiceMonitor configuration is predictable across all components.

Minor changes

The ingester deployment was missing some features of the StatefulSet. These have been copied across.

Signed-off-by: verejoel <[email protected]>
@verejoel verejoel changed the title [loki-distributed] Add query frontend to the memberlist [loki-distributed] Add query frontend to the memberlist and allow matchExpression on ServiceMonitor Oct 24, 2023
@verejoel verejoel changed the title [loki-distributed] Add query frontend to the memberlist and allow matchExpression on ServiceMonitor [loki-distributed] Add Query Frontend to the memberlist and allow matchExpression on ServiceMonitor Oct 24, 2023
@verejoel verejoel changed the title [loki-distributed] Add Query Frontend to the memberlist and allow matchExpression on ServiceMonitor [loki-distributed] Add Query Frontend to the memberlist and allow matchExpression on ServiceMonitor Oct 24, 2023
Signed-off-by: verejoel <[email protected]>
@verejoel verejoel force-pushed the feature/ingester-deployment branch from a829f69 to 425a9d1 Compare October 25, 2023 05:03
@verejoel
Copy link
Contributor Author

@zanhsieh thanks for the review - can this be merged?

@verejoel
Copy link
Contributor Author

@zalegrala would you also be able to review here please?

@z0rc
Copy link
Contributor

z0rc commented Nov 17, 2023

@grafana/loki-squad @unguiculus @Whyeasy need your review here.

@zalegrala please extend code owners list for this chart. Currently here is just 1 active maintainer.

Copy link
Contributor

@MichelHollands MichelHollands left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MichelHollands MichelHollands merged commit 149761c into grafana:main Nov 29, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants