Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Fix Some Typo #15534

Merged
merged 3 commits into from
Jan 6, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 8 additions & 8 deletions docs/sources/get-started/labels/_index.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,15 @@
---
menuTitle: Labels
title: Understand labels
description: Explains how to Loki uses labels to define log streams.
description: Explains how Loki uses labels to define log streams.
weight: 600
aliases:
- ../getting-started/labels/
- ../fundamentals/labels/
---
# Understand labels

Labels are key value pairs and can be defined as anything! We like to refer to them as metadata to describe a log stream. If you are familiar with Prometheus, there are a few labels you are used to seeing like `job` and `instance`, and I will use those in the coming examples.
Labels are key-value pairs and can be defined as anything! We like to refer to them as metadata to describe a log stream. If you are familiar with Prometheus, there are a few labels you are used to seeing like `job` and `instance`, and I will use those in the coming examples.

The scrape configs we provide with Grafana Loki define these labels, too. If you are using Prometheus, having consistent labels between Loki and Prometheus is one of Loki's superpowers, making it incredibly [easy to correlate your application metrics with your log data](/blog/2019/05/06/how-loki-correlates-metrics-and-logs--and-saves-you-money/).

Expand Down Expand Up @@ -121,9 +121,9 @@ Now instead of a regex, we could do this:
{env="dev"} <- will return all logs with env=dev, in this case this includes both log streams
```

Hopefully now you are starting to see the power of labels. By using a single label, you can query many streams. By combining several different labels, you can create very flexible log queries.
Hopefully, now you are starting to see the power of labels. By using a single label, you can query many streams. By combining several different labels, you can create very flexible log queries.

Labels are the index to Loki log data. They are used to find the compressed log content, which is stored separately as chunks. Every unique combination of label and values defines a stream, and logs for a stream are batched up, compressed, and stored as chunks.
Labels are the index to Loki log data. They are used to find the compressed log content, which is stored separately as chunks. Every unique combination of labels and values defines a stream and logs for a stream are batched up, compressed, and stored as chunks.

For Loki to be efficient and cost-effective, we have to use labels responsibly. The next section will explore this in more detail.

Expand Down Expand Up @@ -152,7 +152,7 @@ The two previous examples use statically defined labels with a single value; how
__path__: /var/log/apache.log
```

This regex matches every component of the log line and extracts the value of each component into a capture group. Inside the pipeline code, this data is placed in a temporary data structure that allows using it for several purposes during the processing of that log line (at which point that temp data is discarded). Much more detail about this can be found in the [Promtail pipelines]({{< relref "../../send-data/promtail/pipelines" >}}) documentation.
This regex matches every component of the log line and extracts the value of each component into a capture group. Inside the pipeline code, this data is placed in a temporary data structure that allows use for several purposes during the processing of that log line (at which point that temp data is discarded). Much more detail about this can be found in the [Promtail pipelines]({{< relref "../../send-data/promtail/pipelines" >}}) documentation.

From that regex, we will be using two of the capture groups to dynamically set two labels based on content from the log line itself:

Expand Down Expand Up @@ -180,13 +180,13 @@ In Loki the following streams would be created:

Those four log lines would become four separate streams and start filling four separate chunks.

Any additional log lines that match those combinations of label/values would be added to the existing stream. If another unique combination of labels comes in (for example, `status_code="500"`) another new stream is created.
Any additional log lines that match those combinations of labels/values would be added to the existing stream. If another unique combination of labels comes in (for example, `status_code="500"`) another new stream is created.

Imagine now if you set a label for `ip`. Not only does every request from a user become a unique stream. Every request with a different action or status_code from the same user will get its own stream.

Doing some quick math, if there are maybe four common actions (GET, PUT, POST, DELETE) and maybe four common status codes (although there could be more than four!), this would be 16 streams and 16 separate chunks. Now multiply this by every user if we use a label for `ip`. You can quickly have thousands or tens of thousands of streams.

This is high cardinality, and it can lead to significant performance degredation.
This is high cardinality, and it can lead to significant performance degradation.

When we talk about _cardinality_ we are referring to the combination of labels and values and the number of streams they create. High cardinality is using labels with a large range of possible values, such as `ip`, **or** combining many labels, even if they have a small and finite set of values, such as using `status_code` and `action`.

Expand Down Expand Up @@ -216,7 +216,7 @@ To see how this works, let's look back at our example of querying your access lo

Behind the scenes, Loki will break up that query into smaller pieces (shards), and open up each chunk for the streams matched by the labels and start looking for this IP address.

The size of those shards and the amount of parallelization is configurable and based on the resources you provision. If you want to, you can configure the shard interval down to 5m, deploy 20 queriers, and process gigabytes of logs in seconds. Or you can go crazy and provision 200 queriers and process terabytes of logs!
The size of those shards and the amount of parallelization are configurable and based on the resources you provision. If you want to, you can configure the shard interval down to 5m, deploy 20 queriers, and process gigabytes of logs in seconds. Or you can go crazy and provision 200 queriers and process terabytes of logs!

This trade-off of smaller index and parallel brute force querying vs. a larger/faster full-text index is what allows Loki to save on costs versus other systems. The cost and complexity of operating a large index is high and is typically fixed -- you pay for it 24 hours a day if you are querying it or not.

Expand Down
Loading