-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Detect fields based on per-tenant configuration and put them into structured metadata at ingest time #15188
Conversation
d1c8c1b
to
da361dd
Compare
eventually I think we can deprecate the explicit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall lgtm, just one comment
@shantanualsi but @chaudum why put this in Loki rather than have this sort of thing be the responsibility of the agent? |
Hello, |
Yes, ideally this is done on the agent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good, a few small nits.
pkg/distributor/distributor.go
Outdated
@@ -454,8 +454,9 @@ func (d *Distributor) Push(ctx context.Context, req *logproto.PushRequest) (*log | |||
|
|||
now := time.Now() | |||
validationContext := d.validator.getValidationContextForTime(now, tenantID) | |||
levelDetector := newLevelDetector(validationContext) | |||
levelDetector := newFieldDetector(validationContext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no longer doing the work of a levelDetecter
, let's rename this variable
}, | ||
}, | ||
{ | ||
name: "logline (json) matches", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about multiple matches (ie. org_id
and tenant_id
)? a test documenting that behavior would be nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a good call! The behaviour was different between json
and logfmt
lines. json
matches the first field from the config, logfmt
matches the first matching field in the log line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed with 167f350
Considering the fact that this detection of fields happens on both push paths (Push, OTLP) how does this feature interact with the Note: Leaving the resource atributes to labels out here because the presented feature is only about structured metadata. |
I see it as an addition. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from a code POV lgtm
There was a suggestion to slightly change the configuration schema to allow for possible future extension to extract all fields of a log line, e.g. limits_config:
discover_generic_fields:
auto:
format: json | logfmt | pattern
pattern: "<_> FOO:<foo> <_>"
fields:
trace_id:
- "trace_id"
- "TRACE_ID"
- "traceID"
- "TraceID"
org_id:
- "org_id"
- "tenant_id"
- "user_id" The |
ac0eb8b
to
01d71ce
Compare
💻 Deploy preview deleted. |
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
ce21e0a
to
34a74b3
Compare
Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
What this PR does / why we need it:
This PR introduces a new feature that allows for extraction of "fields" into structured metadata at ingest time.
Fields can either be regular labels, structured metadata keys, or keys from
logfmt
orjson
formatted log lines.The fields are defined in a per-tenant configuration as
map[string][]string
, where the key is the target key of the structured metadata, and the value is the list of source fields in given order and the order given above.Example configuration:
Update: The fields moved into a "fields" objects:
While parsing of log lines comes with a certain penalty at ingest time (increased latency and CPU usage on distributors), the idea is to extract certain fields once to avoid parsing the log lines every single time at query time. This is mainly useful in combination with bloom filters.
Discussion
Should the value of the config map support jsonpath expression, such as
Where the log line looks like this:
✔️ jsonpath support has been implemented with d216856
Checklist
CONTRIBUTING.md
guide (required)feat
PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.docs/sources/setup/upgrade/_index.md
deprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR