Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(encoding): implement UTF-8 support in metric and label names #236

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

fedetorres93
Copy link

Adds UTF-8 support for metric and label names. Addresses #190.

These changes are based on the work done on the Prometheus common libraries prometheus/common#537 and prometheus/common#570

  • Encoders will use the new quoting syntax {"foo"} iff the metric does not conform to the legacy name format (foo{})

  • The Registry struct has two new fields: name_validation_scheme which determines if validation is done using the legacy or the UTF-8 scheme and escaping_scheme which determines the escaping scheme used by default when scrapers don't support UTF-8.

  • Scrapers can announce via content negotiation that they support UTF-8 names by adding escaping=allow-utf-8 in the Accept header. In cases where UTF-8 is not available, metric providers can be configured to escape names in a few different ways: values (U__ UTF value escaping for perfect round-tripping), underscores (all invalid chars become _), dots (dots become _dot_, _ becomes __, all other values become ___). Escaping can either be a global default (escaping_scheme, as mentioned above) or can also be specified in the Accept header with the escaping= term, which can be allow-utf-8 (for UTF-8-compatible), underscores, dots, or values. Existing functionality is maintained.

Work towards prometheus/prometheus#13095.

Signed-off-by: Federico Torres <[email protected]>
Signed-off-by: Federico Torres <[email protected]>
Signed-off-by: Federico Torres <[email protected]>
Signed-off-by: Federico Torres <[email protected]>
Signed-off-by: Federico Torres <[email protected]>
@fedetorres93 fedetorres93 marked this pull request as ready for review November 7, 2024 19:39
@fedetorres93
Copy link
Author

@mxinden Please take a look, thanks!

@fedetorres93
Copy link
Author

@mxinden I just wanted to check in regarding this PR. Did you have a chance to take an initial look?

Looking forward to any feedback you might have. Thanks!

@mxinden
Copy link
Member

mxinden commented Dec 13, 2024

Hi @fedetorres93,

I am sorry for the delay. I appreciate the time you invested making a solid patch!

As I assume many of us do, I am maintaining this crate in my free time. Skimming the proposal, I don't see it adding a lot of value and thus I don't see myself prioritizing this work over other patches on this repository.

Now me not investing time into this obviously doesn't mean other can't. Maybe you can find a Prometheus maintainer to champion this work and do extensive reviews.

On a high level, I would want this work to (a) not introduce significant complexity to the library and (b) no performance regressions to the hot paths (metric recording and metric encoding).

@ywwg
Copy link
Member

ywwg commented Dec 13, 2024

Hi @mxinden , thanks for your reply and continued maintenance of this library! With the release of Prometheus 3.0, one of our goals in adding support for UTF-8 everywhere in Prometheus includes updating all of the officially-supported client libraries, of which Rust is one. One issue we've had with getting these libraries updated is finding the necessary language experts such as yourself to certify that the changes we're making are safe and well-written. So far, you're the only person we've found who is both an expert in Rust and Prometheus.

As far as correctness goes, I can help do a review to make sure the test cases are covering all of the situations where quoting and escaping are required. For performance, I'm guessing there are ways of benchmarking Rust but I am not familiar with them. @fedetorres93 and I can work to try to generate those numbers. But for language correctness, we need a Rust expert to verify that we're adhering to Rust style.

I think splitting up the work this way will keep the burden on you as low as possible. Fede and I can go back and forth on the test coverage and performance and then call upon you again when we think we're in good shape. Does that sound workable?

@mxinden
Copy link
Member

mxinden commented Dec 14, 2024

Works for me. Thanks!

@ywwg
Copy link
Member

ywwg commented Dec 16, 2024

Thank you for your flexibility! @fedetorres93 let's start by really ramping up on unit tests, as I've worked on the Go common library I've had to add a bunch of edge cases

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants