-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add C3ID implementation #21
base: main
Are you sure you want to change the base?
Conversation
This ports C3ID computation (that we will need soon) from https://github.com/microsoft/security-utilities Changes: - Company name is a parameter and not hard-coded to Microsoft - Optimized like the rest of Cask to avoid allocations and write the result to a span Tests added here were also run against the implementation in security-utilities to check equivalence. Also: fix edge cases with empty span input in encoding polyfills.
#22 should fix the build issue. Let's get that reviewed and merged first. |
|
||
namespace CommonAnnotatedSecurityKeys; | ||
|
||
internal static class CrossCompanyCorrelatingId |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought, but I tend to think that a nice summary element on every class type is a nice baseline requirement, even if projects don't take on the costs of authoring nicely rendered per-method/property comments as you've done here.
This class declaration would be a nice place to describe a c3id in general, to link to a reference doc, etc.
No action for you, I can take an action to fill this out later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put a summary, but it will need more work. I will work on a proper specification and update the doc comments. I'll do that separately so that I can consume this in another branch, get out that PR, then work on the spec and docs in parallel with the review for that larger PR.
[InlineData("😁", "f/BTV0j6A8km4KDw7aJz")] | ||
public void Test_Basic(string text, string expected) | ||
{ | ||
string actual = ComputeC3IDBase64(company: "Microsoft", text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey is the proper rendering C3ID
or C3id
?
Haha. It is a true fact that I treat this as a valid question despite the obvious triviality of the concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer C3id
but need I mention I have no desire to die on this hill? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tehe. I think I prefer C3Id
or C3ID
over C3id
. Id
is a word so if the convention in C#
is CreateUrl
, C3Id
would be the most consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not appear in any identifier other than in tests. As such, my care factor is vanishingly small. I'm just as picky about these things in public API, so no judgment. 😁 I'll happily change it, but I still don't which to use...
public class CrossCompanyCorrelatingIdTests | ||
{ | ||
[Theory] | ||
[InlineData("", "C3IDKZnmJO6my5hknQ5W")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this value could become very familiar to folks troubleshooting data issues. :) i think your idea about possibly putting some exceptions/data guards in place (for the text that we hash) could make. certainly failing a string.IsNullOrWhitespace
check would qualify.
the problem is that it can be unsatisfying to add an incomplete set of guards in circumstances. but this one may be worth adding!
} | ||
|
||
[Fact] | ||
public void C3ID_NullInput_Throws() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static class CrossCompanyCorrelatingId | ||
{ | ||
/// <summary> | ||
/// The size of a Cross-Company Correlating ID (aka C3ID) in raw bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public const int RawSizeInBytes = 12; | ||
|
||
private static ReadOnlySpan<byte> Prefix => "C3ID"u8; | ||
private static ReadOnlySpan<byte> PrefixBase64Decoded => [0x0B, 0x72, 0x03]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// </summary> | ||
public static string Compute(string text) | ||
{ | ||
ThrowIfNull(text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Computes the raw C3ID bytes for the given text and writes them to the | ||
/// destination span. | ||
/// </summary> | ||
public static void Compute(string text, Span<byte> destination) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, I've wondered, why do some API go ahead and return a span in some API and in others we provide an argument.
And you've answered it! If you would otherwise have an overload that would differ only by return type, you are stuck!
honestly, this is an (unavoidable as far as I can see) idiosyncrasy that's a bit regrettable.
but what can you do? the fact is that string is now effectively a replaceable entity and the only way to provide functionality that diverges is by unique parameterization.
another bit of weirdness is that in places where you can control the return value and use Span the ReadOnly version might be appropriate (and easy to forget to use??)
I'm going on about this too much, forgive me, it's fun to talk to you about this stuff, but I'm wondering about alternate API shapes, e.g.:
void Compute(string text, out string destination);
void Compute(string text, Span destination);
also, potentially zapping exception raising by simply providing these:
bool TryCompute(string text, out string destination);
bool TryCompute(string text, Span destination);
/// </summary> | ||
public static void Compute(ReadOnlySpan<char> text, Span<byte> destination) | ||
{ | ||
ThrowIfDestinationTooSmall(destination, RawSizeInBytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, if we accept my idea around preventing empty or whitespace we'd want to add it here. So that the upstream doesn't duplicate the work.
I'm on the fence. On the one hand, it's perfectly legal to generate a hash for one or more tabs/spaces, etc. on the other hand, this data will never comprise something we should generate a c3id for. and certainly hashes of the empty string are a notorious source of data noise.
Your thoughts?
Span<byte> hash = SHA256.HashData(Encoding.UTF8.GetBytes(text)); | ||
|
||
// Prefix the result with "C3ID" UTF-8 bytes and hash again | ||
hash = SHA256.HashData([.. "C3ID"u8, .. hash]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see all tests are passing because your optimized version does the same thing. I thought we were going to use the decoded bytes directly not the UTF8 bytes of the encoded form.
This implements C3ID computation that we will need soon.
The implementation differs from the one in https://github.com/microsoft/security-utilities and this seeks to become the new standard.
Changes:
Tests include a naive, unoptimized implementation that is easier to understand, and we check that both implementations match. We also check that we produce the same deterministic value for a sampling of hard-coded test input.
This change also fixes edge cases with empty span input in encoding polyfills.