Add C3ID implementation #21

Merged: 12 commits, Jan 15, 2025
1 change: 1 addition & 0 deletions src/Cask.sln
Original file line number Diff line number Diff line change
@@ -9,6 +9,7 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Cask", "Cask\Cask.csproj",
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Tests", "Tests", "{AA9664D7-21A5-4941-BE8A-D62765F58CE6}"
ProjectSection(SolutionItems) = preProject
Tests\.editorconfig = Tests\.editorconfig
Tests\Directory.Build.props = Tests\Directory.Build.props
Tests\Directory.Packages.props = Tests\Directory.Packages.props
EndProjectSection
93 changes: 93 additions & 0 deletions src/Cask/CrossCompanyCorrelatingId.cs
@@ -0,0 +1,93 @@
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE file in the project root for full license information.

using System.Security.Cryptography;
using System.Text;

using static CommonAnnotatedSecurityKeys.Limits;

namespace CommonAnnotatedSecurityKeys;

/// <summary>
/// Cross-Company Correlating Id (C3ID): a 12-byte value used to correlate
/// high-entropy keys with other data. The canonical textual representation is
/// base64-encoded and prefixed with "C3ID".
/// </summary>
public static class CrossCompanyCorrelatingId
{
/// <summary>
/// The size of a C3ID in raw bytes.
/// </summary>
public const int RawSizeInBytes = 12;

/// <summary>
/// The byte sequence prepended to the first SHA256 hash to form the input
/// of the second SHA256 hash. It is defined as the UTF-8 encoding of "C3ID".
/// </summary>
private static ReadOnlySpan<byte> Prefix => "C3ID"u8;

/// <summary>
/// The byte sequence prepended to the raw C3ID bytes before
/// base64-encoding. It is defined as the base64-decoding of "C3ID". This
/// results in all canonical base64-encoded C3IDs starting with "C3ID".
/// </summary>
private static ReadOnlySpan<byte> PrefixBase64Decoded => [0x0B, 0x72, 0x03];

/// <summary>
/// Computes the C3ID for the given text in canonical textual form.
/// </summary>
public static string Compute(string text)
{
ThrowIfNullOrEmpty(text);
Span<byte> bytes = stackalloc byte[PrefixBase64Decoded.Length + RawSizeInBytes];
PrefixBase64Decoded.CopyTo(bytes);
ComputeRaw(text, bytes[PrefixBase64Decoded.Length..]);
return Convert.ToBase64String(bytes);
}

/// <summary>
/// Computes the raw C3ID bytes for the given text and writes them to the
/// destination span.
/// </summary>
public static void ComputeRaw(string text, Span<byte> destination)
{
ThrowIfNull(text);
ComputeRaw(text.AsSpan(), destination);
}

/// <summary>
/// Computes the raw C3ID bytes for the given UTF-16 encoded text sequence
/// and writes them to the destination span.
/// </summary>
public static void ComputeRaw(ReadOnlySpan<char> text, Span<byte> destination)
{
ThrowIfEmpty(text);
ThrowIfDestinationTooSmall(destination, RawSizeInBytes);

int byteCount = Encoding.UTF8.GetByteCount(text);
Span<byte> textUtf8 = byteCount <= MaxStackAlloc ? stackalloc byte[byteCount] : new byte[byteCount];
Encoding.UTF8.GetBytes(text, textUtf8);
ComputeRawUtf8(textUtf8, destination);
}

/// <summary>
/// Computes the raw C3ID bytes for the given UTF-8 encoded text sequence
/// and writes them to the destination span.
/// </summary>
public static void ComputeRawUtf8(ReadOnlySpan<byte> textUtf8, Span<byte> destination)
{
ThrowIfEmpty(textUtf8);
ThrowIfDestinationTooSmall(destination, RawSizeInBytes);

// Produce input for second hash: "C3ID"u8 + SHA256(text)
Span<byte> input = stackalloc byte[Prefix.Length + SHA256.HashSizeInBytes];
Prefix.CopyTo(input);
SHA256.HashData(textUtf8, input[Prefix.Length..]);

// Perform second hash, truncate, and copy to destination.
Contributor Author

I'm getting concerned about the cost here. Two rounds of SHA256 is a lot of compute for a 12-byte hash.

For context from offline conversation: the rationale for two rounds in C3ID (as I understand it, though I may be wrong) is to make it possible to compute the C3ID from the plain SHA256. Without sharing the key and without adding C3ID to their toolkit, someone can send you the SHA256 of a key, and then you can convert that to C3ID.

We should maybe reconsider if this scenario is worth the cost added to CASK scenarios. I'll provide some data on that cost in a minute.
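
A minimal sketch of the conversion described above, assuming the two-round construction used in this PR (SHA256 of the UTF-8 key, then SHA256 of "C3ID" plus that digest, truncated to 12 bytes). The class and method names are hypothetical and are not part of the Cask API; the sketch uses the modern BCL directly rather than the PR's polyfills.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration only; not part of Cask.
public static class C3IdFromSha256Sketch
{
    // Converts an existing SHA-256 digest of a key into a canonical C3ID string.
    // The caller never needs to see the key itself.
    public static string FromSha256(ReadOnlySpan<byte> sha256OfKey)
    {
        if (sha256OfKey.Length != 32)
        {
            throw new ArgumentException("Expected a 32-byte SHA-256 digest.", nameof(sha256OfKey));
        }

        // Input for the second round: "C3ID"u8 + SHA256(key)
        Span<byte> input = stackalloc byte[4 + 32];
        "C3ID"u8.CopyTo(input);
        sha256OfKey.CopyTo(input[4..]);

        // Second round, truncated to the first 12 bytes.
        Span<byte> second = stackalloc byte[32];
        SHA256.HashData(input, second);

        // Canonical text form: "C3ID" followed by base64 of the 12 raw bytes.
        return "C3ID" + Convert.ToBase64String(second[..12]);
    }

    // Convenience path for a party that does hold the key.
    public static string FromKey(string key)
    {
        byte[] first = SHA256.HashData(Encoding.UTF8.GetBytes(key));
        return FromSha256(first);
    }
}
```

Both entry points should agree for the same key, which is the property the two-round design is meant to provide: `FromSha256(SHA256(key))` and `FromKey(key)` produce the same string.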

Contributor Author

@nguerrera Jan 15, 2025

Without C3ID in hash scenarios

From #18 (comment)

| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---:|---:|---:|---:|---:|
| CompareHash_Cask | 451.609 ns | 3.9292 ns | 3.6753 ns | - | - |
| CompareHash_Floor | 294.875 ns | 2.7652 ns | 2.5866 ns | - | - |
| GenerateHash_Cask | 300.978 ns | 1.3669 ns | 1.2786 ns | 0.0091 | 176 B |
| GenerateHash_Floor | 225.569 ns | 1.6476 ns | 1.2864 ns | 0.0057 | 112 B |

With C3ID in hash scenarios

From my work-in-progress branch:

| Method | Mean | Error | StdDev | Gen0 | Allocated |
|---|---:|---:|---:|---:|---:|
| CompareHash_Cask | 856.6 ns | 6.06 ns | 5.37 ns | - | - |
| CompareHash_Floor | 298.5 ns | 3.69 ns | 3.27 ns | - | - |
| GenerateHash_Cask | 725.1 ns | 5.20 ns | 4.86 ns | 0.0114 | 216 B |
| GenerateHash_Floor | 226.3 ns | 0.78 ns | 0.69 ns | 0.0057 | 112 B |

There are some other changes in this branch, so this doesn't quite isolate the C3ID cost. However, I also measured that a single SHA256 of Cask-key-sized data (without allocation, copying, or encoding conversion) takes about 120 ns on this machine. That's a lot relative to the "floor" scenarios we're aiming to replace.

Up to now, I was confident in being able to tell a good story about how the extra Cask costs were a fair price to pay for Cask features, but I am not sure C3ID is living up to that now.
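
A rough way to sanity-check the single-hash cost mentioned above is a stopwatch loop like the sketch below. This is not the harness that produced the tables in this thread; the class name, the 64-byte input size, and the iteration counts are all assumptions chosen for illustration, and a proper benchmark framework would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

// Hypothetical micro-measurement for illustration only.
public static class Sha256CostSketch
{
    // Returns the approximate mean time of one SHA256.HashData call, in nanoseconds.
    public static double MeasureNanosecondsPerHash(int inputSizeInBytes, int iterations)
    {
        // Allocate the buffers once, outside the timed loop, so only hashing is measured.
        byte[] input = new byte[inputSizeInBytes];
        RandomNumberGenerator.Fill(input);
        Span<byte> hash = stackalloc byte[32];

        // Warm up so JIT compilation is not included in the timing.
        for (int i = 0; i < 10_000; i++)
        {
            SHA256.HashData(input, hash);
        }

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            SHA256.HashData(input, hash);
        }
        stopwatch.Stop();

        // Convert total elapsed milliseconds to nanoseconds per iteration.
        return stopwatch.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }
}
```

For example, `Sha256CostSketch.MeasureNanosecondsPerHash(64, 1_000_000)` gives a per-hash figure for one machine; the result will vary with hardware and runtime version.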

Contributor Author

With that said, I think it would be good to take this change and discuss options to improve this in follow-up.

Member

Ack on this and let's talk more.

Span<byte> sha = stackalloc byte[SHA256.HashSizeInBytes];
SHA256.HashData(input, sha);
sha[..RawSizeInBytes].CopyTo(destination);
}
}
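
The `PrefixBase64Decoded` trick in the file above can be checked in isolation. The sketch below assumes standard base64 (as `Convert.ToBase64String` produces): "C3ID" decodes to the three bytes 0x0B, 0x72, 0x03, and because three bytes map to exactly four base64 characters, prepending them to the 12 raw bytes yields a 20-character string that starts with "C3ID" and needs no padding. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Cask.
public static class C3IdPrefixSketch
{
    // Encodes 12 raw C3ID bytes so that the base64 text itself begins with "C3ID".
    public static string EncodeWithPrefix(ReadOnlySpan<byte> rawC3Id)
    {
        if (rawC3Id.Length != 12)
        {
            throw new ArgumentException("Expected 12 raw C3ID bytes.", nameof(rawC3Id));
        }

        // base64-decoding of "C3ID" is { 0x0B, 0x72, 0x03 }.
        ReadOnlySpan<byte> prefix = Convert.FromBase64String("C3ID");

        // 3 prefix bytes + 12 raw bytes = 15 bytes, a multiple of 3, so no '=' padding.
        Span<byte> buffer = stackalloc byte[3 + 12];
        prefix.CopyTo(buffer);
        rawC3Id.CopyTo(buffer[3..]);

        return Convert.ToBase64String(buffer);
    }
}
```

With this layout a single encoding call produces the canonical text, which avoids a separate string concatenation of "C3ID" and the encoded bytes.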

13 changes: 13 additions & 0 deletions src/Cask/Helpers.cs
@@ -111,6 +111,14 @@ public static void ThrowIfDestinationTooSmall<T>(Span<T> destination, int requir
}
}

public static void ThrowIfEmpty<T>(ReadOnlySpan<T> value, [CallerArgumentExpression(nameof(value))] string? paramName = null)
{
if (value.IsEmpty)
{
ThrowEmpty(paramName);
}
}

[DoesNotReturn]
private static void ThrowDefault(string? paramName)
{
@@ -123,4 +131,9 @@ private static void ThrowDestinationTooSmall(string? paramName)
throw new ArgumentException("Destination buffer is too small.", paramName);
}

[DoesNotReturn]
private static void ThrowEmpty(string? paramName)
{
throw new ArgumentException("Value cannot be empty.", paramName);
}
}
1 change: 1 addition & 0 deletions src/Cask/Polyfill.GlobalUsings.cs
@@ -8,6 +8,7 @@

global using static Polyfill.ArgumentValidation;

global using Convert = Polyfill.Convert;
global using HMACSHA256 = Polyfill.HMACSHA256;
global using RandomNumberGenerator = Polyfill.RandomNumberGenerator;
global using SHA256 = Polyfill.SHA256;
51 changes: 51 additions & 0 deletions src/Cask/Polyfill.cs
@@ -60,6 +60,7 @@
using System.Text;
using System.Text.RegularExpressions;

using Bcl_Convert = System.Convert;
using Bcl_HMACSHA256 = System.Security.Cryptography.HMACSHA256;
using Bcl_SHA256 = System.Security.Cryptography.SHA256;

@@ -69,6 +70,11 @@ internal static class Extensions
{
public static unsafe string GetString(this Encoding encoding, ReadOnlySpan<byte> bytes)
{
if (bytes.Length == 0)
{
return string.Empty;
}

fixed (byte* ptr = bytes)
{
return encoding.GetString(ptr, bytes.Length);
@@ -77,6 +83,11 @@ public static unsafe string GetString(this Encoding encoding, ReadOnlySpan<byte>

public static unsafe int GetByteCount(this Encoding encoding, ReadOnlySpan<char> chars)
{
if (chars.Length == 0)
{
return 0;
}

fixed (char* ptr = chars)
{
return encoding.GetByteCount(ptr, chars.Length);
@@ -85,6 +96,11 @@ public static unsafe int GetByteCount(this Encoding encoding, ReadOnlySpan<char>

public static unsafe int GetBytes(this Encoding encoding, ReadOnlySpan<char> chars, Span<byte> bytes)
{
if (chars.Length == 0)
{
return 0;
}

fixed (char* charPtr = chars)
fixed (byte* bytePtr = bytes)
{
@@ -103,6 +119,14 @@ public static void ThrowIfNull([NotNull] object? argument, [CallerArgumentExpres
}
}

public static void ThrowIfNullOrEmpty([NotNull] string? argument, [CallerArgumentExpression(nameof(argument))] string? paramName = null)
{
if (string.IsNullOrEmpty(argument))
{
ThrowNullOrEmpty(argument, paramName);
}
}

public static void ThrowIfGreaterThan(int value, int max, [CallerArgumentExpression(nameof(value))] string? paramName = null)
{
if (value > max)
@@ -136,6 +160,26 @@ private static void ThrowLessThan(int value, int min, string? paramName)
{
throw new ArgumentOutOfRangeException(paramName, value, $"Value must be greater than or equal to {min}.");
}

[DoesNotReturn]
private static void ThrowNullOrEmpty(string? argument, string? paramName)
{
ThrowIfNull(argument, paramName);
throw new ArgumentException("Value cannot be empty.", paramName);
}
}

internal static class Convert
{
public static string ToBase64String(ReadOnlySpan<byte> bytes)
{
return Bcl_Convert.ToBase64String(bytes.ToArray());
}

public static byte[] FromBase64String(string base64)
{
return Bcl_Convert.FromBase64String(base64);
}
}

internal static class RandomNumberGenerator
@@ -213,6 +257,13 @@ public static int HashData(ReadOnlySpan<byte> source, Span<byte> destination)
Hash.Compute(sha, source, destination);
return HashSizeInBytes;
}

public static byte[] HashData(ReadOnlySpan<byte> source)
{
byte[] hash = new byte[HashSizeInBytes];
HashData(source, hash);
return hash;
}
}
}

3 changes: 3 additions & 0 deletions src/Tests/.editorconfig
@@ -15,3 +15,6 @@ dotnet_diagnostic.CA1707.severity = silent

# CA1822: Mark members as static
dotnet_diagnostic.CA1822.severity = silent

# CA1062: Validate arguments of public methods
dotnet_diagnostic.CA1062.severity = silent
121 changes: 121 additions & 0 deletions src/Tests/Cask.Tests/CrossCompanyCorrelatingIdTests.cs
@@ -0,0 +1,121 @@
// Copyright (c) Microsoft. All rights reserved.
// Licensed under the MIT license. See LICENSE file in the project root for full license information.

using System.Security.Cryptography;
using System.Text;

using Xunit;

namespace CommonAnnotatedSecurityKeys.Tests;

public class CrossCompanyCorrelatingIdTests
{
[Theory]
[InlineData("Hello world", "C3IDnw4dY6uIibYownZw")]
[InlineData("😁", "C3IDF8FaWr4yMPcwOOxM")]
[InlineData("y_-KPF3BQb2-VHZeqrp28c6dgiL9y7H9TRJmQ5jJe9OvJQQJTESTBAU4AAB5mIhC", "C3IDKx9aukbRgOnPEyeu")]
[InlineData("Kq03wDtdCGWvs3sPgbH84H5MDADIJMZEERRhUN73CaGBJQQJTESTBAU4AADqe9ge", "C3IDO93RBPyuaA6ZRK8+")]
public void C3Id_Basic(string text, string expected)
{
string actual = ComputeC3Id(text);
Assert.Equal(expected, actual);
}

[Fact]
public void C3Id_LargeText()
{
string actual = ComputeC3Id(text: new string('x', 300));
Assert.Equal("C3IDs+pSKJ1FmRW+7EZk", actual);
}

[Fact]
public void C3Id_Null_Throws()
{
Assert.Throws<ArgumentNullException>("text", () => CrossCompanyCorrelatingId.Compute(null!));
}

[Fact]
public void C3Id_Empty_Throws()
{
Assert.Throws<ArgumentException>("text", () => CrossCompanyCorrelatingId.Compute(""));
}

[Fact]
public void C3Id_EmptyRaw_Throws()
{
byte[] destination = new byte[CrossCompanyCorrelatingId.RawSizeInBytes];
Assert.Throws<ArgumentException>("text", () => CrossCompanyCorrelatingId.ComputeRaw("", destination));
}

[Fact]
public void C3Id_EmptyRawSpan_Throws()
{
byte[] destination = new byte[CrossCompanyCorrelatingId.RawSizeInBytes];
Assert.Throws<ArgumentException>("text", () => CrossCompanyCorrelatingId.ComputeRaw([], destination));
}

[Fact]
public void C3Id_EmptyRawUtf8_Throws()
{
byte[] destination = new byte[CrossCompanyCorrelatingId.RawSizeInBytes];
Assert.Throws<ArgumentException>("textUtf8", () => CrossCompanyCorrelatingId.ComputeRawUtf8([], destination));
}

[Fact]
public void C3Id_DestinationTooSmall_Throws()
{
byte[] destination = new byte[CrossCompanyCorrelatingId.RawSizeInBytes - 1];
Assert.Throws<ArgumentException>(
"destination",
() => CrossCompanyCorrelatingId.ComputeRaw("test", destination));
}

[Fact]
public void C3Id_DestinationTooSmallUtf8_Throws()
{
byte[] destination = new byte[CrossCompanyCorrelatingId.RawSizeInBytes - 1];
Assert.Throws<ArgumentException>(
"destination",
() => CrossCompanyCorrelatingId.ComputeRawUtf8("test"u8, destination));
}

private static string ComputeC3Id(string text)
{
string reference = ReferenceCrossCompanyCorrelatingId.Compute(text);
string actual = CrossCompanyCorrelatingId.Compute(text);

Assert.True(
actual == reference,
$"""
Actual implementation did not match reference implementation for '{text}'.

reference: {reference}
actual: {actual}
""");

return actual;
}

/// <summary>
/// A trivial reference implementation of C3ID that is easy to understand,
/// but not optimized for performance. We compare this to the production
/// implementation to ensure that it remains equivalent to this.
/// </summary>
private static class ReferenceCrossCompanyCorrelatingId
{
public static string Compute(string text)
{
// Compute the SHA-256 hash of the UTF8-encoded text
Span<byte> hash = SHA256.HashData(Encoding.UTF8.GetBytes(text));

// Prefix the result with "C3ID" UTF-8 bytes and hash again
hash = SHA256.HashData([.. "C3ID"u8, .. hash]);

// Truncate to 12 bytes
hash = hash[..12];

// Convert to base64 and prepend "C3ID"
return "C3ID" + Convert.ToBase64String(hash);
}
}
}
25 changes: 25 additions & 0 deletions src/Tests/Cask.Tests/PolyfillTests.cs
@@ -271,6 +271,31 @@ public void Random_NotDeterministic()
Assert.False(random1.SequenceEqual(random2), "RandomNumberGenerator produced two identical 32-byte sequences.");
}

[Fact]
public void Encoding_GetString_Empty()
{
ReadOnlySpan<byte> data = [];
string text = Encoding.UTF8.GetString(data);
Assert.Equal("", text);
}

[Fact]
public void Encoding_GetByteCount_Empty()
{
ReadOnlySpan<char> text = "".AsSpan();
int byteCount = Encoding.UTF8.GetByteCount(text);
Assert.Equal(0, byteCount);
}

[Fact]
public void Encoding_GetBytes_Empty()
{
ReadOnlySpan<char> text = "".AsSpan();
Span<byte> bytes = [];
int bytesWritten = Encoding.UTF8.GetBytes(text, bytes);
Assert.Equal(0, bytesWritten);
}

#if NETFRAMEWORK // We don't need to stress test the modern BCL :)
[Fact]
public async Task Polyfill_ThreadingStress()