fix: Enable MLKEM768X25519 by default #2389

Open
wants to merge 21 commits into
base: main

larseggert
Collaborator

@larseggert larseggert commented Jan 24, 2025

This enables MLKEM768X25519 by default on the server and client. There is a parameter to turn it off.

This broke a ton of tests, since various handshake flights become longer than single packets. I fixed those tests where doing so was straightforward, and turned off MLKEM for those where it wasn't or where I wasn't sure the test was still doing the right thing.
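For a rough sense of why the flights no longer fit, here is a minimal sketch of the key share sizes. The ML-KEM-768 sizes are from FIPS 203 and the X25519 size from RFC 7748; the helper names and the payload-per-datagram parameter are illustrative assumptions, not neqo code.

```rust
// Sizes in bytes. ML-KEM-768 values are from FIPS 203, X25519 from RFC 7748.
const MLKEM768_ENCAPS_KEY: usize = 1184;
const MLKEM768_CIPHERTEXT: usize = 1088;
const X25519_PUBLIC_KEY: usize = 32;

// RFC 9000 requires datagrams carrying client Initial packets to be
// padded to at least this many bytes.
const MIN_INITIAL_DATAGRAM: usize = 1200;

/// Size of the hybrid key share the client sends (encapsulation key + X25519 key).
const fn client_key_share() -> usize {
    MLKEM768_ENCAPS_KEY + X25519_PUBLIC_KEY
}

/// Size of the hybrid key share the server returns (ciphertext + X25519 key).
const fn server_key_share() -> usize {
    MLKEM768_CIPHERTEXT + X25519_PUBLIC_KEY
}

/// Lower bound on the number of datagrams needed for `crypto_len` bytes of
/// CRYPTO data. `payload_per_datagram` is an assumption supplied by the
/// caller; the real value depends on headers, AEAD overhead, and the MTU.
fn datagrams_needed(crypto_len: usize, payload_per_datagram: usize) -> usize {
    crypto_len.div_ceil(payload_per_datagram)
}
```

The client key share alone (1216 bytes) is already larger than a minimum-size 1200-byte Initial datagram, before counting the rest of the ClientHello.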

There are some non-test changes:

  • neqo-transport/src/connection/mod.rs had some code in start_handshake that needed to move to postprocess_packet, because we can now be in WaitInitial for longer.
  • neqo-transport/src/crypto.rs changes the SNI slicing code in write_frame to be less wasteful: it generates less padding than the old code did for longer-than-MTU CRYPTO frames.
  • TxBuffers gained an is_empty function, so when retransmitting sliced CRYPTO frames, we don't stop (and pad the rest of the packet) if there are more discontiguous CRYPTO frames queued up for sending.
  • neqo-transport/src/connection/params.rs adds the mlkem option to ConnectionParameters.
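
As an illustration of the last item, a default-on option with a builder-style setter could look like the sketch below. `Params` and `mlkem_enabled` are hypothetical stand-ins; the actual field and method signatures in ConnectionParameters may differ.

```rust
/// Hypothetical stand-in for `ConnectionParameters`; only the flag
/// discussed here is modeled.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Params {
    mlkem: bool,
}

impl Default for Params {
    /// The key exchange is on unless the caller turns it off.
    fn default() -> Self {
        Self { mlkem: true }
    }
}

impl Params {
    /// Builder-style setter: consumes and returns `self` so calls can be chained.
    #[must_use]
    pub fn mlkem(mut self, enabled: bool) -> Self {
        self.mlkem = enabled;
        self
    }

    /// Read back the current setting (hypothetical accessor).
    #[must_use]
    pub fn mlkem_enabled(&self) -> bool {
        self.mlkem
    }
}
```

A test that depends on single-packet handshake flights would then opt out with something like `Params::default().mlkem(false)`.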

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1943471


github-actions bot commented Jan 24, 2025

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to 108fb8d.

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

@larseggert
Collaborator Author

Note to self: put MLKEM behind a config flag (that defaults to on). CC @jschanck


codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 94.23077% with 9 lines in your changes missing coverage. Please review.

Project coverage is 95.30%. Comparing base (1090de3) to head (234c7f0).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| neqo-transport/src/crypto.rs | 89.88% | 9 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2389      +/-   ##
==========================================
- Coverage   95.31%   95.30%   -0.01%     
==========================================
  Files         114      114              
  Lines       36869    36954      +85     
  Branches    36869    36954      +85     
==========================================
+ Hits        35142    35220      +78     
- Misses       1721     1728       +7     
  Partials        6        6              


@larseggert larseggert marked this pull request as ready for review January 28, 2025 15:08

github-actions bot commented Jan 28, 2025

Benchmark results

Performance differences relative to 1090de3.

decode 4096 bytes, mask ff: No change in performance detected.
       time:   [11.851 µs 11.884 µs 11.924 µs]
       change: [-0.6905% +0.0091% +0.6244%] (p = 0.99 > 0.05)

Found 13 outliers among 100 measurements (13.00%)
2 (2.00%) low mild
2 (2.00%) high mild
9 (9.00%) high severe

decode 1048576 bytes, mask ff: Change within noise threshold.
       time:   [2.9217 ms 2.9309 ms 2.9417 ms]
       change: [+0.8098% +1.2620% +1.7254%] (p = 0.00 < 0.05)

Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high severe

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [19.793 µs 19.843 µs 19.901 µs]
       change: [-0.1053% +2.4502% +6.0057%] (p = 0.14 > 0.05)

Found 20 outliers among 100 measurements (20.00%)
2 (2.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
15 (15.00%) high severe

decode 1048576 bytes, mask 7f: Change within noise threshold.
       time:   [5.0495 ms 5.0617 ms 5.0750 ms]
       change: [-0.9013% -0.5243% -0.1546%] (p = 0.01 < 0.05)

Found 16 outliers among 100 measurements (16.00%)
16 (16.00%) high severe

decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [6.8901 µs 6.8938 µs 6.9012 µs]
       change: [-1.7885% -0.7947% +0.0617%] (p = 0.09 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) low mild
3 (3.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [1.4154 ms 1.4222 ms 1.4293 ms]
       change: [-0.4874% +0.1987% +0.7945%] (p = 0.56 > 0.05)

Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high severe

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [99.019 ns 99.355 ns 99.698 ns]
       change: [-0.2396% +0.1567% +0.6178%] (p = 0.48 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) high mild
8 (8.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [117.39 ns 117.73 ns 118.11 ns]
       change: [-0.0742% +0.3418% +0.7544%] (p = 0.11 > 0.05)

Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
1 (1.00%) high mild
10 (10.00%) high severe

coalesce_acked_from_zero 10+1 entries: Change within noise threshold.
       time:   [117.13 ns 117.62 ns 118.20 ns]
       change: [+0.3880% +0.7998% +1.2989%] (p = 0.00 < 0.05)

Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) low mild
8 (8.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [97.534 ns 97.682 ns 97.844 ns]
       change: [-0.4304% +0.4917% +1.3941%] (p = 0.33 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [111.83 ms 111.88 ms 111.93 ms]
       change: [+0.4982% +0.5591% +0.6228%] (p = 0.00 < 0.05)

Found 16 outliers among 100 measurements (16.00%)
7 (7.00%) low mild
9 (9.00%) high mild

SentPackets::take_ranges: No change in performance detected.
       time:   [5.3888 µs 5.4676 µs 5.5437 µs]
       change: [-2.9725% -0.5511% +1.9954%] (p = 0.67 > 0.05)

Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

transfer/pacing-false/varying-seeds: Change within noise threshold.
       time:   [42.008 ms 42.102 ms 42.200 ms]
       change: [+1.8974% +2.2183% +2.5649%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

transfer/pacing-true/varying-seeds: Change within noise threshold.
       time:   [42.093 ms 42.183 ms 42.285 ms]
       change: [+1.1567% +1.4529% +1.7753%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

transfer/pacing-false/same-seed: Change within noise threshold.
       time:   [42.287 ms 42.373 ms 42.473 ms]
       change: [+1.8840% +2.1879% +2.4937%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

transfer/pacing-true/same-seed: Change within noise threshold.
       time:   [42.506 ms 42.591 ms 42.687 ms]
       change: [+1.3298% +1.6156% +1.9003%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severe

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💔 Performance has regressed.
       time:   [926.86 ms 935.21 ms 943.64 ms]
       thrpt:  [105.97 MiB/s 106.93 MiB/s 107.89 MiB/s]
change:
       time:   [+2.8820% +4.3313% +5.7442%] (p = 0.00 < 0.05)
       thrpt:  [-5.4321% -4.1515% -2.8013%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: Change within noise threshold.
       time:   [323.80 ms 327.28 ms 330.81 ms]
       thrpt:  [30.229 Kelem/s 30.555 Kelem/s 30.884 Kelem/s]
change:
       time:   [+0.7914% +2.1290% +3.5416%] (p = 0.00 < 0.05)
       thrpt:  [-3.4204% -2.0846% -0.7852%]

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: 💚 Performance has improved.
       time:   [25.569 ms 25.748 ms 25.934 ms]
       thrpt:  [38.560  elem/s 38.839  elem/s 39.110  elem/s]
change:
       time:   [-25.713% -25.036% -24.415%] (p = 0.00 < 0.05)
       thrpt:  [+32.301% +33.398% +34.612%]

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: 💔 Performance has regressed.
       time:   [1.8980 s 1.9171 s 1.9366 s]
       thrpt:  [51.638 MiB/s 52.163 MiB/s 52.687 MiB/s]
change:
       time:   [+16.161% +18.001% +19.881%] (p = 0.00 < 0.05)
       thrpt:  [-16.584% -15.255% -13.913%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
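
When reading the paired time and thrpt changes above, note that for a fixed amount of work throughput is inversely proportional to time. The sketch below shows that relation; the function name is illustrative, and that the benchmark harness derives its thrpt figures exactly this way is an assumption.

```rust
/// Convert a relative change in run time (in percent) into the
/// corresponding relative change in throughput (in percent), assuming the
/// amount of work per run is fixed so that throughput is proportional to 1/time.
fn throughput_change_pct(time_change_pct: f64) -> f64 {
    let time_ratio = 1.0 + time_change_pct / 100.0;
    (1.0 / time_ratio - 1.0) * 100.0
}
```

For example, the Upload point estimate of +18.001% in time corresponds to roughly -15.25% in throughput, which is close to the reported -15.255%.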

Client/server transfer results

Transfer of 33554432 bytes over loopback.

| Client | Server | CC | Pacing | MTU | Mean [ms] | Min [ms] | Max [ms] |
|---|---|---|---|---|---|---|---|
| gquiche | gquiche | | | 1504 | 561.1 ± 84.8 | 510.0 | 750.2 |
| neqo | gquiche | reno | on | 1504 | 745.4 ± 20.8 | 723.3 | 796.1 |
| neqo | gquiche | reno | | 1504 | 768.1 ± 58.0 | 704.4 | 915.3 |
| neqo | gquiche | cubic | on | 1504 | 765.3 ± 51.0 | 733.1 | 896.9 |
| neqo | gquiche | cubic | | 1504 | 768.9 ± 66.8 | 730.3 | 953.8 |
| msquic | msquic | | | 1504 | 147.7 ± 79.8 | 93.8 | 351.7 |
| neqo | msquic | reno | on | 1504 | 241.5 ± 74.3 | 196.7 | 463.1 |
| neqo | msquic | reno | | 1504 | 238.5 ± 62.3 | 196.5 | 426.0 |
| neqo | msquic | cubic | on | 1504 | 270.0 ± 77.6 | 211.0 | 444.6 |
| neqo | msquic | cubic | | 1504 | 323.3 ± 130.6 | 204.6 | 572.1 |
| gquiche | neqo | reno | on | 1504 | 675.8 ± 89.3 | 551.5 | 815.7 |
| gquiche | neqo | reno | | 1504 | 694.1 ± 71.5 | 571.7 | 792.1 |
| gquiche | neqo | cubic | on | 1504 | 724.1 ± 156.4 | 561.5 | 1062.5 |
| gquiche | neqo | cubic | | 1504 | 684.7 ± 93.0 | 545.9 | 815.6 |
| msquic | neqo | reno | on | 1504 | 500.0 ± 70.4 | 455.8 | 688.6 |
| msquic | neqo | reno | | 1504 | 496.4 ± 93.1 | 427.4 | 696.9 |
| msquic | neqo | cubic | on | 1504 | 524.6 ± 159.0 | 442.1 | 906.1 |
| msquic | neqo | cubic | | 1504 | 479.3 ± 7.6 | 467.2 | 489.6 |
| neqo | neqo | reno | on | 1504 | 470.5 ± 50.6 | 444.2 | 610.1 |
| neqo | neqo | reno | | 1504 | 454.9 ± 51.2 | 428.0 | 594.9 |
| neqo | neqo | cubic | on | 1504 | 471.4 ± 48.8 | 439.2 | 606.1 |
| neqo | neqo | cubic | | 1504 | 485.4 ± 69.6 | 438.8 | 617.0 |


Member

@martinthomson martinthomson left a comment

I'm not 100% happy with the complexity increase in the tests, but they do seem to be limited enough.

BTW, the changes to crypto frame sending are probably separable. No harm done and hard to extricate from the rest of this business, but it might have been better to tackle that part first.

There are some important changes that I suggest, but nothing too bad.

neqo-transport/src/connection/mod.rs (two outdated, resolved threads)
@@ -2503,6 +2515,14 @@ impl Connection {
// but wait until after sending an ACK.
self.discard_keys(PacketNumberSpace::Handshake, now);
}

// If the client has more Initial CRYPTO data queued up, do not coalesce.
Member

Can you say a little more about why you needed to add this check? My understanding was that we'd have filled the buffer if this was the case. You could maybe replace this with the same if builder.is_full() check as above in that case.

Collaborator Author

@larseggert larseggert Jan 29, 2025

So the issue here is that with SNI slicing and 0-RTT, and without this change, the client sends a first Initial packet that contains the second part of the CRYPTO data, coalesced with a 0-RTT packet. The server doesn't seem to save that 0-RTT data for later processing ("keys for ZeroRtt already discarded", which is clearly wrong). Maybe that is the bug?

Collaborator Author

@larseggert larseggert Jan 29, 2025

Ah, yes:

/// Whether keys for processing packets in the indicated space are pending.
/// This allows the caller to determine whether to save a packet for later
/// when keys are not available.
/// NOTE: 0-RTT keys are not considered here. The expectation is that a
/// server will have to save 0-RTT packets in a different place. Though it
/// is possible to attribute 0-RTT packets to an existing connection if there
/// is a multi-packet Initial, that is an unusual circumstance, so we
/// don't do caching for that in those places that call this function.
pub fn rx_pending(&self, space: CryptoSpace) -> bool {
    match space {
        CryptoSpace::Initial | CryptoSpace::ZeroRtt => false,
        CryptoSpace::Handshake => self.handshake.is_none() && !self.initials.is_empty(),
        CryptoSpace::ApplicationData => self.app_read.is_none(),
    }
}

It seems we're not saving those 0-RTT packets anywhere currently?
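
To make the consequence concrete, here is a self-contained sketch of how a caller might act on that result. `KeyState`, `Action`, and `on_undecryptable` are hypothetical simplifications (booleans instead of key material), and the assumption that anything not pending gets dropped is mine, not something confirmed from the code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CryptoSpace {
    Initial,
    ZeroRtt,
    Handshake,
    ApplicationData,
}

/// Simplified stand-in for the key state: flags instead of real keys.
struct KeyState {
    have_initial: bool,   // models `!self.initials.is_empty()`
    have_handshake: bool, // models `self.handshake.is_some()`
    have_app_read: bool,  // models `self.app_read.is_some()`
}

impl KeyState {
    /// Same decision table as the quoted `rx_pending`.
    fn rx_pending(&self, space: CryptoSpace) -> bool {
        match space {
            CryptoSpace::Initial | CryptoSpace::ZeroRtt => false,
            CryptoSpace::Handshake => !self.have_handshake && self.have_initial,
            CryptoSpace::ApplicationData => !self.have_app_read,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    Save,
    Drop,
}

/// Hypothetical caller: decide what to do with a packet that cannot be
/// decrypted yet. Assumes packets are dropped unless keys are pending.
fn on_undecryptable(keys: &KeyState, space: CryptoSpace) -> Action {
    if keys.rx_pending(space) {
        Action::Save
    } else {
        Action::Drop
    }
}
```

With only Initial keys installed, a Handshake packet would be saved, but a 0-RTT packet coalesced behind a second Initial would be dropped, which matches the behavior described above.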

neqo-transport/src/crypto.rs (four outdated, resolved threads)