Skip to content

ci: Try and avoid large pass-by-value instances #2623

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

larseggert
Copy link
Collaborator

@larseggert larseggert commented May 12, 2025

This affects

  • ecn::Count
  • SocketAddr
  • Http3StreamInfo
  • QpackSettings
  • IpAddr

Copy link

github-actions bot commented May 12, 2025

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to 070ff3a.

neqo-latest as client

neqo-latest as server

  • aioquic vs. neqo-latest: run cancelled after 20 min
  • chrome vs. neqo-latest: ⚠️3
  • go-x-net vs. neqo-latest: H 🚀DC 6 ⚠️B L2 C2 CM
  • kwik vs. neqo-latest: run cancelled after 20 min
  • linuxquic vs. neqo-latest: run cancelled after 20 min
  • lsquic vs. neqo-latest: run cancelled after 20 min
  • msquic vs. neqo-latest: run cancelled after 20 min
  • mvfst vs. neqo-latest: Z B 🚀A L1 C1 ⚠️C2 CM
  • neqo vs. neqo-latest: ⚠️DC 3 B U A
  • ngtcp2 vs. neqo-latest: ⚠️LR Z 3 B 6
  • openssl vs. neqo-latest: ⚠️DC LR M 🚀3 B ⚠️S A 🚀L2 C2 BA ⚠️BP CM
  • picoquic vs. neqo-latest: run cancelled after 20 min
  • quic-go vs. neqo-latest: ⚠️LR R C1 C2 BP BA CM
  • quiche vs. neqo-latest: run cancelled after 20 min
  • quinn vs. neqo-latest: run cancelled after 20 min
  • s2n-quic vs. neqo-latest: run cancelled after 20 min
  • tquic vs. neqo-latest: run cancelled after 20 min
  • xquic vs. neqo-latest: run cancelled after 20 min
All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

@larseggert larseggert force-pushed the fix-avoid-large-pass-by-value branch from a6d27ec to f6f50a3 Compare May 12, 2025 08:17
@martinthomson
Copy link
Member

It's no longer SocketAddr.

@larseggert larseggert force-pushed the fix-avoid-large-pass-by-value branch 2 times, most recently from af6591d to c9db8d0 Compare May 12, 2025 11:01
This affects
- `ecn::Count`
- `SocketAddr`
- `Http3StreamInfo`
- `QpackSettings`
- `IpAddr`
@larseggert larseggert force-pushed the fix-avoid-large-pass-by-value branch from c9db8d0 to a6702db Compare May 12, 2025 11:02
@larseggert larseggert marked this pull request as ready for review May 12, 2025 11:15
Copy link
Member

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this conversion automated? Because there are a few patterns that I'm seeing that are concerning. I stopped half way, but hopefully the patterns are clear.

There's some real improvement possible here, but those problems are enough that this isn't a good idea as-is.

larseggert added a commit to larseggert/neqo that referenced this pull request May 12, 2025
@@ -7,3 +7,5 @@ disallowed-macros = [
allow-mixed-uninlined-format-args = false
allow-unwrap-in-tests = true
allow-dbg-in-tests = true
pass-by-value-size-limit = 16
# avoid-breaking-exported-api = false # TODO: Enable this in a follow-on PR.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larseggert
Copy link
Collaborator Author

@martinthomson maybe I'm misunderstanding something about Rust and this lint. I thought avoiding pass-by-value inherently meant to pass a reference, even if the called function needs a value. If the called function needs a value, it dereferences, which does not involve a stack allocation (but still involved a memory copy).

Copy link

github-actions bot commented May 12, 2025

Benchmark results

Performance differences relative to e179d56.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💚 Performance has improved.
       time:   [763.68 ms 793.32 ms 824.37 ms]
       thrpt:  [121.31 MiB/s 126.05 MiB/s 130.94 MiB/s]
change:
       time:   [−11.904% −7.0725% −1.5637%] (p = 0.01 < 0.05)
       thrpt:  [+1.5886% +7.6108% +13.512%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: No change in performance detected.
       time:   [315.04 ms 316.41 ms 317.79 ms]
       thrpt:  [31.468 Kelem/s 31.605 Kelem/s 31.742 Kelem/s]
change:
       time:   [−0.7238% −0.1910% +0.3621%] (p = 0.49 > 0.05)
       thrpt:  [−0.3608% +0.1914% +0.7290%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [34.442 ms 35.673 ms 36.912 ms]
       thrpt:  [27.092  elem/s 28.032  elem/s 29.035  elem/s]
change:
       time:   [−4.3758% +0.3371% +5.6276%] (p = 0.89 > 0.05)
       thrpt:  [−5.3277% −0.3360% +4.5760%]
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: No change in performance detected.
       time:   [7.2562 s 7.2700 s 7.2837 s]
       thrpt:  [13.729 MiB/s 13.755 MiB/s 13.781 MiB/s]
change:
       time:   [−0.4403% −0.1777% +0.0709%] (p = 0.18 > 0.05)
       thrpt:  [−0.0708% +0.1780% +0.4423%]
decode 4096 bytes, mask ff: No change in performance detected.
       time:   [11.844 µs 11.853 µs 11.861 µs]
       change: [−0.2345% −0.0302% +0.1849%] (p = 0.79 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [3.0309 ms 3.0332 ms 3.0356 ms]
       change: [−0.0449% +0.0856% +0.2265%] (p = 0.20 > 0.05)

Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [19.999 µs 20.011 µs 20.022 µs]
       change: [−0.0843% +0.0394% +0.1550%] (p = 0.53 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

decode 1048576 bytes, mask 7f: No change in performance detected.
       time:   [5.1216 ms 5.1240 ms 5.1262 ms]
       change: [−0.3315% −0.0764% +0.1253%] (p = 0.55 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) low severe
1 (1.00%) high mild

decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [8.2885 µs 8.2970 µs 8.3060 µs]
       change: [−0.1941% +0.0081% +0.2282%] (p = 0.95 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [2.1228 ms 2.1247 ms 2.1265 ms]
       change: [−2.2534% −0.0754% +2.0491%] (p = 0.94 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
10 (10.00%) low severe

1000 streams of 1 bytes/multistream: 💚 Performance has improved.
       time:   [15.570 ns 15.613 ns 15.658 ns]
       change: [−3.4906% −3.0740% −2.6564%] (p = 0.00 < 0.05)

Found 6 outliers among 500 measurements (1.20%)
1 (0.20%) low mild
4 (0.80%) high mild
1 (0.20%) high severe

1000 streams of 1000 bytes/multistream: 💚 Performance has improved.
       time:   [15.709 ns 15.750 ns 15.792 ns]
       change: [−3.0029% −2.5655% −2.1580%] (p = 0.00 < 0.05)

Found 6 outliers among 500 measurements (1.20%)
4 (0.80%) low mild
2 (0.40%) high mild

coalesce_acked_from_zero 1+1 entries: Change within noise threshold.
       time:   [94.443 ns 94.528 ns 94.600 ns]
       change: [−1.4413% −0.8275% −0.2154%] (p = 0.01 < 0.05)

Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) low severe
4 (4.00%) low mild

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [112.41 ns 112.53 ns 112.63 ns]
       change: [−0.6507% −0.0843% +0.4502%] (p = 0.76 > 0.05)

Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) low mild

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [111.51 ns 111.70 ns 111.86 ns]
       change: [−0.5427% +0.0296% +0.5935%] (p = 0.93 > 0.05)

Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) low mild

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [89.765 ns 90.202 ns 91.090 ns]
       change: [−0.2369% +0.4611% +1.0893%] (p = 0.18 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [110.61 ms 110.63 ms 110.65 ms]
       change: [+0.2584% +0.2805% +0.3019%] (p = 0.00 < 0.05)

Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) low mild
1 (1.00%) high mild

SentPackets::take_ranges: No change in performance detected.
       time:   [9.6844 µs 9.7295 µs 9.7718 µs]
       change: [−2.7648% +0.0309% +2.8024%] (p = 0.99 > 0.05)

Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) low severe
1 (1.00%) low mild

transfer/pacing-false/varying-seeds: Change within noise threshold.
       time:   [34.305 ms 34.331 ms 34.357 ms]
       change: [+2.0471% +2.1630% +2.2723%] (p = 0.00 < 0.05)
transfer/pacing-true/varying-seeds: Change within noise threshold.
       time:   [35.389 ms 35.439 ms 35.489 ms]
       change: [+1.5690% +1.7672% +1.9788%] (p = 0.00 < 0.05)
transfer/pacing-false/same-seed: Change within noise threshold.
       time:   [34.308 ms 34.330 ms 34.352 ms]
       change: [+1.7672% +1.8677% +1.9653%] (p = 0.00 < 0.05)
transfer/pacing-true/same-seed: Change within noise threshold.
       time:   [36.160 ms 36.189 ms 36.218 ms]
       change: [+2.1170% +2.2402% +2.3655%] (p = 0.00 < 0.05)

Client/server transfer results

Performance differences relative to e179d56.

Transfer of 33554432 bytes over loopback, min. 100 runs. All unit-less numbers are in milliseconds.

Client vs. server (params) Mean ± σ Min Max MiB/s ± σ Δ main Δ main
google vs. google 517.9 ± 23.8 497.1 610.7 61.8 ± 1.3 -7.2 -1.4%
google vs. neqo (cubic, paced) 382.1 ± 39.3 357.2 644.7 83.8 ± 0.8 6.1 1.6%
msquic vs. msquic 170.6 ± 43.0 137.5 389.4 187.6 ± 0.7 7.6 4.7%
msquic vs. neqo (cubic, paced) 285.0 ± 42.7 248.4 566.3 112.3 ± 0.7 0.8 0.3%
neqo vs. google (cubic, paced) 813.1 ± 27.5 787.2 1062.8 39.4 ± 1.2 -3.8 -0.5%
neqo vs. msquic (cubic, paced) 213.4 ± 32.6 192.8 397.4 150.0 ± 1.0 3.5 1.7%
neqo vs. neqo (cubic) 251.5 ± 18.0 220.2 353.3 127.3 ± 1.8 💔 6.4 2.6%
neqo vs. neqo (cubic, paced) 256.6 ± 47.6 227.7 616.5 124.7 ± 0.7 3.6 1.4%
neqo vs. neqo (reno) 243.0 ± 13.3 232.3 329.7 131.7 ± 2.4 -6.3 -2.5%
neqo vs. neqo (reno, paced) 249.9 ± 30.1 230.3 442.8 128.0 ± 1.1 -0.6 -0.2%
neqo vs. quiche (cubic, paced) 247.0 ± 30.7 228.2 457.0 129.5 ± 1.0 -9.3 -3.6%
neqo vs. s2n (cubic, paced) 264.0 ± 36.3 243.4 493.9 121.2 ± 0.9 1.3 0.5%
quiche vs. neqo (cubic, paced) 418.9 ± 42.2 361.9 658.3 76.4 ± 0.8 💚 -19.6 -4.5%
quiche vs. quiche 199.9 ± 20.3 176.8 305.6 160.1 ± 1.6 -11.0 -5.2%
s2n vs. neqo (cubic, paced) 311.4 ± 36.5 286.8 512.9 102.8 ± 0.9 8.3 2.7%
s2n vs. s2n 259.4 ± 31.5 236.1 397.9 123.4 ± 1.0 2.2 0.9%

⬇️ Download logs

@mxinden
Copy link
Member

mxinden commented May 12, 2025

Rust has the convention that the function signature should express what the function implementation needs. Example:

  • The function needs an allocated Vec, request an owned Vec, not a &Vec to be cloned later.
  • The function needs a u8, request an owned u8, not a &u8 that is then dereferenced.

In the context of this pull request, I suggest only changing to a reference if the function only needs a reference.

@larseggert
Copy link
Collaborator Author

Ok, I will give this a do-over tomorrow.

@martinthomson
Copy link
Member

Max got it. It is my understanding that this moves things as much as that is possible:

fn do_something(huge: ItsEnormous) -> ItsGargantuan {
  ItsGargantuan { huge, ...Default::default() }
}

That's not a firm guarantee, but the compiler can be somewhat smart about these things by using moves for these sorts of things.

@larseggert larseggert requested a review from martinthomson May 13, 2025 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants