[Bug] wrong permutation indices when task scheduling reordering occurs #256

Closed
akoshelev opened this issue Nov 20, 2022 · 11 comments
Labels: bug (Something isn't working)

@akoshelev

Running the concurrency test on sort revealed that we have a reordering issue somewhere:

    shuttle::check_random(
        || {
            shuttle::future::block_on(execute_sort()).unwrap();
        },
        5,
    );

fails with

test panicked in task 'main-thread'
index out of bounds: the len is 50 but the index is 1942298774
thread 'tests::randomized::sort' panicked at 'index out of bounds: the len is 50 but the index is 1942298774', src/protocol/sort/apply.rs:23:24
stack backtrace:
   0: rust_begin_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
   2: core::panicking::panic_bounds_check
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:84:5
   3: core::slice::<impl [T]>::swap
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/slice/mod.rs:619:36
   4: raw_ipa::protocol::sort::apply::apply
             at ./src/protocol/sort/apply.rs:23:17
   5: raw_ipa::protocol::sort::compose::compose::{{closure}}
             at ./src/protocol/sort/compose.rs:50:5
   6: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
   7: raw_ipa::protocol::sort::generate_sort_permutation::generate_sort_permutation::{{closure}}
             at ./src/protocol/sort/generate_sort_permutation.rs:69:9
   8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
   9: <F as futures_core::future::TryFuture>::try_poll
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.25/src/future.rs:82:9
  10: <futures_util::future::try_future::into_future::IntoFuture<Fut> as core::future::future::Future>::poll
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/try_future/into_future.rs:34:9
  11: <F as futures_core::future::TryFuture>::try_poll
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.25/src/future.rs:82:9
  12: <futures_util::future::try_maybe_done::TryMaybeDone<Fut> as core::future::future::Future>::poll
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/try_maybe_done.rs:79:57
  13: <F as futures_core::future::TryFuture>::try_poll
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.25/src/future.rs:82:9
  14: <futures_util::future::try_join_all::TryJoinAll<F> as core::future::future::Future>::poll
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.25/src/future/try_join_all.rs:148:27
  15: raw_ipa::test_fixture::sort::execute_sort::{{closure}}
             at ./src/test_fixture/sort.rs:53:9
  16: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
  17: shuttle::future::block_on
             at /Users/koshelev/.cargo/registry/src/github.com-1ecc6299db9ec823/shuttle-0.4.1/src/future/mod.rs:148:15
  18: raw_ipa::tests::randomized::sort::{{closure}}
             at ./src/tests/randomized.rs:10:13

The permutation slice contains some weird values:

permutation = [
    29,
    1,
    7,
    37,
    434506957,
    16,
    14,
    44,
    776628359,
    41,
    13,
    31,
    8,
    23,
    42,
    35,
    27,
    40,
    5,
    12,
    36,
    19,
    25,
    49,
    17,
    9,
    33,
    39,
    15,
    1942298774,
    34,
    38,
    4,
    21,
    43,
    0,
    28,
    2,
    20,
    32,
    3,
    1141533336,
    48,
    26,
    6,
    45,
    11,
    22,
    30,
    10,
]
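
For context on the panic: `apply` puts the values into the order given by the permutation by swapping elements in place, so any out-of-range index hits the bounds check inside `slice::swap`. A minimal sketch of the idea (hypothetical; not the actual code in src/protocol/sort/apply.rs):

```rust
/// Hypothetical sketch of an in-place permutation apply (the real one
/// lives in src/protocol/sort/apply.rs): moves values[i] to position
/// perm[i] by swapping along each cycle of the permutation.
fn apply<T>(perm: &mut [usize], values: &mut [T]) {
    assert_eq!(perm.len(), values.len());
    for i in 0..perm.len() {
        while perm[i] != i {
            let j = perm[i];
            // A corrupted entry such as 1942298774 exceeds the slice
            // length (50), and this swap panics with the bounds-check
            // error from the backtrace above.
            values.swap(i, j);
            perm.swap(i, j);
        }
    }
}
```
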
akoshelev added the bug label on Nov 20, 2022
@akoshelev commented Nov 22, 2022

Reveal in isolation works fine

 cargo test --features shuttle --package raw-ipa --lib tests::randomized::reveal
   Compiling raw-ipa v0.1.0 (/Users/koshelev/workspace/raw-ipa)
    Finished test [unoptimized + debuginfo] target(s) in 5.34s
     Running unittests src/lib.rs (target/debug/deps/raw_ipa-13f7cbe4f34a1d27)

running 1 test
test tests::randomized::reveal ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 104 filtered out; finished in 6.70s

This is the test:

async fn reveal() -> Result<(), Error> {
    let mut rng = rand::thread_rng();
    let world: TestWorld = make_world(QueryId);
    let ctx = make_contexts::<Fp32BitPrime>(&world);

    for i in 0..10_u32 {
        let secret = rng.gen::<u128>();
        let input = Fp32BitPrime::from(secret);
        let share = share(input, &mut rng);
        let record_id = RecordId::from(i);
        let results = try_join_all(vec![
            ctx[0].clone().reveal(record_id, &share[0]),
            ctx[1].clone().reveal(record_id, &share[1]),
            ctx[2].clone().reveal(record_id, &share[2]),
        ])
        .await?;

        assert_eq!(input, results[0]);
        assert_eq!(input, results[1]);
        assert_eq!(input, results[2]);
    }

    Ok(())
}

(edit: used 32bit prime field)

@akoshelev

Infra stays strong and denies my allegations

 cargo test --features shuttle --package raw-ipa --lib tests::randomized::arithmetic
   Compiling raw-ipa v0.1.0 (/Users/koshelev/workspace/raw-ipa)
    Finished test [unoptimized + debuginfo] target(s) in 8.44s
     Running unittests src/lib.rs (target/debug/deps/raw_ipa-13f7cbe4f34a1d27)

running 1 test
test tests::randomized::arithmetic ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 105 filtered out; finished in 16.86s

This is the test:

#[test]
fn arithmetic() {
    shuttle::check_random(|| {
        shuttle::future::block_on(circuit::arithmetic::<Fp32BitPrime>(100, 5))
    }, 100);
}

@akoshelev

Shuffle fails, but with a different error:

cargo test --color=always --features shuttle --package raw-ipa --lib tests::randomized::shuffle
    Finished test [unoptimized + debuginfo] target(s) in 1.34s
     Running unittests src/lib.rs (target/debug/deps/raw_ipa-13f7cbe4f34a1d27)

running 1 test
2022-11-22T21:12:19.587818Z ERROR generator::gen_impl: set panic inside generator
test tests::randomized::shuffle ... FAILED

failures:

---- tests::randomized::shuffle stdout ----
test panicked in task 'main-thread'
failing schedule: "9103a204ebf9feaf8d81c2f337000204008000600a000cca46c44a6486c84c0082c480606826ac60ac228c60680c828c2a66824224280a406624c422aa4c64acc246ca426cc060c6a4c04046c4000a6aa40cc0a404c064826a06406844422a408c2a2a4628660684442008286c424620c06688662c0266c066c26882682088c0822802c0082028428a240a8082220420a4a8a4a82400a4400a2242a4a004482244a2048c204a44422460028a06400a664c8c04c66a04c2224662060c62a488c6026c604a266c28440a2a08422a686048228a2008848a22246404a0ca4c28c60880a80a8a2a8042c668844c228686a00444a6484c800c2226cc664c026c26046cc60c6c060664a4cca660c0006c6c404acc0a60cc64a6a6660404464a060a"
pass that string to `shuttle::replay` to replay the failure
thread 'tests::randomized::shuffle' panicked at 'assertion failed: `(left == right)`
  left: `19_mod31`,
 right: `0_mod31`', src/test_fixture/sharing.rs:52:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::randomized::shuffle

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 106 filtered out; finished in 0.02s

I want to look at apply_inv first

@akoshelev

I don't know why I started looking at secure_apply_inv; it uses shuffle internally, so obviously it fails too.

test panicked in task 'main-thread'
failing schedule: "91038e14b4c9b7f7f6d3a6fed001000260000408000a000cc820a680c424a0a2424cc04460c666ca480424800624c8a6086644c06ac4626c26880204a42c84286c64a60244c86aacc6240286a6026c8204084866ca060a42ca6204668c60c6602046a4ca0642284686602a286a80c44424608a646ac246c8ca064a44aa4c2c68202a282444a6c6a86c0044c60a22acc8284a2a26a282822604662a864c20a448864a6a04aac68448004c4a4a26a0a02a64ac8a0046ca8c66a48a04a0004cc4064a2222860220880242a64c00a2604ca2c0462a4884a4a246088624c0c24a4060a28286660a204642a42668646aa824c2204284c62422c82004aa2406a0a28842662c400ca640a4c2c842000a20484a6aa44064c68c6a80ca622068404682c644220266022668a8c000c4640ca2228a644824066a6a0cc800266842c822c400286066ac86ca6c062280c262c2408c28a422ca0246a6064084a020066a88402c82c44a62ca042c4a82664c2c464626048ac62a6c0640aac0406044086aa446ca04a2cca82648a044c6644a46222446cc46664ca8a2a462c8682406208c2a0608206ca84442ac406646880202cc6a886ca6268802404a466c822ca640cca6ca2c886420024aca268c008a4440c8482a028668c4200a200c64ca882646a2a6282686844ca68022c0a4a6842442caa2066c28844a84824a6862000a26c04c4a8424a6026a88466a82c02c00a446cc244046a020a604c42c4600626c4a826a8cc020c800466268c4aa04cc62ccaa2626c8626ca4a02a2c64660862c2448446aa44a80ac42420cc000a4404a2a0268a2042a886c2a8c462262666226cc6288800a2ac64606c2a2606804a2c444a42802028a4604aa4822628482c080c8846cca40a2828a4642448844a44a2224822aaa060086046cc6262a6006cac88c26426a2446a8284a00242048c282880282422488aac426660c6c6a2a8c08c268a224406c62ca404c4260ca80806c848640262686a8c46c280a0a626c8c48286626c6620c00cc644aac0622c4c2a082a28660c6c802286cc62428084440626ca644cc04c4a0a04822a0a202648a66a4cc8204022824224aac2884c6a266c4aa22080c0682c2c4668846a2cac40a408886ac26244a620c6028a84c02aca60c0208cc6a026c482ca20ac40648ac628a2c8a42644220404a6a2864cca860ac40022ca08c0c2822606ac2c0a06a28a8cc02088cc02ac00aa28ccc0a200ac00a2ca40cac2a822200804080c028acca2c0a2a440aacc08802c0ccaac0282aaa42ac4a088a20024424cc80880ac80aa2ca044a022caa02848c42a0820480a84aa8c20208248c024828c24aa2262408046068a204caa864246a6c620a4842a8cc2842aa6408c680082048ca0626006a662c8626264a2800c8ca0422a666cc4aac828202022448800c6a2a6288c6466004c66a2248aac2448aa86240248ac4a0862226c002c08666c48804c2828a8800a080a484004c68820a444600c0a6a80c4482ca2a8064480a6280ac4c0886a26a64602220622a46c680a4c04aa6c08468046a62a284c028022266c42a0ca082068600284624c64aac4a2666ac0426200228c4066a8cac6c6428c02cc4820aa062282cc8a00ac2628c4464660cc8a4a6020c4ca64aa42884266a408cc80caa8066ca4a426622c406c688082c08ca0a04004ac2442648a4c4a2084208824c64ac0aa20662aa248202880c0404a2c80a800244462820240c42a64a00248c24ca62a4460264c066028ca0ca8a2684c80c6aaa0042888a42a22a0800442228a48a24a4220a4044880a080a08008080808800008"
pass that string to `shuttle::replay` to replay the failure


Left:  5_mod31
Right: 13_mod31

akoshelev added a commit to akoshelev/raw-ipa that referenced this issue Nov 23, 2022
While working on private-attribution#256, I noticed that the reshare tests consistently fail the concurrency test. It was caused by always using RecordId(0) in a loop that terminates once reshare generates unique shares; when the loop needs more than one iteration, it fails the PRSS uniqueness check.

While it was possible to fix that test, I did not really like its probabilistic nature and decided to create two tests instead. One validates that the reshare protocol actually communicates, and the other that it is correct.

It was also a good opportunity to use the new test API implemented by @martinthomson in private-attribution#261. It should be used everywhere.

One small change I made to `TestWorld` to improve ergonomics: the `make_contexts` function now narrows the generated contexts to a step that is unique within this world, so the following syntax is possible:

```rust
for _ in 0..10 {
    world.semi_honest(..., |ctx, v| { ... })
}
```
@akoshelev

I found my suspect, but I still don't have the root cause: running two futures concurrently causes data corruption.

        let ctx_bit = ctx.narrow(&Sort(bit_num));
        let ((shuffled_compose_permutation, random_permutations_for_shuffle), bit_i) = try_join(
            shuffle_and_reveal_permutation(
                ctx_bit.narrow(&ShuffleRevealPermutation),
                input_len,
                composed_less_significant_bits_permutation,
            ),
            convert_shares_for_a_bit(ctx_bit.narrow(&ModulusConversion), input, num_bits, bit_num),
        )
        .await?;

If I remove the try_join, everything works fine.
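
A minimal illustration of the two execution modes, with toy futures standing in for shuffle_and_reveal_permutation and convert_shares_for_a_bit:

```rust
use futures::future::try_join;

async fn fut_a() -> Result<u32, ()> { Ok(1) }
async fn fut_b() -> Result<u32, ()> { Ok(2) }

// Concurrent: both futures are polled from the same task, so their
// sends to the shared channels can interleave.
async fn concurrent() -> Result<(u32, u32), ()> {
    try_join(fut_a(), fut_b()).await
}

// Sequential: fut_a runs to completion before fut_b starts, so
// nothing interleaves; this is the variant that works fine.
async fn sequential() -> Result<(u32, u32), ()> {
    Ok((fut_a().await?, fut_b().await?))
}
```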

I also wrote a simpler test that drives shuffle and convert in parallel, and it shows the same symptom. It gives me a chance to validate the shares produced by shuffle, and they are corrupted:

    x1           x2           x2        x3           x3            x1
(1950749719, 3688215208), (367963562, 1976254010), (1976254010, 1950749719)

clearly helper 1's view of X2 does not match helper 2's view.

@akoshelev

Got some more samples of data corruption. Interestingly, they all have the x2 share corrupted:

(2844853426_mod4294967291, 3747931916_mod4294967291), (471208790_mod4294967291, 978905089_mod4294967291), (978905089_mod4294967291, 2844853426_mod4294967291)
(2051700797_mod4294967291, 2302271972_mod4294967291), (1619903499_mod4294967291, 623362995_mod4294967291), (623362995_mod4294967291, 2051700797_mod4294967291)

@akoshelev

Narrowed the issue down to reshare. Here is the test that reproduces it (forgive the copy-paste and syntax):

#[test]
fn reshare_just_reshare() {
    shuttle::check_random(
        || {
            shuttle::future::block_on(reshare_just_reshare_inner());
        },
        100,
    );
}

async fn reshare_just_reshare_inner() {
    let world = make_world(QueryId);
    const NUM_BITS: u8 = 24;
    const ROUNDS: usize = 50;

    let match_keys = {
        let mut rng = thread_rng();
        const MASK: u64 = u64::MAX >> (64 - NUM_BITS);
        let mut match_keys: Vec<u64> = Vec::new();
        for _ in 0..ROUNDS {
            match_keys.push(rng.gen::<u64>() & MASK);
        }

        let mut shares = [
            Vec::with_capacity(ROUNDS),
            Vec::with_capacity(ROUNDS),
            Vec::with_capacity(ROUNDS),
        ];
        for match_key in match_keys.clone() {
            let share_0 = rng.gen::<u64>() & MASK;
            let share_1 = rng.gen::<u64>() & MASK;
            let share_2 = match_key ^ share_0 ^ share_1;

            shares[0].push((share_0, share_1));
            shares[1].push((share_1, share_2));
            shares[2].push((share_2, share_0));
        }

        shares
    };

    let mut input = (0..ROUNDS).map(|v| Fp32BitPrime::from(v as u128)).collect::<Vec<_>>();
    let expected_sum = input.iter().map(|v| v.as_u128()).sum::<u128>();
    for bit in 0u8..24 {
        let match_keys = &match_keys.clone();
        let [shares0, shares1, shares2] = world.semi_honest(input.clone(), |ctx, shares| async move {
            let my_input = &match_keys[ctx.role()];
            let f = async {
                let mut shares = shares;
                for (i, to_helper) in Role::all().iter().enumerate() {
                    shares = reshare_all_shares(&shares, ctx.narrow(&format!("shuffle{i}-reshare")), *to_helper).await.unwrap();
                }

                Ok(shares)
            };
            let (revealed, conv_) = try_join(
                f,
                convert_shares_for_a_bit(ctx.narrow(&ModulusConversion), my_input, NUM_BITS, bit)
            ).await.unwrap();

            revealed
        }).await;

        input = shares0.into_iter().zip(shares1.into_iter()).zip(shares2.into_iter()).map(|((s0, s1), s2)| {
            validate_and_reconstruct(&s0, &s1, &s2)
        }).collect::<Vec<_>>();
        assert_eq!(expected_sum, input.iter().map(|v| v.as_u128()).sum::<u128>());
    }
}

@akoshelev commented Nov 25, 2022

Looks like an infra bug again:

H2: (3787081308, 754425418), r23 = 754425418, part1 = 3808087248, part2 = 4273961351
H3: (754425418, 4048427856)
H1: (4048427856, 3089946992), r31 = 4048427856, part1 = 3110952932, part2 = 4273961351 

     X1        X2             X2           X3           X3          X1
(4048427856, 3089946992), (3787081308, 754425418), (754425418, 4048427856)

Pay attention to the part1 variable: part2 matches, but the part1 received by H1 is not the same part1 sent by H2.

Send side:

let part1 = input.left() + input.right() - r1;

Receive side:

let part1: F = channel
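
To tie the numbers together, here is a self-contained toy check of the send-side formula against the H2 dump above. Everything here is illustrative: the modulus 4294967291 is Fp32BitPrime's, taken from the earlier share dumps, and r1 does not appear in the dump, so the value below is back-solved to make the assertion pass.

```rust
// Toy check of the send side, part1 = left + right - r1, over
// Fp32BitPrime (p = 4294967291, as seen in the dumps above).
const P: u64 = 4_294_967_291;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }

fn main() {
    // H2's shares from the dump: (3787081308, 754425418).
    let (left, right) = (3_787_081_308, 754_425_418);
    // Not in the dump; back-solved so the result matches what H2 sent.
    let r1 = 733_419_478;
    let part1 = sub(add(left, right), r1);
    assert_eq!(part1, 3_808_087_248); // the part1 H2 reports sending
    // H1 nonetheless logs part1 = 3110952932 for this record:
    // it did not receive the value H2 sent.
    println!("sent part1 = {part1}");
}
```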

@akoshelev

Another interesting observation is that I only see record_id = 0 failing spectacularly. Different steps, but always record_id = 0.

@akoshelev

Found the root cause. Let's look at an example.

Here is the failure:

thread 'tests::randomized::reshare_just_reshare' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `2115475503`: Not a valid replicated sharing: (145338368, 2073333811), (4252825599, 4191770615), (4191770615, 145338368)',

Helpers' view of the world:

H2: (4252825599, 4191770615), r23 = 754425418, part1 = 3234221718, part2 = 1018603881
H3: (4191770615, 145338368)
H1: (145338368, 2073333811), r31 = 4048427856, part1 = 1054729930, part2 = 1018603881 

     X1          X2            X2           X3         X3          X1
(145338368, 2073333811), (4252825599, 4191770615), (4191770615, 145338368)

Again, $part1_{H1} \ne part1_{H2}$. Indeed, H1 received 1054729930 as record 0:

2022-11-25T23:10:48.690996Z TRACE gateway_loop{role=H1}: raw_ipa::helpers::buffers::receive: received RecordId(0)..RecordId(2): [1054729930, 2112537405] from channel[peer=H2,step=step=protocol/run-9/shuffle2-reshare]
2022-11-25T23:10:48.691565Z TRACE gateway_loop{role=H1}: raw_ipa::helpers::buffers::receive: received RecordId(2)..RecordId(4): [1566315217, 3234221718] from channel[peer=H2,step=step=protocol/run-9/shuffle2-reshare]

However, that is not the order in which these messages were sent from H2:

2022-11-25T23:10:48.689870Z TRACE gateway_loop{role=H2}: raw_ipa::helpers::messaging: new SendRequest(channel[peer=H1,step=step=protocol/run-9/shuffle2-reshare], RecordId(0) = 3234221718
2022-11-25T23:10:48.690078Z TRACE gateway_loop{role=H2}: raw_ipa::helpers::messaging: new SendRequest(channel[peer=H1,step=step=protocol/run-9/shuffle2-reshare], RecordId(1) = 1054729930
2022-11-25T23:10:48.690199Z TRACE gateway_loop{role=H2}: raw_ipa::helpers::messaging: new SendRequest(channel[peer=H1,step=step=protocol/run-9/shuffle2-reshare], RecordId(2) = 2112537405
2022-11-25T23:10:48.690479Z TRACE gateway_loop{role=H2}: raw_ipa::helpers::messaging: new SendRequest(channel[peer=H1,step=step=protocol/run-9/shuffle2-reshare], RecordId(3) = 1566315217

As can be seen, the sends interleaved. And this is why:

let mut pending_sends = FuturesUnordered::new();
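
`FuturesUnordered` yields results in completion order, not in the order the futures were pushed, so any code that pairs its output with sequentially assigned record ids can see reordered records, which is consistent with the traces above. A self-contained toy demonstration (`SendOp` is a stand-in for a real send that takes a variable number of polls to finish):

```rust
use futures::executor::block_on;
use futures::stream::{FuturesUnordered, StreamExt};
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

/// Toy stand-in for a pending send: becomes ready only after
/// `polls_left` additional polls, then yields its record id.
struct SendOp {
    record_id: u32,
    polls_left: u32,
}

impl Future for SendOp {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.polls_left == 0 {
            Poll::Ready(self.record_id)
        } else {
            self.polls_left -= 1;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

fn main() {
    let pending_sends: FuturesUnordered<SendOp> = vec![
        SendOp { record_id: 0, polls_left: 3 }, // pushed first, finishes last
        SendOp { record_id: 1, polls_left: 0 },
        SendOp { record_id: 2, polls_left: 1 },
    ]
    .into_iter()
    .collect();

    let completed: Vec<u32> = block_on(pending_sends.collect());
    // Yields completion order, not push order: record 0 comes out last.
    assert_eq!(completed.last(), Some(&0));
    println!("{completed:?}"); // e.g. [1, 2, 0]
}
```

For comparison, `futures::stream::FuturesOrdered` yields results in push order regardless of completion order.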

@akoshelev

Fixed in #363.
