[Bug] wrong permutation indices when task scheduling reordering occurs #256
Reveal in isolation works fine:

```
cargo test --features shuttle --package raw-ipa --lib tests::randomized::reveal
Compiling raw-ipa v0.1.0 (/Users/koshelev/workspace/raw-ipa)
Finished test [unoptimized + debuginfo] target(s) in 5.34s
Running unittests src/lib.rs (target/debug/deps/raw_ipa-13f7cbe4f34a1d27)

running 1 test
test tests::randomized::reveal ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 104 filtered out; finished in 6.70s
```

This is the test (edit: used 32-bit prime field):

```rust
async fn reveal() -> Result<(), Error> {
    let mut rng = rand::thread_rng();
    let world: TestWorld = make_world(QueryId);
    let ctx = make_contexts::<Fp32BitPrime>(&world);

    for i in 0..10_u32 {
        let secret = rng.gen::<u128>();
        let input = Fp32BitPrime::from(secret);
        let share = share(input, &mut rng);
        let record_id = RecordId::from(i);
        let results = try_join_all(vec![
            ctx[0].clone().reveal(record_id, &share[0]),
            ctx[1].clone().reveal(record_id, &share[1]),
            ctx[2].clone().reveal(record_id, &share[2]),
        ])
        .await?;

        assert_eq!(input, results[0]);
        assert_eq!(input, results[1]);
        assert_eq!(input, results[2]);
    }
    Ok(())
}
```
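For context, `share` above splits a field element into a three-way replicated sharing. Below is a self-contained sketch over a toy 32-bit prime modulus; the real implementation works over the crate's `Field` trait, so the names and the constant here are illustrative only.

```rust
use rand::Rng;

// Stand-in for the Fp32BitPrime modulus; illustrative only.
const P: u64 = 4_294_967_291; // 2^32 - 5

// Split `secret` into additive shares x1 + x2 + x3 = secret (mod P).
// Helper i then holds the adjacent pair (x_i, x_{i+1}), which is the shape
// of the replicated-sharing tuples that show up later in this thread.
fn share_sketch(secret: u64, rng: &mut impl Rng) -> [(u64, u64); 3] {
    let x1 = rng.gen_range(0..P);
    let x2 = rng.gen_range(0..P);
    let x3 = (secret % P + 2 * P - x1 - x2) % P;
    [(x1, x2), (x2, x3), (x3, x1)]
}
```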
Infra stays strong and denies my allegations:

```
cargo test --features shuttle --package raw-ipa --lib tests::randomized::arithmetic
Compiling raw-ipa v0.1.0 (/Users/koshelev/workspace/raw-ipa)
Finished test [unoptimized + debuginfo] target(s) in 8.44s
Running unittests src/lib.rs (target/debug/deps/raw_ipa-13f7cbe4f34a1d27)

running 1 test
test tests::randomized::arithmetic ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 105 filtered out; finished in 16.86s
```

```rust
#[test]
fn arithmetic() {
    shuttle::check_random(|| {
        shuttle::future::block_on(circuit::arithmetic::<Fp32BitPrime>(100, 5))
    }, 100);
}
```
Shuffle fails, but with a different error:

```
cargo test --color=always --features shuttle --package raw-ipa --lib tests::randomized::shuffle
Finished test [unoptimized + debuginfo] target(s) in 1.34s
Running unittests src/lib.rs (target/debug/deps/raw_ipa-13f7cbe4f34a1d27)

running 1 test
2022-11-22T21:12:19.587818Z ERROR generator::gen_impl: set panic inside generator
test tests::randomized::shuffle ... FAILED

failures:

---- tests::randomized::shuffle stdout ----
test panicked in task 'main-thread'
failing schedule: "9103a204ebf9feaf8d81c2f337000204008000600a000cca46c44a6486c84c0082c480606826ac60ac228c60680c828c2a66824224280a406624c422aa4c64acc246ca426cc060c6a4c04046c4000a6aa40cc0a404c064826a06406844422a408c2a2a4628660684442008286c424620c06688662c0266c066c26882682088c0822802c0082028428a240a8082220420a4a8a4a82400a4400a2242a4a004482244a2048c204a44422460028a06400a664c8c04c66a04c2224662060c62a488c6026c604a266c28440a2a08422a686048228a2008848a22246404a0ca4c28c60880a80a8a2a8042c668844c228686a00444a6484c800c2226cc664c026c26046cc60c6c060664a4cca660c0006c6c404acc0a60cc64a6a6660404464a060a"
pass that string to `shuttle::replay` to replay the failure
thread 'tests::randomized::shuffle' panicked at 'assertion failed: `(left == right)`
  left: `19_mod31`,
 right: `0_mod31`', src/test_fixture/sharing.rs:52:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

failures:
    tests::randomized::shuffle

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 106 filtered out; finished in 0.02s
```

I want to look at
I don't know why I started looking at

```
test panicked in task 'main-thread'
failing schedule: "91038e14b4c9b7f7f6d3a6fed001000260000408000a000cc820a680c424a0a2424cc04460c666ca480424800624c8a6086644c06ac4626c26880204a42c84286c64a60244c86aacc6240286a6026c8204084866ca060a42ca6204668c60c6602046a4ca0642284686602a286a80c44424608a646ac246c8ca064a44aa4c2c68202a282444a6c6a86c0044c60a22acc8284a2a26a282822604662a864c20a448864a6a04aac68448004c4a4a26a0a02a64ac8a0046ca8c66a48a04a0004cc4064a2222860220880242a64c00a2604ca2c0462a4884a4a246088624c0c24a4060a28286660a204642a42668646aa824c2204284c62422c82004aa2406a0a28842662c400ca640a4c2c842000a20484a6aa44064c68c6a80ca622068404682c644220266022668a8c000c4640ca2228a644824066a6a0cc800266842c822c400286066ac86ca6c062280c262c2408c28a422ca0246a6064084a020066a88402c82c44a62ca042c4a82664c2c464626048ac62a6c0640aac0406044086aa446ca04a2cca82648a044c6644a46222446cc46664ca8a2a462c8682406208c2a0608206ca84442ac406646880202cc6a886ca6268802404a466c822ca640cca6ca2c886420024aca268c008a4440c8482a028668c4200a200c64ca882646a2a6282686844ca68022c0a4a6842442caa2066c28844a84824a6862000a26c04c4a8424a6026a88466a82c02c00a446cc244046a020a604c42c4600626c4a826a8cc020c800466268c4aa04cc62ccaa2626c8626ca4a02a2c64660862c2448446aa44a80ac42420cc000a4404a2a0268a2042a886c2a8c462262666226cc6288800a2ac64606c2a2606804a2c444a42802028a4604aa4822628482c080c8846cca40a2828a4642448844a44a2224822aaa060086046cc6262a6006cac88c26426a2446a8284a00242048c282880282422488aac426660c6c6a2a8c08c268a224406c62ca404c4260ca80806c848640262686a8c46c280a0a626c8c48286626c6620c00cc644aac0622c4c2a082a28660c6c802286cc62428084440626ca644cc04c4a0a04822a0a202648a66a4cc8204022824224aac2884c6a266c4aa22080c0682c2c4668846a2cac40a408886ac26244a620c6028a84c02aca60c0208cc6a026c482ca20ac40648ac628a2c8a42644220404a6a2864cca860ac40022ca08c0c2822606ac2c0a06a28a8cc02088cc02ac00aa28ccc0a200ac00a2ca40cac2a822200804080c028acca2c0a2a440aacc08802c0ccaac0282aaa42ac4a088a20024424cc80880ac80aa2ca044a022caa02848c42a0820480a84aa8c20208248c024828c24aa2262408046068a204caa864246a6c620a4842a8cc2842aa6408c680082048ca0626006a662c8626264a2800c8ca0422a666cc4aac828202022448800c6a2a6288c6466004c66a2248aac2448aa86240248ac4a0862226c002c08666c48804c2828a8800a080a484004c68820a444600c0a6a80c4482ca2a8064480a6280ac4c0886a26a64602220622a46c680a4c04aa6c08468046a62a284c028022266c42a0ca082068600284624c64aac4a2666ac0426200228c4066a8cac6c6428c02cc4820aa062282cc8a00ac2628c4464660cc8a4a6020c4ca64aa42884266a408cc80caa8066ca4a426622c406c688082c08ca0a04004ac2442648a4c4a2084208824c64ac0aa20662aa248202880c0404a2c80a800244462820240c42a64a00248c24ca62a4460264c066028ca0ca8a2684c80c6aaa0042888a42a22a0800442228a48a24a4220a4044880a080a08008080808800008"
pass that string to `shuttle::replay` to replay the failure

Left: 5_mod31
Right: 13_mod31
```
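Either of the failing schedule strings above can be fed back to `shuttle::replay` to reproduce the exact interleaving deterministically instead of searching again with `check_random`. A rough sketch (the test body is only a placeholder, and the schedule string is truncated here):

```rust
#[test]
fn shuffle_replay() {
    // Re-runs the test under the exact schedule shuttle found.
    shuttle::replay(
        || {
            shuttle::future::block_on(async {
                // body of tests::randomized::shuffle goes here
            });
        },
        // paste the full failing schedule from the output above
        "9103a204ebf9feaf8d81c2f337000204...",
    );
}
```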
While working on private-attribution#256, I noticed that the reshare tests consistently fail the concurrency test. The failure was caused by always using `RecordId(0)` inside a loop that terminates once reshare successfully generates unique shares; when the loop needs more than one iteration, it fails the PRSS uniqueness check. While it was possible to fix that test, I did not really like its probabilistic nature and decided to create two tests instead: one validates that the reshare protocol actually communicates, and the other that it is correct.

It was also a good opportunity to use the new test API implemented by @martinthomson in private-attribution#261. It should be used everywhere. One small change I made to the `TestWorld` to improve ergonomics changes the `make_contexts` function to narrow the generated contexts to a step that is unique within this world, so the following syntax is made possible:

```rust
for _ in 0..10 {
    world.semi_honest(..., |ctx, v| { ... })
}
```
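One way `make_contexts` might hand out a unique step per invocation is a simple per-world counter; the `protocol/run-9/...` steps visible in the traces further down suggest a scheme of roughly this shape. This is only a sketch: the `executions` field and `next_unique_step` name are assumptions, not the crate's actual API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct TestWorldSketch {
    executions: AtomicUsize,
    // gateways, PRSS endpoints, etc. omitted
}

impl TestWorldSketch {
    // Each call narrows subsequent work to a fresh "run-N" step,
    // unique within this TestWorld instance.
    fn next_unique_step(&self) -> String {
        let run = self.executions.fetch_add(1, Ordering::Relaxed);
        format!("run-{run}")
    }
}
```

With each invocation narrowed to its own `run-N` step, repeated `world.semi_honest(...)` calls in a loop presumably stop colliding on channels and PRSS indices, which is what the loop syntax above relies on.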
I found my suspect but still don't have the root cause. Running two futures concurrently causes data corruption:

```rust
let ctx_bit = ctx.narrow(&Sort(bit_num));
let ((shuffled_compose_permutation, random_permutations_for_shuffle), bit_i) = try_join(
    shuffle_and_reveal_permutation(
        ctx_bit.narrow(&ShuffleRevealPermutation),
        input_len,
        composed_less_significant_bits_permutation,
    ),
    convert_shares_for_a_bit(ctx_bit.narrow(&ModulusConversion), input, num_bits, bit_num),
)
.await?;
```

if I remove

I also wrote a simpler test that drives shuffle and convert in parallel, and it shows the same symptom. It gives me a chance to validate the shares produced by shuffle, and they are corrupted: clearly helper 1's view of X2 does not match helper 2's view.
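The check that fails here is the replicated-sharing consistency invariant in `src/test_fixture/sharing.rs`: every share value must be seen identically by the two helpers that hold it. A rough self-contained sketch over a toy modulus (the real code works over the crate's field types and phrases the assertion differently):

```rust
const P: u64 = 4_294_967_291; // toy 32-bit prime modulus, illustrative only

// Helper 1 holds (x1, x2), helper 2 holds (x2, x3), helper 3 holds (x3, x1).
// Before reconstructing x1 + x2 + x3, verify that each shared value is seen
// identically by the two helpers that hold it.
fn validate_and_reconstruct_sketch(s1: (u64, u64), s2: (u64, u64), s3: (u64, u64)) -> u64 {
    assert_eq!(s1.1, s2.0, "helper 1's view of x2 does not match helper 2's");
    assert_eq!(s2.1, s3.0, "helper 2's view of x3 does not match helper 3's");
    assert_eq!(s3.1, s1.0, "helper 3's view of x1 does not match helper 1's");
    (s1.0 + s2.0 + s3.0) % P
}
```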
Got some more samples of data corruption. Interestingly, they all have
Narrowed down the issue to reshare. Here is the test that reproduces it (forgive the copy-paste and syntax):

```rust
#[test]
fn reshare_just_reshare() {
    shuttle::check_random(
        || {
            shuttle::future::block_on(reshare_just_reshare_inner());
        },
        100,
    );
}

async fn reshare_just_reshare_inner() {
    let world = make_world(QueryId);
    const NUM_BITS: u8 = 24;
    const ROUNDS: usize = 50;

    let match_keys = {
        let mut rng = thread_rng();
        const MASK: u64 = u64::MAX >> (64 - NUM_BITS);
        let mut match_keys: Vec<u64> = Vec::new();
        for _ in 0..ROUNDS {
            match_keys.push(rng.gen::<u64>() & MASK);
        }

        let mut shares = [
            Vec::with_capacity(ROUNDS),
            Vec::with_capacity(ROUNDS),
            Vec::with_capacity(ROUNDS),
        ];
        for match_key in match_keys.clone() {
            let share_0 = rng.gen::<u64>() & MASK;
            let share_1 = rng.gen::<u64>() & MASK;
            let share_2 = match_key ^ share_0 ^ share_1;

            shares[0].push((share_0, share_1));
            shares[1].push((share_1, share_2));
            shares[2].push((share_2, share_0));
        }

        shares
    };

    let mut input = (0..ROUNDS)
        .map(|v| Fp32BitPrime::from(v as u128))
        .collect::<Vec<_>>();
    let expected_sum = input.iter().map(|v| v.as_u128()).sum::<u128>();

    for bit in 0u8..24 {
        let match_keys = &match_keys.clone();
        let [shares0, shares1, shares2] = world
            .semi_honest(input.clone(), |ctx, shares| async move {
                let my_input = &match_keys[ctx.role()];
                // run three rounds of reshare, one per destination helper
                let f = async {
                    let mut shares = shares;
                    for (i, to_helper) in Role::all().iter().enumerate() {
                        shares = reshare_all_shares(
                            &shares,
                            ctx.narrow(&format!("shuffle{i}-reshare")),
                            *to_helper,
                        )
                        .await
                        .unwrap();
                    }
                    Ok(shares)
                };

                // drive reshare concurrently with modulus conversion
                let (revealed, conv_) = try_join(
                    f,
                    convert_shares_for_a_bit(ctx.narrow(&ModulusConversion), my_input, NUM_BITS, bit),
                )
                .await
                .unwrap();

                revealed
            })
            .await;

        input = shares0
            .into_iter()
            .zip(shares1.into_iter())
            .zip(shares2.into_iter())
            .map(|((s0, s1), s2)| validate_and_reconstruct(&s0, &s1, &s2))
            .collect::<Vec<_>>();

        assert_eq!(expected_sum, input.iter().map(|v| v.as_u128()).sum::<u128>());
    }
}
```
Looks like an infra bug again. Pay attention to the send side:

`ipa/src/protocol/sort/reshare.rs`, line 60 at `f0b9ac8`

and the receive side:

`ipa/src/protocol/sort/reshare.rs`, line 80 at `f0b9ac8`
Another interesting observation is that I only see `record_id = 0` failing spectacularly. Different steps, but always `record_id = 0`.
Found the root cause. Let's look at an example. Here is the failure:

```
thread 'tests::randomized::reshare_just_reshare' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `2115475503`: Not a valid replicated sharing: (145338368, 2073333811), (4252825599, 4191770615), (4191770615, 145338368)'
```

Here is the helpers' view of the world, again:

```
2022-11-25T23:10:48.690996Z TRACE gateway_loop{role=H1}: raw_ipa::helpers::buffers::receive: received RecordId(0)..RecordId(2): [1054729930, 2112537405] from channel[peer=H2,step=step=protocol/run-9/shuffle2-reshare]
2022-11-25T23:10:48.691565Z TRACE gateway_loop{role=H1}: raw_ipa::helpers::buffers::receive: received RecordId(2)..RecordId(4): [1566315217, 3234221718] from channel[peer=H2,step=step=protocol/run-9/shuffle2-reshare]
```

However, that is not the order in which these messages were sent from H2:

```
2022-11-25T23:10:48.689870Z TRACE gateway_loop{role=H2}: raw_ipa::helpers::messaging: new SendRequest(channel[peer=H1,step=step=protocol/run-9/shuffle2-reshare], RecordId(0) = 3234221718
2022-11-25T23:10:48.690078Z TRACE gateway_loop{role=H2}: raw_ipa::helpers::messaging: new SendRequest(channel[peer=H1,step=step=protocol/run-9/shuffle2-reshare], RecordId(1) = 1054729930
2022-11-25T23:10:48.690199Z TRACE gateway_loop{role=H2}: raw_ipa::helpers::messaging: new SendRequest(channel[peer=H1,step=step=protocol/run-9/shuffle2-reshare], RecordId(2) = 2112537405
2022-11-25T23:10:48.690479Z TRACE gateway_loop{role=H2}: raw_ipa::helpers::messaging: new SendRequest(channel[peer=H1,step=step=protocol/run-9/shuffle2-reshare], RecordId(3) = 1566315217
```

As can be seen, they have interleaved. And this is why:

`ipa/src/test_fixture/network.rs`, line 89 at `3c15686`
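To illustrate the failure mode with a toy model (this is not the actual buffer code): the receive side stamps incoming values with the next expected record ids, assuming in-order delivery, so once the test-fixture network reorders two send requests, every subsequent record is silently shifted.

```rust
// Toy model of an in-order receive buffer: incoming values are stamped with
// the next expected record id, so out-of-order delivery shifts every record.
struct ReceiveBufferSketch {
    next_record: u64,
    received: Vec<(u64, u32)>, // (assigned record id, value)
}

impl ReceiveBufferSketch {
    fn push_chunk(&mut self, values: &[u32]) {
        for &v in values {
            self.received.push((self.next_record, v));
            self.next_record += 1;
        }
    }
}

fn main() {
    let mut buf = ReceiveBufferSketch { next_record: 0, received: Vec::new() };
    // H2 sent records 0..4 as [3234221718, 1054729930, 2112537405, 1566315217],
    // but the test network delivered the chunks out of order:
    buf.push_chunk(&[1054729930, 2112537405]); // really records 1 and 2
    buf.push_chunk(&[1566315217, 3234221718]); // really records 3 and 0
    // RecordId(0) now maps to record 1's value -- the corruption seen above.
    assert_eq!(buf.received[0], (0, 1054729930));
}
```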
Fixed in #363.
Running the concurrency test on sort revealed that we have a reordering issue somewhere. It fails with:

The permutation slice contains some weird values.
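For reference, here is a minimal sketch of the kind of check that flags those weird values; `is_valid_permutation` is a hypothetical helper, not something from the crate.

```rust
// A permutation over n elements must contain exactly the indices 0..n, each once;
// duplicates or out-of-range entries indicate the corruption described above.
fn is_valid_permutation(perm: &[u32]) -> bool {
    let mut seen = vec![false; perm.len()];
    for &idx in perm {
        match seen.get_mut(idx as usize) {
            Some(slot) if !*slot => *slot = true,
            _ => return false, // out of range or duplicate
        }
    }
    true
}
```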