Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test encrypted input #1237

Merged
merged 7 commits into from
Aug 28, 2024
Merged

Test encrypted input #1237

merged 7 commits into from
Aug 28, 2024

Conversation

eriktaubeneck
Copy link
Member

For the in-market test (and in general in production), we're constructing 3 files of encrypted data. #1216 built a tool to encrypt, and to test decryption. This PR adds a test which reads in those files and runs a test query based on those encrypted files.

One issue here is that it requires adding another feature (in-memory-infra) to the cli::crypto module, that's only used for testing. Relatedly, it also requires making runner::oprf_ipa::OprfIpaQuery public. I think it might be preferable to move this elsewhere, and I'd love some feedback on where best this test should live to minimize these types of changes.

As for implementing this for actual use (the next step, which I will do in a separate PR), we have two options:

  1. The report_collector binary could read in all 3 files, create the buffers, and upload them as it does today.
  2. We could create a new helper end point / modify the existing /create endpoint so that it takes in a URL of the input data and downloads that.

We will need (2) eventually for sharding, but it's not immediately obvious all that needs to be done to make it happen here. Curious if others have thoughts on which approach to take.

Copy link

codecov bot commented Aug 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.62%. Comparing base (fbc56af) to head (836ff97).
Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1237      +/-   ##
==========================================
- Coverage   92.82%   92.62%   -0.20%     
==========================================
  Files         200      200              
  Lines       31166    31413     +247     
==========================================
+ Hits        28930    29097     +167     
- Misses       2236     2316      +80     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@andyleiserson
Copy link
Collaborator

I think it makes sense to put this test with the integration tests in ipa-core/tests/helper_networks.rs. It could either be a new test or a modification of one of the existing ones.

The integration tests may currently all be using real HTTP networking rather than the in-memory implementation, but I don't think it has to be that way -- if there's a reason this test needs to use in-memory networking, it would be fine to have some integration tests in that configuration.

(2) We could create a new helper end point / modify the existing /create endpoint so that it takes in a URL of the input data and downloads that.

We will need (2) eventually for sharding, but it's not immediately obvious all that needs to be done to make it happen here. Curious if others have thoughts on which approach to take.

There is some overlap with the data transfer involved in sharding, but I'm not sure it's really interchangeable. I think it would be better to leave a pull-from-URL input mechanism and sharding as future, separate tasks.

@martinthomson
Copy link
Member

The feature usage seems fine to me. You can do a bunch of things with tags if it’s only for tests.

@eriktaubeneck
Copy link
Member Author

I think it makes sense to put this test with the integration tests in ipa-core/tests/helper_networks.rs. It could either be a new test or a modification of one of the existing ones.
The integration tests may currently all be using real HTTP networking rather than the in-memory implementation, but I don't think it has to be that way -- if there's a reason this test needs to use in-memory networking, it would be fine to have some integration tests in that configuration.

Let me move it there and see if it works. There's no hard dependency on using in-memory networking, I was just trying to replicate the test in ipa-core/src/query/runner/oprf_ipa.rs.

There is some overlap with the data transfer involved in sharding, but I'm not sure it's really interchangeable. I think it would be better to leave a pull-from-URL input mechanism and sharding as future, separate tasks.

Sounds good. I'll just work on updating the existing report_collector cli to accept 3 encrypted files.

Comment on lines 632 to 634

// Clippy complaining, but unclear why.
#[allow(clippy::disallowed_methods)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, without this, I get the following compile error:

error: use of a disallowed method `futures::future::join_all`
   --> ipa-core/src/test_fixture/world.rs:633:21
    |
633 | /                     join_all(zip(shares, iter::repeat(m_ctx.clone())).enumerate().map(
634 | |                         |(i, (share, m_ctx))| async move {
635 | |                             let record_id = RecordId::from(i);
636 | |                             let m_share = share.upgrade(m_ctx.clone(), record_id).await.unwrap();
...   |
644 | |                         },
645 | |                     ))
    | |______________________^
    |
    = note: We don't have a replacement for this method yet. Consider extending `SeqJoin` trait. (from clippy.toml)
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#disallowed_methods
    = note: `-D clippy::disallowed-methods` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(clippy::disallowed_methods)]`

It's not exactly clear why this change triggers this. I do import TestWorld, but I imported that when the file was in ipa-core/cli/crypto.rs as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, we disallow the join_all method because we want to steer protocols to use seq_join where appropriate. seq_join is more careful about the number of futures it will spawn.

It's fine to use join_all here, though, because it's only three items, and it's test code. There is a general waiver of the disallowed_methods lint for cfg(test), but not for feature = test-fixture.

So any of the following would be fine:

  • Update the comment to say why it's okay to waive the lint (e.g. // It's just 3 items. appears elsewhere in world.rs).
  • Use join3 instead.
  • Waive disallowed_methods whenever feature = test-fixture, or for all of world.rs, or for all of mod test_fixture.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still somewhat confused what code path all the sudden hits this (I didn't add the join_all.) It didn't seem to come up when I had it previously in the src/cli/crypto.rs file ¯_(ツ)_/¯

I'll update the comment, presumably this will also come up when others use this code path.

Copy link
Collaborator

@andyleiserson andyleiserson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As-is, I don't think this test will run in CI, it probably makes sense to add a config so it does.

For integration tests, similar behavior to the cfg directives in the test file can be configured in Cargo.toml, but just mentioning that as an FYI, I don't see a reason it matters which way it's done here.

@eriktaubeneck
Copy link
Member Author

As-is, I don't think this test will run in CI, it probably makes sense to add a config so it does.

@andyleiserson dumb question, but what config controls this?

@andyleiserson
Copy link
Collaborator

As-is, I don't think this test will run in CI, it probably makes sense to add a config so it does.

@andyleiserson dumb question, but what config controls this?

.github/workflows/check.yml controls what runs in CI (and pre-commit is what gets run before pushing).

@@ -101,6 +101,9 @@ check "Web tests" \
check "Concurrency tests" \
cargo test -p ipa-core --release --features "shuttle multi-threading"

check "Encrypted Input Tests" \
cargo test --test encrypted_input --features "cli test-fixture web-app in-memory-infra"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyleiserson this runs this specific test file, and maybe that's ok. leaving this out just reruns all ~700 tests which we certainly want to avoid. I couldn't figure out how to just run things in the ipa-core/tests directory. Is this an OK approach?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me.

@eriktaubeneck eriktaubeneck merged commit b6276b8 into main Aug 28, 2024
12 checks passed
@eriktaubeneck eriktaubeneck deleted the test-encrypted-input branch August 28, 2024 02:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants