Bug 1635893 Implement DecryptAetIdentifiers #1304

jklukas · 2020-06-09T20:45:50Z

Supports Account Ecosystem Telemetry.

This differs somewhat from the design in the open draft docs PR, so those docs will be updated before deploying this.

Supports Account Ecosystem Telemetry. This differs somewhat from the design in the open [draft docs PR](#1264), so those docs will be updated before deploying this.

codecov-commenter · 2020-06-09T20:55:08Z

Codecov Report

Merging #1304 into master will increase coverage by 3.09%.
The diff coverage is 77.08%.

@@             Coverage Diff              @@
##             master    #1304      +/-   ##
============================================
+ Coverage     80.34%   83.44%   +3.09%     
+ Complexity      687      637      -50     
============================================
  Files            93       70      -23     
  Lines          4020     3104     -916     
  Branches        407      322      -85     
============================================
- Hits           3230     2590     -640     
+ Misses          645      386     -259     
+ Partials        145      128      -17

Flag	Coverage Δ	Complexity Δ
#ingestion_beam	`83.44% <77.08%> (-0.35%)`	`637.00 <21.00> (+21.00)`	⬇️
#ingestion_edge	`?`	`?`
#ingestion_sink	`?`	`?`

Impacted Files	Coverage Δ	Complexity Δ
.../com/mozilla/telemetry/decoder/DecoderOptions.java	`100.00% <ø> (ø)`	`3.00 <0.00> (ø)`
...main/java/com/mozilla/telemetry/util/KeyStore.java	`83.05% <33.33%> (-5.63%)`	`11.00 <1.00> (+1.00)`	⬇️
...m/src/main/java/com/mozilla/telemetry/Decoder.java	`92.50% <66.66%> (-7.50%)`	`10.00 <2.00> (+1.00)`	⬇️
...m/mozilla/telemetry/aet/DecryptAetIdentifiers.java	`79.84% <79.84%> (ø)`	`18.00 <18.00> (?)`
...om/mozilla/telemetry/ingestion/core/util/Json.java	`70.73% <0.00%> (-4.88%)`	`17.00% <0.00%> (-8.00%)`
...etry/ingestion/core/util/DerivedAttributesMap.java	`84.00% <0.00%> (-4.00%)`	`11.00% <0.00%> (-1.00%)`
...ava/com/mozilla/telemetry/decoder/Deduplicate.java	`78.21% <0.00%> (-1.00%)`	`4.00% <0.00%> (ø%)`
ingestion-edge/ingestion_edge/publish.py
...ry/ingestion/sink/transform/DecompressPayload.java
...lla/telemetry/ingestion/sink/util/TimedFuture.java
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58d06d6...2ded716. Read the comment docs.

jklukas · 2020-06-09T20:57:02Z

ingestion-beam/src/main/java/com/mozilla/telemetry/aet/DecryptAetIdentifiers.java

+        } else if (STRUCTURED_URI_PATTERN.matcher(uri).matches()) {
+          processStructuredIngestionPayload(json);
+        } else {
+          nonAetDocType.inc();


We pass through the message if it doesn't match any of the expected URI patterns. This shouldn't happen if the edge is routing messages correctly. Perhaps it would be better to err on the side of caution here and raise IllegalAetPayloadException here so the payload gets sanitized and sent to error. Or I suppose a new IllegalAetUriException would be more semantically appropriate.

Sending the message to the error stream may be more actionable to diagnose the problem. Knowing what documents passed through would be useful to determine why the counter goes up. On the other hand, once the document is redacted, it becomes useless for backfill.

I'm going to go ahead and change this to send to errors and redact the payload.

acmiyaguchi

This looks good to me, and mostly straightforward. Is it correct to say that this new pipeline will handle all of the doctypes that contain AET identifiers now, instead of rerouting the pings back into the standard family pipelines?

acmiyaguchi · 2020-06-09T23:46:27Z

ingestion-beam/src/main/resources/account-ecosystem/structured.schema.json

+        "type": "string"
+      }
+    },
+    "ecosystemAnonId": false,


This is to specify that this field must not exist?

That's correct. I wish I could add commentary in the schema to explain this method.

The false schema never validates, so if any of these fields is present, the payload will fail validation. I had previously used a not with required fields, but that only works for a single field. The payload would have to have all of the required fields to trigger. This seemed the most elegant way to define an arbitrary number of forbidden fields.

ingestion-beam/src/main/java/com/mozilla/telemetry/aet/DecryptAetIdentifiers.java

acmiyaguchi · 2020-06-10T00:07:16Z

ingestion-beam/src/main/java/com/mozilla/telemetry/aet/DecryptAetIdentifiers.java

+        } else if (STRUCTURED_URI_PATTERN.matcher(uri).matches()) {
+          processStructuredIngestionPayload(json);
+        } else {
+          nonAetDocType.inc();


Sending the message to the error stream may be more actionable to diagnose the problem. Knowing what documents passed through would be useful to determine why the counter goes up. On the other hand, once the document is redacted, it becomes useless for backfill.

acmiyaguchi · 2020-06-10T00:17:54Z

ingestion-beam/src/main/java/com/mozilla/telemetry/aet/DecryptAetIdentifiers.java

+
+      // Note that we must work with the raw URI because ParseUri can raise errors that would
+      // route payloads containing anon_id values to error output; this transform must be placed
+      // before ParseUri so that we can handle redacting anon_id values.


However, if the values are redacted at this point in the pipeline, it should be fine to pass the uri's into parse uri wholesale?

The input to DecryptAetIdentifiers is the raw payload. The idea is that either it's processed successfully within this transform such that all identifiers are properly decrypted, or we sanitize/drop the payload before passing to error output.

If we ran ParseUri first, we would have to make sure to process the errors from that transform to drop or redact the payload, which spreads responsibility around too much.

I've updated this comment to hopefully be a bit more clear.

jklukas · 2020-06-10T12:50:06Z

Is it correct to say that this new pipeline will handle all of the doctypes that contain AET identifiers now, instead of rerouting the pings back into the standard family pipelines?

I will go ahead and add docs as part of this PR to make sure it's clear. AET is now a pipeline family; the edge will handle routing all AET doctypes to a raw topic for AET. There will be no raw BQ output for this pipeline family and now decryption will happen within the Decoder job.

rfk · 2020-06-22T04:33:33Z

ingestion-beam/src/main/resources/account-ecosystem/telemetry.schema.json

+      },
+      "required": [
+        "ecosystemClientId",
+        "ecosystemDeviceId"


@jklukas I'm lightly embarrassed to be asking this question, but...what's the difference between "client id" and "device id" here?

I couldn't find a definition of "device id" in the "identifiers" spec doc, but it does confusing talk about the device being identified by a "client id".

It's not obvious to me what the difference is here. I'll try to track down the history of docs to see where these two terms were introduced. I'm definitely open to removing ecosystemDeviceId if it's not meaningfully different from client id.

I think we should have just client_id and remove device_id, if that makes sense to you. That's certainly the only such value I had intended on submitting with the first pings from Firefox desktop :-)

I have not been able to track down where ecosystem_device_id came from, so I'm inclined to go ahead and remove it. We can easily mark it as non-required immediately, and then I can follow up about recreating the tables to remove it from the schema completely.

Thanks, SGTM 👍

See discussion in mozilla/gcp-ingestion#1304 (comment)

jklukas force-pushed the aet-poc branch from c1ea1d9 to cfd1418 Compare June 9, 2020 20:48

jklukas requested a review from acmiyaguchi June 9, 2020 20:48

Bug 1635893 Implement DecryptAetIdentifiers

e076fa4

Supports Account Ecosystem Telemetry. This differs somewhat from the design in the open [draft docs PR](#1264), so those docs will be updated before deploying this.

jklukas force-pushed the aet-poc branch from cfd1418 to e076fa4 Compare June 9, 2020 20:51

jklukas commented Jun 9, 2020

View reviewed changes

acmiyaguchi approved these changes Jun 10, 2020

View reviewed changes

Refactor to use MapElements and introduce IllegalAetUriException

b0d73cb

jklukas force-pushed the aet-poc branch from d6ea49d to 35d9f8e Compare June 10, 2020 14:37

Update docs

2ded716

jklukas force-pushed the aet-poc branch from 35d9f8e to 2ded716 Compare June 10, 2020 14:39

jklukas merged commit e5b390c into master Jun 10, 2020

jklukas deleted the aet-poc branch June 10, 2020 14:46

jklukas mentioned this pull request Jun 10, 2020

Draft docs v2 for Account Ecosystem Telemetry #1264

Closed

rfk reviewed Jun 22, 2020

View reviewed changes

jklukas added a commit to mozilla-services/mozilla-pipeline-schemas that referenced this pull request Jun 26, 2020

Make ecosystem_device_id an optional field

eeab531

See discussion in mozilla/gcp-ingestion#1304 (comment)

jklukas mentioned this pull request Jun 26, 2020

Make ecosystem_device_id an optional field mozilla-services/mozilla-pipeline-schemas#566

Merged

4 tasks

jklukas added a commit to mozilla-services/mozilla-pipeline-schemas that referenced this pull request Jun 26, 2020

Make ecosystem_device_id an optional field

9758b29

See discussion in mozilla/gcp-ingestion#1304 (comment)

jklukas added a commit to mozilla-services/mozilla-pipeline-schemas that referenced this pull request Jun 26, 2020

Make ecosystem_device_id an optional field

2cfd30a

See discussion in mozilla/gcp-ingestion#1304 (comment)

Bug 1635893 Implement DecryptAetIdentifiers #1304

Bug 1635893 Implement DecryptAetIdentifiers #1304

Uh oh!

Conversation

jklukas commented Jun 9, 2020

Uh oh!

codecov-commenter commented Jun 9, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

acmiyaguchi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jklukas commented Jun 10, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

codecov-commenter commented Jun 9, 2020 •

edited

Loading