Skip to content

Conversation

@jklukas
Copy link
Contributor

@jklukas jklukas commented Jul 22, 2020

@jklukas jklukas requested review from acmiyaguchi, relud and whd July 22, 2020 19:40
@codecov-commenter
Copy link

codecov-commenter commented Jul 22, 2020

Codecov Report

Merging #1323 into master will decrease coverage by 1.70%.
The diff coverage is 90.90%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #1323      +/-   ##
============================================
- Coverage     84.71%   83.00%   -1.71%     
+ Complexity      666      610      -56     
============================================
  Files            94       73      -21     
  Lines          4042     3277     -765     
  Branches        385      352      -33     
============================================
- Hits           3424     2720     -704     
+ Misses          488      439      -49     
+ Partials        130      118      -12     
Flag Coverage Δ Complexity Δ
#ingestion_beam 83.00% <90.90%> (-0.01%) 610.00 <5.00> (+4.00) ⬇️
#ingestion_edge ? ?
#ingestion_sink ? ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
...om/mozilla/telemetry/decoder/RerouteDocuments.java 87.50% <87.50%> (ø) 4.00 <4.00> (?)
...m/src/main/java/com/mozilla/telemetry/Decoder.java 92.85% <100.00%> (+0.35%) 10.00 <1.00> (ø)
...etry/ingestion/core/util/DerivedAttributesMap.java 84.00% <0.00%> (-4.00%) 11.00% <0.00%> (-1.00%)
...va/com/mozilla/telemetry/decoder/GeoIspLookup.java 89.47% <0.00%> (-1.76%) 6.00% <0.00%> (ø%)
...a/com/mozilla/telemetry/ingestion/sink/io/Gcs.java
ingestion-edge/ingestion_edge/util.py
.../telemetry/ingestion/sink/util/BatchException.java
...la/telemetry/ingestion/sink/config/SinkConfig.java
ingestion-edge/ingestion_edge/config.py
...metry/ingestion/sink/transform/BlobIdToString.java
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8782555...519b4e0. Read the comment docs.

@whd
Copy link
Member

whd commented Jul 22, 2020

My expectation (for the future mechanism, not for this PR) was that the json schema from the original namespace would be used for decode, and the decoder would apply rerouting based on metadata only on output; separately MSG would only generate the bq schema for the rerouted dataset. At least to me this seems like the least sanity-questioning approach. It occurs to me now that that might break some metadata extraction in the bq provisioning pipeline (that assumes certain conventions about pathing between bq and json schemas), but in any case this PR appears to do the right thing to me, if not in the way I was expecting.

@jklukas
Copy link
Contributor Author

jklukas commented Jul 22, 2020

My expectation (for the future mechanism, not for this PR) was that the json schema from the original namespace would be used for decode, and the decoder would apply rerouting based on metadata only on output; separately MSG would only generate the bq schema for the rerouted dataset. At least to me this seems like the least sanity-questioning approach. It occurs to me now that that might break some metadata extraction in the bq provisioning pipeline (that assumes certain conventions about pathing between bq and json schemas), but in any case this PR appears to do the right thing to me, if not in the way I was expecting.

I agree it would be better to reroute only at the BQ sink step. That, however, would require more complex logic changes since BQ loading can happen either via ingestion-sink for the streaming pipeline or via ingestion-sink in the batch reprocessing case.

I also had imagined that once we have schema metadata in place, we would set bq_namespace, etc. as attributes when we get to ParsePayload in the decoder, since that's the point in the pipeline where we are already looking up JSON schemas. It would be the cleaner solution code-wise, but misses out on the advantage of applying the routing logic as late as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants