Skip to content

Conversation

@mikeconley
Copy link
Contributor

@mikeconley mikeconley commented Nov 5, 2020

Checklist for reviewer:

  • Commits should reference a bug or github issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title)
  • Scan the PR and verify that no changes (particularly to .circleci/config.yml) will cause environment variables (particularly credentials) to be exposed in test logs
  • If the PR comes from a fork, trigger the integration CI test by pushing this revision as discussed in the README and review the report posted in the comments.

For glean changes:

  • Update templates/include/glean/CHANGELOG.md

@auto-assign auto-assign bot requested a review from mreid-moz November 5, 2020 02:43
Copy link
Contributor

@acmiyaguchi acmiyaguchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we don't accept versions other than 4 for the telemetry pipeline due to schemas being tightly coupled to the telemetry module. You should be able to replace the placeholder schema with this schema, though, since it's currently a placeholder. We only version schemas when the underlying data is incompatible (e.g. types for properties change, or columns are removed).

@mikeconley mikeconley changed the title Bug 1675103 - Add version 5 of the bhr ping schema. Bug 1675103 - Define the bhr ping schema. Nov 5, 2020
Copy link
Contributor

@acmiyaguchi acmiyaguchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the schema, there are a couple of changes that will need to be modified so it passes our transpilation process. This document is particularly tricky because of the nested arrays.

I made a few changes to the CI tool locally, and this is the result that I get when dropping the stack traces and adding "additionalItems": false in the appropriate places.

mps bigquery transpile schemas/telemetry/bhr/bhr.4.schema.json | mps bigquery columns /dev/stdin | grep payload
root.payload.hangs.[].annotations.[].f0_ STRING
root.payload.hangs.[].annotations.[].f1_ STRING
root.payload.hangs.[].duration FLOAT64
root.payload.hangs.[].process STRING
root.payload.hangs.[].remote_type STRING
root.payload.hangs.[].runnable_name STRING
root.payload.hangs.[].thread STRING
root.payload.modules.[].f0_ STRING
root.payload.modules.[].f1_ STRING
root.payload.time_since_last_ping INT64

@mikeconley
Copy link
Contributor Author

I think I've correctly applied the review feedback. Let me know if I've missed anything. Thanks!

@acmiyaguchi
Copy link
Contributor

Could I ask you to rebase onto the current master? There are a couple of CI changes in #643 that will be handy during review.

@mikeconley
Copy link
Contributor Author

Sure, done!

@mikeconley
Copy link
Contributor Author

Oops, looks like I'm failing CI. Back to the drawing board.

@acmiyaguchi
Copy link
Contributor

Okay, this looks good to me, the types in the root.payload columns look correct. I'm going to leave it for Monday to merge in so that we can check on the schema error rate on deploy.

I did a quick check on another branch I have for looking at the transformed validation document for more insight into what's going to happen.

In the hangs column will have this repeated structure:

% mps bigquery transform validation/telemetry/bhr.4.sample.pass.json | jq '.payload.hangs[0]'
{
  "duration": 157.073998,
  "thread": "Gecko_Child",
  "process": "tab",
  "annotations": [
    {
      "f0_": "PendingInput",
      "f1_": "709"
    }
  ],
  "runnable_name": "TimeoutExecutor Runnable",
  "remote_type": "webIsolated"
}

The additional_properties field is going to have an JSON blob that contains a the stack object that doesn't make it into the structure portion of the schema, which should have a 1:1 array mapping:

% mps bigquery transform validation/telemetry/bhr.4.sample.pass.json | \
jq '.additional_properties | fromjson | .payload.hangs[0]' 
{
  "stack": [
    "XRE_InitChildProcess",
    "nsTimerImpl::Fire",
    "",
    "(content script)",
    "(unresolved)",
    "(unresolved)",
    "(unresolved)",
    "(unresolved)",
    "(unresolved)",
    "(unresolved)",
    "(unresolved)",
    "(unresolved)",
    "(unresolved)",
    "(unresolved)",
    "(unresolved)",
    "(unresolved)",
    "(unresolved)"
  ]
}

Do you know what you want to do with the stacks columns long term?

{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"title": "event",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good place to add a description field linking out to the bug where this is first implemented. The title should probably be bhr?

@mikeconley
Copy link
Contributor Author

Do you know what you want to do with the stacks columns long term?

Similar to crash stacks, we're probably going to want a way of symbolicating them and generating signatures for them. Something like what's happening in https://bugzilla.mozilla.org/show_bug.cgi?id=1631563 for crash pings.

@acmiyaguchi
Copy link
Contributor

Does this require context from the full hangs array? It might spare a few headaches if stack got moved into the repeated array as a JSON string instead of being housed in the additional_properties field. This is something I can do with the heavy lifting of the schema definition done here.

@mikeconley
Copy link
Contributor Author

Does this require context from the full hangs array?

I'm honestly not sure what you mean. But I certainly trust you more to know about the best way to order this stuff in the databases than I do myself. :)

@acmiyaguchi
Copy link
Contributor

To reframe my question, flattening out the data will look something like this:

SELECT
  id,
  index,
  hangs.*
  JSON_EXTRACT(additional_properties, CONCAT("$.payload.hangs[", index, "].stack")) AS stack
FROM `moz-fx-data-shared-prod`.telemetry.bhr,
UNNEST(payload.hangs) AS hang
WITH OFFSET AS index

In the result from this query (which flattens out the hangs column), is symbolicating the stack just symbolicate(stack) AS stack_symbol for some symbolicate function?

And I'm suggesting a change to https://github.com/mozilla/mozilla-schema-generator to get around some of the hacky stuff required to avoid the JSON_EXTRACT function. I'm convinced it's not a bad idea though, so I'll probably go ahead with it.

@mikeconley
Copy link
Contributor Author

Yeah, that looks like the right kind of query to me!

The symbolication step is still being worked out, I think. @willkg or @wlach might have details on how it could work, but effectively it's a service call-out to transform the memory addresses into readable symbols. I'm not sure whether or not it can be a function like that, or if it'd be a periodic post-processing task.

@willkg
Copy link
Contributor

willkg commented Dec 5, 2020

I don't know how crash ping symbolication and signature generation is going to work, yet. I'm hoping to get to that in early 2021.

@mikeconley
Copy link
Contributor Author

Thanks for the review, @acmiyaguchi! Having never contributed to this repository before, I'm not 100% sure what the procedure is now. Am I still waiting on approval from @mreid-moz? Or do I trigger a landing somehow? Or is there something else I should wait for?

@acmiyaguchi
Copy link
Contributor

I'm going to merge this now, it will get triggered automatically at UTC+2. I didn't merge it on Friday because of the weekend. Check https://protosaur.dev/mps-deploys/ for the actual deploy time. The new data should be available in the new format in telemetry_live.bhr_v4 10-15 minutes after deploy, while telemetry.bhr will be available once a full day has elapsed.

@acmiyaguchi acmiyaguchi removed the request for review from mreid-moz December 7, 2020 18:29
@acmiyaguchi acmiyaguchi merged commit 05f5477 into mozilla-services:master Dec 7, 2020
@mikeconley mikeconley deleted the bhr5 branch December 7, 2020 22:10
acmiyaguchi pushed a commit that referenced this pull request Dec 7, 2020
05f5477	2020-12-07 10:29:18 -0800	Bug 1675103 - Define the bhr ping schema. (#636)
dataops-pipeline-schemas added a commit that referenced this pull request Dec 8, 2020
05f5477	2020-12-07 10:29:18 -0800	Bug 1675103 - Define the bhr ping schema. (#636)
@acmiyaguchi
Copy link
Contributor

Some follow-up, it looks like things deployed correctly.

Data is being inserted into the table correctly: https://sql.telemetry.mozilla.org/queries/76767/source
We're also not seeing any schema validation errors: https://sql.telemetry.mozilla.org/queries/76768/source

If you want to access data before this time, you will need to work with the additional_properties field.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants