-
Notifications
You must be signed in to change notification settings - Fork 97
Bug 1675103 - Define the bhr ping schema. #636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
acmiyaguchi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately we don't accept versions other than 4 for the telemetry pipeline due to schemas being tightly coupled to the telemetry module. You should be able to replace the placeholder schema with this schema, though, since it's currently a placeholder. We only version schemas when the underlying data is incompatible (e.g. types for properties change, or columns are removed).
acmiyaguchi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the schema, there are a couple of changes that will need to be modified so it passes our transpilation process. This document is particularly tricky because of the nested arrays.
I made a few changes to the CI tool locally, and this is the result that I get when dropping the stack traces and adding "additionalItems": false in the appropriate places.
mps bigquery transpile schemas/telemetry/bhr/bhr.4.schema.json | mps bigquery columns /dev/stdin | grep payload
root.payload.hangs.[].annotations.[].f0_ STRING
root.payload.hangs.[].annotations.[].f1_ STRING
root.payload.hangs.[].duration FLOAT64
root.payload.hangs.[].process STRING
root.payload.hangs.[].remote_type STRING
root.payload.hangs.[].runnable_name STRING
root.payload.hangs.[].thread STRING
root.payload.modules.[].f0_ STRING
root.payload.modules.[].f1_ STRING
root.payload.time_since_last_ping INT64
|
I think I've correctly applied the review feedback. Let me know if I've missed anything. Thanks! |
|
Could I ask you to rebase onto the current master? There are a couple of CI changes in #643 that will be handy during review. |
|
Sure, done! |
|
Oops, looks like I'm failing CI. Back to the drawing board. |
|
Okay, this looks good to me, the types in the I did a quick check on another branch I have for looking at the transformed validation document for more insight into what's going to happen. In the hangs column will have this repeated structure: The additional_properties field is going to have an JSON blob that contains a the stack object that doesn't make it into the structure portion of the schema, which should have a 1:1 array mapping: Do you know what you want to do with the stacks columns long term? |
| { | ||
| "$schema": "http://json-schema.org/draft-04/schema#", | ||
| "type": "object", | ||
| "title": "event", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good place to add a description field linking out to the bug where this is first implemented. The title should probably be bhr?
Similar to crash stacks, we're probably going to want a way of symbolicating them and generating signatures for them. Something like what's happening in https://bugzilla.mozilla.org/show_bug.cgi?id=1631563 for crash pings. |
|
Does this require context from the full hangs array? It might spare a few headaches if |
I'm honestly not sure what you mean. But I certainly trust you more to know about the best way to order this stuff in the databases than I do myself. :) |
|
To reframe my question, flattening out the data will look something like this: SELECT
id,
index,
hangs.*
JSON_EXTRACT(additional_properties, CONCAT("$.payload.hangs[", index, "].stack")) AS stack
FROM `moz-fx-data-shared-prod`.telemetry.bhr,
UNNEST(payload.hangs) AS hang
WITH OFFSET AS indexIn the result from this query (which flattens out the And I'm suggesting a change to https://github.com/mozilla/mozilla-schema-generator to get around some of the hacky stuff required to avoid the |
|
Yeah, that looks like the right kind of query to me! The symbolication step is still being worked out, I think. @willkg or @wlach might have details on how it could work, but effectively it's a service call-out to transform the memory addresses into readable symbols. I'm not sure whether or not it can be a function like that, or if it'd be a periodic post-processing task. |
|
I don't know how crash ping symbolication and signature generation is going to work, yet. I'm hoping to get to that in early 2021. |
|
Thanks for the review, @acmiyaguchi! Having never contributed to this repository before, I'm not 100% sure what the procedure is now. Am I still waiting on approval from @mreid-moz? Or do I trigger a landing somehow? Or is there something else I should wait for? |
|
I'm going to merge this now, it will get triggered automatically at UTC+2. I didn't merge it on Friday because of the weekend. Check https://protosaur.dev/mps-deploys/ for the actual deploy time. The new data should be available in the new format in |
|
Some follow-up, it looks like things deployed correctly. Data is being inserted into the table correctly: https://sql.telemetry.mozilla.org/queries/76767/source If you want to access data before this time, you will need to work with the |
Checklist for reviewer:
.circleci/config.yml) will cause environment variables (particularly credentials) to be exposed in test logsintegrationCI test by pushing this revision as discussed in the README and review the report posted in the comments.For glean changes:
templates/include/glean/CHANGELOG.md