Bug 1746375 - Dynamically add metric constructors to metric map #1065

brizental · 2022-01-03T17:19:34Z

Sending this in as a draft because I would like to get some feedback on the approach before moving on. More details on the commit messages, but this is a part of attempting to make the Glean.js bundle size smaller.

The changes do not make as a drastic a dent in the size of the bundle as #1051, however these changes are guards against making the bundle much larger in the future. As we add more metric types the bundle wouldn't get bigger.

My main concern is outlined in aaa1ce0.

I am not entirely sure about this change, because it means that if a
given metric is in the database, but was not added to the metric map
(think if a metric is removed from the code, but has not been collected
in a ping once the new code lands) it will be read as invalid by Glean.

This could cause some surprising behaviour for users. My question is: are the benefits of a smaller bundle worth this kinda weird, but hopefully tiny cornercase?

I have also filed a bug so we can instrument whenever we find invalid data in the database so we can know if this is really tiny or not.

Note that 7b5d7a2 is an attempt to make the cornercase not as big.

Pull Request checklist

Quality: Make sure this PR builds and runs cleanly.
- Inside the glean/ folder, run:
  - npm run test Runs all tests
  - npm run lint Runs all linters
Tests: This PR includes thorough tests or an explanation of why it does not
Changelog: This PR includes a changelog entry to CHANGELOG.md or an explanation of why it does not need one
Documentation: This PR includes documentation changes, an explanation of why it does not need that or a follow-up bug has been filed to do that work
- https://bugzilla.mozilla.org/show_bug.cgi?id=1748475

moz-glean · 2022-01-05T10:04:12Z

Build size report

Merging #1065 into main will:

Leave the size of full Web Extension bundle unchanged.
Leave the size of full Website bundle unchanged.
Leave the size of full Node.js bundle unchanged.
Leave the size of full Qt/QML bundle unchanged.

	Current size	New size	Size increase
Web Extension
core only	55 KB	54 KB	📉 1.9 KB
full bundle	82 KB	82 KB	📉 102 bytes
Website
core only	56 KB	54 KB	📉 1.9 KB
full bundle	83 KB	83 KB	📉 100 bytes
Node.js
core only	54 KB	52 KB	📉 1.9 KB
full bundle	87 KB	87 KB	📉 97 bytes
Qt/QML
core only	68 KB	68 KB	📈 291 bytes
full bundle	68 KB	68 KB	📈 291 bytes

badboy

How does the final bundle actually get built? Is this under control of the user embedding Glean?
Will this be determined by the metric types used in metrics.yaml?

It might be worth listing out when we lose what data.
E.g. something like

Version A: initial. metric types [string, bool, datetime]. Records all 3. No ping sent.
Version A+1: removed datetime. metric types [string, bool]. Sends ping, contains string and bool, old datetime is deleted from the database.

(Am I correct with that example?)

My first reaction is: I think this is fine then. Version A+1 wouldn't record a new value. In the current model we would get a value from a previous run, but no new ones anyway.
Losing that one-time stale value doesn't seem to dramatic.

glean/src/core/context.ts

glean/src/core/glean.ts

brizental · 2022-01-05T12:58:43Z

How does the final bundle actually get built? Is this under control of the user embedding Glean?

So in the benchmarks example we are using webpack. We are not in control of how the user builds it, but the most common way to use Glean would be to use a bundler such as webpack or rollup.

We also don't provide a pre-built library, so I would say that might be the only way to use it.

Will this be determined by the metric types used in metrics.yaml?

Exactly. The glean_parser generated template imports the metric types defined in the metrics.yaml and instantiates the metric. With this change, unless a given metric type's code is imported, none of it is added to the final bundle. Which was not the case before, because we were always importing at least the Metric part used for database deserialization.

It might be worth listing out when we lose what data.
E.g. something like

Version A: initial. metric types [string, bool, datetime]. Records all 3. No ping sent.
Version A+1: removed datetime. metric types [string, bool]. Sends ping, contains string and bool, old datetime is deleted from the database.

Perfect, this is right.

(Am I correct with that example?)

My first reaction is: I think this is fine then. Version A+1 wouldn't record a new value. In the current model we would get a value from a previous run, but no new ones anyway.
Losing that one-time stale value doesn't seem to dramatic.

Yeah I tend to agree with this, but I am working on assumptions here. I was wondering if you have seen such a pattern in other places.

brizental · 2022-01-05T13:44:26Z

Version A: initial. metric types [string, bool, datetime]. Records all 3. No ping sent.
Version A+1: removed datetime. metric types [string, bool]. Sends ping, contains string and bool, old datetime is deleted from the database.

Actually in this case it would still report the datetime, because that metric is used internally by Glean and thus always added to the metric map. That is what I think is confusing about all this. It will only really apply to metrics not used internally by Glean. It also doesn't apply to Events, because they are not recorded on the metrics database.

I guess docs help here though.

badboy · 2022-01-05T13:54:31Z

Version A: initial. metric types [string, bool, datetime]. Records all 3. No ping sent.
Version A+1: removed datetime. metric types [string, bool]. Sends ping, contains string and bool, old datetime is deleted from the database.

Actually in this case it would still report the datetime, because that metric is used internally by Glean and thus always added to the metric map.

ehehehe, pick a metric type that's not used by Glean then.

Will this be determined by the metric types used in metrics.yaml?

Exactly. The glean_parser generated template imports the metric types defined in the metrics.yaml and instantiates the metric. With this change, unless a given metric type's code is imported, none of it is added to the final bundle. Which was not the case before, because we were always importing at least the Metric part used for database deserialization.

Bear with me, I really haven't used JS tooling in a long time. Trying to understand how a JS app is shipped.

glean is a dependency (as per package.json) and all its files are somewhere in node_modules
glean_parser is invoked by the user at some point, generating a metrics.js (or similar). That file has all the import { UrlMetric } from "./types/url.js";, etc. lines in there
webpack/rollup/whatever-bundler goes through all referenced files to bundle them up. (Assuming no text metric) because ./types/text.js is not imported, webpack/* will not include it at all. It finally generates one big bundle.js with all the other code.
bundle.js is what's loaded in the final application/website/webext

Is my understanding correct?
If so, then removing these lines are what is the "important" part I guess.

(Am I correct with that example?)
My first reaction is: I think this is fine then. Version A+1 wouldn't record a new value. In the current model we would get a value from a previous run, but no new ones anyway.
Losing that one-time stale value doesn't seem to dramatic.

Yeah I tend to agree with this, but I am working on assumptions here. I was wondering if you have seen such a pattern in other places.

Not really. glean-core always has all metrics (and we can't ever remove metric types really because our database serialization depends on it).
So for once-valid data in the database we can always turn that into JSON.
However for data we can't deserialize we simply skip it.
We don't even delete such data.
There should be no code path where we write data that we can't later decode, but these are files on disk afterall, so some external process could mess with the data. We have no numbers on whether that ever happens really.

brizental · 2022-01-05T15:20:37Z

Bear with me, I really haven't used JS tooling in a long time. Trying to understand how a JS app is shipped.

1. `glean` is a dependency (as per package.json) and all its files are somewhere in `node_modules`

2. `glean_parser` is invoked by the user at some point, generating a `metrics.js` (or similar). That file has all the `import { UrlMetric } from "./types/url.js";`, etc. lines in there

3. webpack/rollup/whatever-bundler goes through all referenced files to bundle them up. (Assuming no text metric) because `./types/text.js` is not `import`ed, webpack/* will not include it at all. It finally generates one big `bundle.js` with all the other code.

4. `bundle.js` is what's loaded in the final application/website/webext

Is my understanding correct? If so, then removing these lines are what is the "important" part I guess.

No bearing required, your understanding is perfect. Yes, that is the crucial part of this PR.

(Am I correct with that example?)
My first reaction is: I think this is fine then. Version A+1 wouldn't record a new value. In the current model we would get a value from a previous run, but no new ones anyway.
Losing that one-time stale value doesn't seem to dramatic.

Yeah I tend to agree with this, but I am working on assumptions here. I was wondering if you have seen such a pattern in other places.

Not really. glean-core always has all metrics (and we can't ever remove metric types really because our database serialization depends on it). So for once-valid data in the database we can always turn that into JSON. However for data we can't deserialize we simply skip it. We don't even delete such data.

Because it is just so damn easy for people to mess with the database data in the browser I think I'll stick to deleting the wrongly formatted data for now. There is a bug to think about this behaviour and it's problems more deeply in the future. (Fun fact: this is the oldest bug in Glean.js' backlog!).

It might be worth instrumenting it in the future though. I'll keep that in mind.

There should be no code path where we write data that we can't later decode, but these are files on disk afterall, so some external process could mess with the data. We have no numbers on whether that ever happens really.

In this PR I essentially add a possible code path for users to write data the database can't later decode. But from what I understand we are alright with that.

Anyways, thanks a lot for the thorough discussion. I'll clean this up and promote it to a non-draft PR. I'll make sure to add some docs here or in a later PR capturing this discussion.

badboy · 2022-01-07T10:12:30Z

Because it is just so damn easy for people to mess with the database data in the browser I think I'll stick to deleting the wrongly formatted data for now. There is a bug to think about this behaviour and it's problems more deeply in the future. (Fun fact: this is the oldest bug in Glean.js' backlog!).

yeah, I think the fact that glean core is not deleting data is a bug.

Anyways, thanks a lot for the thorough discussion. I'll clean this up and promote it to a non-draft PR. I'll make sure to add some docs here or in a later PR capturing this discussion.

Agreed. Docs can help here, but I also think it's not even that surprising to users: you delete your metrics -> data's gone.

This is nice because it means metrics will work exactly like plugins in the sense that they are only added to the bundle if they are used. I am not entirely sure about this change, because it means that if a given metric is in the database, but was not added to the metric map (think if a metric is removed from the code, but has not been collected in a ping once the new code lands) it will be read as invalid by Glean.

Previous behaviour was to delete the whole ping when that happened. This is a nice change either way, we should always delete as little data as possible. However this becomes even more necessary if we decide to go ahead with the changes in the previous commit.

Co-authored-by: Jan-Erik Rediger <badboy@archlinux.us>

MUST REBASE DEPENDABOT PRS ALWAYS

CHANGELOG.md

Co-authored-by: Jan-Erik Rediger <badboy@archlinux.us>

brizental requested a review from badboy January 3, 2022 17:19

brizental mentioned this pull request Jan 4, 2022

Bug 1746375 - Expose testResetGlean on a standalone endpoint #1071

Closed

4 tasks

brizental force-pushed the 1746375-small-metric-map branch from 7b5d7a2 to 5f0bced Compare January 4, 2022 16:50

mozilla deleted a comment from moz-glean Jan 4, 2022

badboy reviewed Jan 5, 2022

View reviewed changes

glean/src/core/context.ts Outdated Show resolved Hide resolved

glean/src/core/context.ts Outdated Show resolved Hide resolved

glean/src/core/context.ts Outdated Show resolved Hide resolved

glean/src/core/glean.ts Outdated Show resolved Hide resolved

brizental force-pushed the 1746375-small-metric-map branch from 5f0bced to 4278377 Compare January 6, 2022 14:20

brizental marked this pull request as ready for review January 6, 2022 14:30

auto-assign bot requested a review from badboy January 6, 2022 14:30

brizental and others added 5 commits January 11, 2022 10:25

Apply suggestions from code review

abab1ab

Co-authored-by: Jan-Erik Rediger <badboy@archlinux.us>

ctn -> ctor

637da99

Update CHANGELOG.md

9ba21fc

brizental force-pushed the 1746375-small-metric-map branch from 8ab83d0 to 9ba21fc Compare January 11, 2022 09:26

EXTRA: Update package-lock.json

de80496

MUST REBASE DEPENDABOT PRS ALWAYS

brizental changed the title ~~Bug 1746375 - Dinamically add metric constructors to metric map~~ Bug 1746375 - Dynamically add metric constructors to metric map Jan 11, 2022

badboy approved these changes Jan 11, 2022

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

Update CHANGELOG.md

c20c384

Co-authored-by: Jan-Erik Rediger <badboy@archlinux.us>

brizental merged commit 8e73cbb into mozilla:main Jan 11, 2022

brizental deleted the 1746375-small-metric-map branch January 11, 2022 11:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Bug 1746375 - Dynamically add metric constructors to metric map #1065

Bug 1746375 - Dynamically add metric constructors to metric map #1065

Uh oh!

brizental commented Jan 3, 2022 •

edited

Loading

Uh oh!

moz-glean commented Jan 5, 2022

Uh oh!

badboy left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

brizental commented Jan 5, 2022

Uh oh!

brizental commented Jan 5, 2022 •

edited

Loading

Uh oh!

badboy commented Jan 5, 2022

Uh oh!

brizental commented Jan 5, 2022

Uh oh!

badboy commented Jan 7, 2022

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Bug 1746375 - Dynamically add metric constructors to metric map #1065

Bug 1746375 - Dynamically add metric constructors to metric map #1065

Uh oh!

Conversation

brizental commented Jan 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request checklist

Uh oh!

moz-glean commented Jan 5, 2022

Build size report

Uh oh!

badboy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

brizental commented Jan 5, 2022

Uh oh!

brizental commented Jan 5, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

badboy commented Jan 5, 2022

Uh oh!

brizental commented Jan 5, 2022

Uh oh!

badboy commented Jan 7, 2022

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

brizental commented Jan 3, 2022 •

edited

Loading

brizental commented Jan 5, 2022 •

edited

Loading