Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

include source_app label in StatsD #3645

Merged
merged 20 commits into from
Dec 26, 2019
Merged

include source_app label in StatsD #3645

merged 20 commits into from
Dec 26, 2019

Conversation

omgitsbillryan
Copy link
Contributor

Description of change

Adds a source_app label to the StatsD metrics that vets-api serves to prometheus. This value comes from a header that vets-website passes on each request to vets-api (link to that PR)

Testing done

rspec

Testing planned

  • Discover if nginx (revproxy) automatically passes on the header from the browser onto vets-api

Applies to all PRs

  • Appropriate logging
  • Swagger docs have been updated, if applicable
  • Provide link to originating GitHub issue, or connected to it via ZenHub
  • Does not contain any sensitive information (i.e. PII/credentials/internal URLs/etc., in logging, hardcoded, or in specs)
  • Provide which alerts would indicate a problem with this functionality (if applicable)

@omgitsbillryan omgitsbillryan requested review from a team as code owners December 16, 2019 02:21
@va-vfs-bot va-vfs-bot temporarily deployed to source_app_name/master December 16, 2019 02:34 Inactive
@@ -31,16 +77,24 @@ def call(env)
if path_parameters
controller = path_parameters[:controller]
action = path_parameters[:action]
source_app = env['HTTP_SOURCE_APP_NAME']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    # We should never use a dynamic path to apply the tag for the instrumentation,
    # since this will permit a rogue actor to increase the number of time series
    # exported from the process and causes instability in the metrics system.

@wyattwalter do you see a problem with this given the ominous comment ☝️ . We do perform a whitelist later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose my real question is: does this warrant some changes in our prometheus storage config?

I came across a prior postmortem, where we created a new label that had an infinite number of possible values. In practice, I'm not sure where that breaking point is of how many possible values warrant config changes.

Is trying to calculate the storage impact upfront a practical solution? Or is it best to merge and then monitor?

@va-vfs-bot va-vfs-bot temporarily deployed to source_app_name/master December 16, 2019 03:09 Inactive
Copy link
Contributor

@annaswims annaswims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I'd still like to hear @wyattwalter's opinion on additional tags before it's merged

verify
veteran-id-card
veteran-representative
vic-v2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cvalarida I got this list by looking in vets-website for all manifest.json files and finding the line:

"entryName": "foo"

Do you know if this list is complete?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's as complete as I think we can get right now. You can verify this by running yarn apps or (npm run apps). That'll do the same thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally we'd put this into configuration somehow. Either load it from an endpoint on process startup or during the rendering of the deploy configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@annaswims annaswims added the VSP VSP Contract label Dec 16, 2019
@@ -3,6 +3,52 @@
class StatsdMiddleware
STATUS_KEY = 'api.rack.request'
DURATION_KEY = 'api.rack.request.duration'
SOURCE_APP_NAMES = %w[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do we intend for FE engineers to know to keep this list up to date? Can you add a comment, so people in the future know that this should be kept up to date manually and FE engineers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Some thoughts and stuff:

  • added logic to warn to sentry when an unrecognized Source-App-Name header comes in
  • add instructional comment on how to regenerate
  • if we're out of date on the whitelist it isnt that big of a deal, it will result in some small holes in our metrics

@va-vfs-bot va-vfs-bot temporarily deployed to source_app_name/master December 17, 2019 18:56 Inactive
@va-vfs-bot va-vfs-bot temporarily deployed to source_app_name/master December 17, 2019 20:47 Inactive
@va-vfs-bot va-vfs-bot temporarily deployed to source_app_name/master December 17, 2019 21:24 Inactive
@va-vfs-bot va-vfs-bot temporarily deployed to source_app_name/master December 19, 2019 16:57 Inactive
return nil if source_app.nil?
return source_app if SOURCE_APP_NAMES.include?(source_app)

Raven.capture_message('Unrecognized Source App Request Header',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this send an alert to Sentry? That would create a similar problem if we're generating a job just for that. Though, I suppose we want to know when it happens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it sends to sentry, and yes we want to know when this happens.

You're poking at the fact that a malicious user could load up our sidekiq queues? Since we use sidekiq enterprise, we could enforce some rate limiting / throttling on this call to Sentry.

Alternatively, we could use the Rails.logger and setup a metric log filter + alert.

I think I'll just remove it altogether for now and create a new ticket since knowing when this happens can happen a bit later.


def instrument_statsd(status, duration, controller, action, source_app)
duration_tags = ["controller:#{controller}", "action:#{action}"]
duration_tags.push("source_app:#{source_app}") unless source_app.nil?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make sure that every metric sent to statsd with the same name has the same labels too. I'm not sure if null or an empty string is preferable to the library, but they should all have the same "shape" (i.e. labels).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K. J/w - what happens if we don't?

@va-vfs-bot va-vfs-bot temporarily deployed to source_app_name/master December 20, 2019 19:36 Inactive
@va-vfs-bot va-vfs-bot temporarily deployed to source_app_name/master December 20, 2019 20:41 Inactive
@va-vfs-bot va-vfs-bot temporarily deployed to source_app_name/master December 20, 2019 20:45 Inactive
@va-vfs-bot va-vfs-bot temporarily deployed to source_app_name/master December 20, 2019 21:27 Inactive
@va-vfs-bot va-vfs-bot temporarily deployed to source_app_name/master December 26, 2019 16:09 Inactive
@omgitsbillryan omgitsbillryan merged commit 02a8755 into master Dec 26, 2019
@omgitsbillryan omgitsbillryan deleted the source_app_name branch December 26, 2019 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
VSP VSP Contract
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants