-
Notifications
You must be signed in to change notification settings - Fork 527
SNOW-2171791: Add platform telemetry #2387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-2171791: Add platform telemetry #2387
Conversation
…and I don't want it to block the initialization
…ion checks because it throws internal exceptions now instead of requests library exceptions
…a background thread and switched to executing the checks in parallel for the internal metadata network calls
… will get sent to login-request and eventually persisted in the SessionDPO
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
…meout, set new default timeout to 0.2s, added new integration test to test user parameter, updated unit tests and added one unit test to test configuring the timeout
…s of azure functions or azure VM for speed. It does extra work but since we parallelize we would rather not have two I/O options forced to be serialized one after another.
…ave to make sure the function does not affect main driver functionality
… unnecessary mock
… wif. slight adjustments to platform detection for URL fixes/header fixes that don't affect logic but make checks in the the csphelper more consistent
…ronment variables and can be independent of the platfrom it is running on. This helps with the GitHub actions test since the test is actually run on GitHub actions.
… additively adds environments for every fake environment
…each test as well as an additional safety measure
sfc-gh-pmansour
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Sid!
test/unit/test_detect_platforms.py
Outdated
| def test_custom_timeout_parameter( | ||
| self, mock_requests, mock_boto3, mock_imds_fetcher | ||
| ): | ||
| mock_imds_instance = Mock() | ||
| mock_imds_instance._get_request.side_effect = Exception("No IMDS") | ||
| mock_imds_fetcher.return_value = mock_imds_instance | ||
|
|
||
| mock_boto3.client.return_value.get_caller_identity.side_effect = Exception( | ||
| "No AWS" | ||
| ) | ||
|
|
||
| mock_requests.get.side_effect = requests.RequestException("No metadata") | ||
| mock_requests.RequestException = requests.RequestException | ||
| mock_requests.Timeout = requests.Timeout | ||
|
|
||
| result = detect_platforms(timeout=5.0) | ||
|
|
||
| assert "is_aws_lambda" in result | ||
| assert "is_github_action" in result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think there's value in having a test that ensures the timeout is respected and the entire method finishes within that (+ some minor error margin).
Currently, if you have a bug that does everything in serial (rather than parallel), or that forgets to pass the timeout, or that passes the wrong timeout variable, you have no way to detect it. In fact, I believe you previously had such a bug (with the azure managed identity), and may currently have another one (with the 10 executors rather than 11), but there are no tests to catch it.
| raise requests.HTTPError(f"HTTP {self.status_code} Error") | ||
|
|
||
|
|
||
| class TestDetectPlatforms: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally it's good to keep tests focused and test behaviors, not methods, with each test verifying a specific behavior. Otherwise, if you introduce a bug, you'll have to wade through 100 failing tests and try to figure out what went wrong, rather than just 1.
Anyway, I think the negative test coverage you have here is probably sufficient for now. Thanks!
…et a field for the type of exception to be thrown. Then, the platform detection tests has a fixture to set this field to RequestException for tests
…ent most of them. Updated GCE cloud run service/job to inherit GCEFakeMetadaService to get its functionality as well
…t instead of reusing the FakeAwsEnvironment class
…with fake issuer and instead just use a flag to raise a timeout
…t exist. Don't want to fail everything there due to Attribute Error. Updated test to include a parallel timing test
… IMDS fetcher could take in timeouts
sfc-gh-pmansour
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for all your patience!
Would be good to get @sfc-gh-mmishchenko's LGTM too
Please answer these questions before submitting your pull requests. Thanks!
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
SNOW-2171791
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
This PR adds platform detection mechanisms and appends them to the CLIENT_ENVIRONMENT part of the base_auth_data(). This then gets sent to the /login-request endpoint which serializes and stores this data for future use in the SessionDPO.
To avoid bottlenecking the authentication flow, we have parallelized all platform detection I/O calls with a default timeout of 200 milliseconds. Overall testing has found that these tests take on the order of 1-3 milliseconds with more details provided in the design doc. We have also exposed a user parameter so that they can configure how long of a timeout they are ok with.
Design Doc