-
-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ref(MDC): Topics Enum to Class (with fix) #3268
Conversation
Codecov ReportBase: 92.33% // Head: 92.28% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #3268 +/- ##
==========================================
- Coverage 92.33% 92.28% -0.05%
==========================================
Files 697 698 +1
Lines 31640 31634 -6
==========================================
- Hits 29214 29195 -19
- Misses 2426 2439 +13
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Topic.SNUBA_METRICS: {"message.timestamp.type": "LogAppendTime"}, | ||
Topic.PROCESSED_PROFILES: {"message.timestamp.type": "LogAppendTime"}, | ||
Topic.INGEST_REPLAY_EVENTS: {"message.timestamp.type": "LogAppendTime"}, | ||
Topic.SNUBA_GENERIC_METRICS: {"message.timestamp.type": "LogAppendTime"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dp you plan to move this somehow to config ?
This setting is required for any storage we want to run alerts on. So that would mean we would still have to make changes to Snuba code to create a new dataset that has subscriptions enabled.
I think you have two options.
- move these creation settings to config.
- make this setting applied automatically when creating a topic that has subscriptions associated.
Though I think this opens another problem: should the dataset definition carry enough information to create all topics the dataset needs or should the config be split between the topic name (defined in the dataset) and the environment specific settings to create a topic (like partitions number and retention)?
I think the environment parameters should not be together with the dataset config. The dataset is the schema, it is part of the product.
@lynnagara what do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify you suggestions?
- Have the Kafka settings in the configuration.
topic:
- name: snuba-metrics
- settings:
- message.timestamp.type: LogAppendTime
- Make the setting applied automatically:
topic:
- name: snuba-metrics
- subscriptions: yes
and then
if config["subscriptions"] == "yes":
kafka_settings["message.timestamp.type"] = "LogAppendTime"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context:
The way snuba manages Kafka topic relies a lot on them being in an enum. This was all meant and designed long before we decided to move dataset in config. So at this point they suffer of all the same limitations and constraints StorageSetKey
have plus others.
Like StorageSetKey
, topics are a logical view on an environment specific resource. As such, part of them are part of the application (the logical name) and part of that needs to be overridden per environment (name and assignment to broker).
Contrarily to StorageSetKey
, Topics are a step ahead as they carry the configuration that should be applied when the topic is created. We do not have anything that automatically configure a Clickhouse cluster when it is created.
The topic creation configuration contains some fields that are actually environment specific (like partitions and retention) and other that the application depends on (like LogAppendTime
. Subscriptions cannot work without).
So it seems that treating all creation settings as environment settings would be wrong and treating all of them as application specific would be wrong.
To solve this problem without requiring anything to be hardcoded when a new dataset is added, I think we should allow the following:
- a basic sets of settings that are applied based on the fact that the storage has subscriptions. So something like your option 2. The dataset config says whether the storage has subscriptions and the topic is created with that flag on. This does not solve it for existing topics when we activate subscriptions after the fact but I believe this is not solve today either.
- an additional set of settings to override parameters for the specific environment. This is not needed in this PR though as it is not supported today either.
Or, in alternative, just always create topics with LogAppendTime
@lynnagara what do you think about always creating topics that way ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's very important for topics to be "registered" (in some form or another) in a single place. I think Evan's option 1 is the best of the proposed ideas here and by far the closest to how it already works today.
Why I think the other options should not be pursued:
- Firstly the
subscriptions: yes
option. Topic configuration should not know about high level Snuba concepts. One day this may actually move outside Snuba as kafka is how different services talk to each other. The only reason we put it inside Snuba is because nothing like that exists today and it was the easiest thing to do. - About putting
LogAppendTime
everywhere. Well, this solves our immediate problem but closes the door to all other custom configuration. This is inflexible and seems like it would be a pain on the day we need anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, if we want to do things like validate LogAppendTime is there when a topic is chosen for subscriptions, we can still do that inside the storage building logic. But it should not be the job of the topics definition to validate Snuba specific concepts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like there are two different perspectives on how to solve this problem. Whether or not we decide on a solution here, would it make sense for this to be moved to some documentation and shared for visibility? For the scope of this PR, the function of the Topic class is the same as the Enum. There is another PR which depends on some of the register_topic()
functionality implemented here.
Since as mentioned, logical Topics still carry configuration that is applied in code. What I'm proposing moving forward:
- I'll write up documentation to capture this discussion and share it for visibility
- We can more formally decide on which option to pursue
- This will be a separate task that gets carried out during tech debt week
Does that sound sensible? Or does it make more sense for this PR to contain these changes? @evanh @lynnagara @fpacifici
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's do that @enochtangg . Thank you.
I think there are two underlying questions that this is touching on: How much of the Topic
settings is logical and belongs in configuration, and how much is environment specific? Once we figure that out, where do the environment specific settings go?
|
||
|
||
def register_topic(key: str) -> Topic: | ||
_REGISTERED_TOPICS[key.upper().replace("-", "_")] = key.lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: assuming this input is coming from the cli, you may also want to call .strip()
to remove whitespace
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.12 to 2.2.2. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/urllib3/urllib3/releases">urllib3's releases</a>.</em></p> <blockquote> <h2>2.2.2</h2> <h2>🚀 urllib3 is fundraising for HTTP/2 support</h2> <p><a href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3 is raising ~$40,000 USD</a> to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support for 2023. If your company or organization uses Python and would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and thousands of other projects <a href="https://opencollective.com/urllib3">please consider contributing financially</a> to ensure HTTP/2 support is developed sustainably and maintained for the long-haul.</p> <p>Thank you for your support.</p> <h2>Changes</h2> <ul> <li>Added the <code>Proxy-Authorization</code> header to the list of headers to strip from requests when redirecting to a different host. As before, different headers can be set via <code>Retry.remove_headers_on_redirect</code>.</li> <li>Allowed passing negative integers as <code>amt</code> to read methods of <code>http.client.HTTPResponse</code> as an alternative to <code>None</code>. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3122">#3122</a>)</li> <li>Fixed return types representing copying actions to use <code>typing.Self</code>. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3363">#3363</a>)</li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/urllib3/urllib3/compare/2.2.1...2.2.2">https://github.com/urllib3/urllib3/compare/2.2.1...2.2.2</a></p> <h2>2.2.1</h2> <h2>🚀 urllib3 is fundraising for HTTP/2 support</h2> <p><a href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3 is raising ~$40,000 USD</a> to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support for 2023. If your company or organization uses Python and would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and thousands of other projects <a href="https://opencollective.com/urllib3">please consider contributing financially</a> to ensure HTTP/2 support is developed sustainably and maintained for the long-haul.</p> <p>Thank you for your support.</p> <h2>Changes</h2> <ul> <li>Fixed issue where <code>InsecureRequestWarning</code> was emitted for HTTPS connections when using Emscripten. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3331">#3331</a>)</li> <li>Fixed <code>HTTPConnectionPool.urlopen</code> to stop automatically casting non-proxy headers to <code>HTTPHeaderDict</code>. This change was premature as it did not apply to proxy headers and <code>HTTPHeaderDict</code> does not handle byte header values correctly yet. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3343">#3343</a>)</li> <li>Changed <code>ProtocolError</code> to <code>InvalidChunkLength</code> when response terminates before the chunk length is sent. (<a href="https://redirect.github.com/urllib3/urllib3/issues/2860">#2860</a>)</li> <li>Changed <code>ProtocolError</code> to be more verbose on incomplete reads with excess content. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3261">#3261</a>)</li> </ul> <h2>2.2.0</h2> <h2>🖥️ urllib3 now works in the browser</h2> <p>:tada: <strong>This release adds experimental support for <a href="https://urllib3.readthedocs.io/en/stable/reference/contrib/emscripten.html">using urllib3 in the browser with Pyodide</a>!</strong> 🎉</p> <p>Thanks to Joe Marshall (<a href="https://github.com/joemarshall"><code>@joemarshall</code></a>) for contributing this feature. This change was possible thanks to work done in urllib3 v2.0 to detach our API from <code>http.client</code>. Please report all bugs to the <a href="https://github.com/urllib3/urllib3/issues">urllib3 issue tracker</a>.</p> <h2>🚀 urllib3 is fundraising for HTTP/2 support</h2> <p><a href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3 is raising ~$40,000 USD</a> to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support for 2023. If your company or organization uses Python and would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and thousands of other projects <a href="https://opencollective.com/urllib3">please consider contributing financially</a> to ensure HTTP/2 support is developed sustainably and maintained for the long-haul.</p> <p>Thank you for your support.</p> <h2>Changes</h2> <ul> <li>Added support for <a href="https://urllib3.readthedocs.io/en/latest/reference/contrib/emscripten.html">Emscripten and Pyodide</a>, including streaming support in cross-origin isolated browser environments where threading is enabled. (<a href="https://redirect.github.com/urllib3/urllib3/issues/2951">#2951</a>)</li> <li>Added support for <code>HTTPResponse.read1()</code> method. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3186">#3186</a>)</li> <li>Added rudimentary support for HTTP/2. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3284">#3284</a>)</li> <li>Fixed issue where requests against urls with trailing dots were failing due to SSL errors when using proxy. (<a href="https://redirect.github.com/urllib3/urllib3/issues/2244">#2244</a>)</li> <li>Fixed <code>HTTPConnection.proxy_is_verified</code> and <code>HTTPSConnection.proxy_is_verified</code> to be always set to a boolean after connecting to a proxy. It could be <code>None</code> in some cases previously. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3130">#3130</a>)</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/urllib3/urllib3/blob/main/CHANGES.rst">urllib3's changelog</a>.</em></p> <blockquote> <h1>2.2.2 (2024-06-17)</h1> <ul> <li>Added the <code>Proxy-Authorization</code> header to the list of headers to strip from requests when redirecting to a different host. As before, different headers can be set via <code>Retry.remove_headers_on_redirect</code>.</li> <li>Allowed passing negative integers as <code>amt</code> to read methods of <code>http.client.HTTPResponse</code> as an alternative to <code>None</code>. (<code>[#3122](urllib3/urllib3#3122) <https://github.com/urllib3/urllib3/issues/3122></code>__)</li> <li>Fixed return types representing copying actions to use <code>typing.Self</code>. (<code>[#3363](urllib3/urllib3#3363) <https://github.com/urllib3/urllib3/issues/3363></code>__)</li> </ul> <h1>2.2.1 (2024-02-16)</h1> <ul> <li>Fixed issue where <code>InsecureRequestWarning</code> was emitted for HTTPS connections when using Emscripten. (<code>[#3331](urllib3/urllib3#3331) <https://github.com/urllib3/urllib3/issues/3331></code>__)</li> <li>Fixed <code>HTTPConnectionPool.urlopen</code> to stop automatically casting non-proxy headers to <code>HTTPHeaderDict</code>. This change was premature as it did not apply to proxy headers and <code>HTTPHeaderDict</code> does not handle byte header values correctly yet. (<code>[#3343](urllib3/urllib3#3343) <https://github.com/urllib3/urllib3/issues/3343></code>__)</li> <li>Changed <code>InvalidChunkLength</code> to <code>ProtocolError</code> when response terminates before the chunk length is sent. (<code>[#2860](urllib3/urllib3#2860) <https://github.com/urllib3/urllib3/issues/2860></code>__)</li> <li>Changed <code>ProtocolError</code> to be more verbose on incomplete reads with excess content. (<code>[#3261](urllib3/urllib3#3261) <https://github.com/urllib3/urllib3/issues/3261></code>__)</li> </ul> <h1>2.2.0 (2024-01-30)</h1> <ul> <li>Added support for <code>Emscripten and Pyodide <https://urllib3.readthedocs.io/en/latest/reference/contrib/emscripten.html></code><strong>, including streaming support in cross-origin isolated browser environments where threading is enabled. (<code>[#2951](urllib3/urllib3#2951) <https://github.com/urllib3/urllib3/issues/2951></code></strong>)</li> <li>Added support for <code>HTTPResponse.read1()</code> method. (<code>[#3186](urllib3/urllib3#3186) <https://github.com/urllib3/urllib3/issues/3186></code>__)</li> <li>Added rudimentary support for HTTP/2. (<code>[#3284](urllib3/urllib3#3284) <https://github.com/urllib3/urllib3/issues/3284></code>__)</li> <li>Fixed issue where requests against urls with trailing dots were failing due to SSL errors when using proxy. (<code>[#2244](urllib3/urllib3#2244) <https://github.com/urllib3/urllib3/issues/2244></code>__)</li> <li>Fixed <code>HTTPConnection.proxy_is_verified</code> and <code>HTTPSConnection.proxy_is_verified</code> to be always set to a boolean after connecting to a proxy. It could be <code>None</code> in some cases previously. (<code>[#3130](urllib3/urllib3#3130) <https://github.com/urllib3/urllib3/issues/3130></code>__)</li> <li>Fixed an issue where <code>headers</code> passed in a request with <code>json=</code> would be mutated (<code>[#3203](urllib3/urllib3#3203) <https://github.com/urllib3/urllib3/issues/3203></code>__)</li> <li>Fixed <code>HTTPSConnection.is_verified</code> to be set to <code>False</code> when connecting from a HTTPS proxy to an HTTP target. It was set to <code>True</code> previously. (<code>[#3267](urllib3/urllib3#3267) <https://github.com/urllib3/urllib3/issues/3267></code>__)</li> <li>Fixed handling of new error message from OpenSSL 3.2.0 when configuring an HTTP proxy as HTTPS (<code>[#3268](urllib3/urllib3#3268) <https://github.com/urllib3/urllib3/issues/3268></code>__)</li> <li>Fixed TLS 1.3 post-handshake auth when the server certificate validation is disabled (<code>[#3325](urllib3/urllib3#3325) <https://github.com/urllib3/urllib3/issues/3325></code>__)</li> <li>Note for downstream distributors: To run integration tests, you now need to run the tests a second time with the <code>--integration</code> pytest flag. (<code>[#3181](urllib3/urllib3#3181) <https://github.com/urllib3/urllib3/issues/3181></code>__)</li> </ul> <h1>2.1.0 (2023-11-13)</h1> <ul> <li>Removed support for the deprecated urllib3[secure] extra. (<code>[#2680](urllib3/urllib3#2680) <https://github.com/urllib3/urllib3/issues/2680></code>__)</li> <li>Removed support for the deprecated SecureTransport TLS implementation. (<code>[#2681](urllib3/urllib3#2681) <https://github.com/urllib3/urllib3/issues/2681></code>__)</li> <li>Removed support for the end-of-life Python 3.7. (<code>[#3143](urllib3/urllib3#3143) <https://github.com/urllib3/urllib3/issues/3143></code>__)</li> <li>Allowed loading CA certificates from memory for proxies. (<code>[#3065](urllib3/urllib3#3065) <https://github.com/urllib3/urllib3/issues/3065></code>__)</li> <li>Fixed decoding Gzip-encoded responses which specified <code>x-gzip</code> content-encoding. (<code>[#3174](urllib3/urllib3#3174) <https://github.com/urllib3/urllib3/issues/3174></code>__)</li> </ul> <h1>2.0.7 (2023-10-17)</h1> <ul> <li>Made body stripped from HTTP requests changing the request method to GET after HTTP 303 "See Other" redirect responses.</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/urllib3/urllib3/commit/27e2a5c5a7ab6a517252cc8dcef3ffa6ffb8f61a"><code>27e2a5c</code></a> Release 2.2.2 (<a href="https://redirect.github.com/urllib3/urllib3/issues/3406">#3406</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/accff72ecc2f6cf5a76d9570198a93ac7c90270e"><code>accff72</code></a> Merge pull request from GHSA-34jh-p97f-mpxf</li> <li><a href="https://github.com/urllib3/urllib3/commit/34be4a57e59eb7365bcc37d52e9f8271b5b8d0d3"><code>34be4a5</code></a> Pin CFFI to a new release candidate instead of a Git commit (<a href="https://redirect.github.com/urllib3/urllib3/issues/3398">#3398</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/da410581b6b3df73da976b5ce5eb20a4bd030437"><code>da41058</code></a> Bump browser-actions/setup-chrome from 1.6.0 to 1.7.1 (<a href="https://redirect.github.com/urllib3/urllib3/issues/3399">#3399</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/b07a669bd970d69847801148286b726f0570b625"><code>b07a669</code></a> Bump github/codeql-action from 2.13.4 to 3.25.6 (<a href="https://redirect.github.com/urllib3/urllib3/issues/3396">#3396</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/b8589ec9f8c4da91511e601b632ac06af7e7c10e"><code>b8589ec</code></a> Measure coverage with v4 of artifact actions (<a href="https://redirect.github.com/urllib3/urllib3/issues/3394">#3394</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/f3bdc5585111429e22c81b5fb26c3ec164d98b81"><code>f3bdc55</code></a> Allow triggering CI manually (<a href="https://redirect.github.com/urllib3/urllib3/issues/3391">#3391</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/52392654b30183129cf3ec06010306f517d9c146"><code>5239265</code></a> Fix HTTP version in debug log (<a href="https://redirect.github.com/urllib3/urllib3/issues/3316">#3316</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/b34619f94ece0c40e691a5aaf1304953d88089de"><code>b34619f</code></a> Bump actions/checkout to 4.1.4 (<a href="https://redirect.github.com/urllib3/urllib3/issues/3387">#3387</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/9961d14de7c920091d42d42ed76d5d479b80064d"><code>9961d14</code></a> Bump browser-actions/setup-chrome from 1.5.0 to 1.6.0 (<a href="https://redirect.github.com/urllib3/urllib3/issues/3386">#3386</a>)</li> <li>Additional commits viewable in <a href="https://github.com/urllib3/urllib3/compare/1.26.12...2.2.2">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=urllib3&package-manager=pip&previous-version=1.26.12&new-version=2.2.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) You can trigger a rebase of this PR by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/getsentry/snuba/network/alerts). </details> > **Note** > Automatic rebases have been disabled on this pull request as it has been open for over 30 days. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Onkar Deshpande <onkar.deshpande@sentry.io>
Retrying this PR: #3236
This PR also contains the fix which caused consumers to go into CrashLoopBackOff when deployed. This PR contains a change to remove the list of all topics in the settings
validation.py
. Instead of doing validation there, it was proposed that we do validation after storages were loaded. This allowed us to removevalidation.py
and validate against the Topics class (after all topics were loaded from configuration). However, it was overlooked that there were 4 topics in thevalidation.py
that were not present in the Topics class.Namely:
This issue went unnoticed in unit testing because we override session topics in
prod_settings
. This PR also includes the fix which adds the 4 topics intotopics.py
.