-
-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ref(migrations): Add copy_tables command #3174
Conversation
snuba/migrations/runner.py
Outdated
@@ -418,3 +418,69 @@ def add_node( | |||
elif isinstance(migration, CodeMigration): | |||
for python_op in migration.forwards_global(): | |||
python_op.execute_new_node(storage_sets) | |||
|
|||
@classmethod | |||
def create_tables( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this copy_tables? To ensure there is no confusion about when to use and to keep it more distinct from the existing create table operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I originally had copy_and_create_tables
but that felt too long. copy_tables
i think is better
cf02f22
to
3b278dd
Compare
Codecov ReportBase: 92.75% // Head: 92.70% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #3174 +/- ##
==========================================
- Coverage 92.75% 92.70% -0.05%
==========================================
Files 693 693
Lines 31396 31415 +19
==========================================
+ Hits 29121 29123 +2
- Misses 2275 2292 +17
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
user: str, | ||
password: str, | ||
database: str, | ||
new_port: int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for ?
send_receive_timeout=client_settings.timeout, | ||
) | ||
|
||
# TODO: assert that the new node is not in the system.clusters definitions yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we discuss this before? Not sure why we want to enforce this.
Did you discuss this process with @nikhars and @mwarkentin ?
We want to make the addition of a new node quick, that means having as few manual steps as possible. adding the tables before adding the node to the cluster implies at least three steps: provision node, add tables, then add the node to the cluster.
I think we should ensure we are on the same page on the quickest process to add a node.
ch_table_names = [ | ||
result[0] for result in clickhouse.execute("SHOW TABLES").results | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is going to be a bit fragile. If we trust the content of the tables on another node we should at least validate they are the same you have in the storage set you are trying to deploy on this node.
More in general, how do you expect this script to be used? Is the user going to tell "please duplicate this node" or is the user going to tell "Please create all the tables you should have on this node according to the storage sets on this cluster" ?
host_name, | ||
new_port, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we assuming the host name is going to be the same? In most cases this is not going to be true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happened to be true for testing, but yes I should update to have a new_host_name
param as well
Closing in favor of #3241, we may need to come back and add some cli command utility but we can do that in a new PR when the time comes |
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.12 to 2.2.2. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/urllib3/urllib3/releases">urllib3's releases</a>.</em></p> <blockquote> <h2>2.2.2</h2> <h2>🚀 urllib3 is fundraising for HTTP/2 support</h2> <p><a href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3 is raising ~$40,000 USD</a> to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support for 2023. If your company or organization uses Python and would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and thousands of other projects <a href="https://opencollective.com/urllib3">please consider contributing financially</a> to ensure HTTP/2 support is developed sustainably and maintained for the long-haul.</p> <p>Thank you for your support.</p> <h2>Changes</h2> <ul> <li>Added the <code>Proxy-Authorization</code> header to the list of headers to strip from requests when redirecting to a different host. As before, different headers can be set via <code>Retry.remove_headers_on_redirect</code>.</li> <li>Allowed passing negative integers as <code>amt</code> to read methods of <code>http.client.HTTPResponse</code> as an alternative to <code>None</code>. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3122">#3122</a>)</li> <li>Fixed return types representing copying actions to use <code>typing.Self</code>. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3363">#3363</a>)</li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/urllib3/urllib3/compare/2.2.1...2.2.2">https://github.com/urllib3/urllib3/compare/2.2.1...2.2.2</a></p> <h2>2.2.1</h2> <h2>🚀 urllib3 is fundraising for HTTP/2 support</h2> <p><a href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3 is raising ~$40,000 USD</a> to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support for 2023. If your company or organization uses Python and would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and thousands of other projects <a href="https://opencollective.com/urllib3">please consider contributing financially</a> to ensure HTTP/2 support is developed sustainably and maintained for the long-haul.</p> <p>Thank you for your support.</p> <h2>Changes</h2> <ul> <li>Fixed issue where <code>InsecureRequestWarning</code> was emitted for HTTPS connections when using Emscripten. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3331">#3331</a>)</li> <li>Fixed <code>HTTPConnectionPool.urlopen</code> to stop automatically casting non-proxy headers to <code>HTTPHeaderDict</code>. This change was premature as it did not apply to proxy headers and <code>HTTPHeaderDict</code> does not handle byte header values correctly yet. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3343">#3343</a>)</li> <li>Changed <code>ProtocolError</code> to <code>InvalidChunkLength</code> when response terminates before the chunk length is sent. (<a href="https://redirect.github.com/urllib3/urllib3/issues/2860">#2860</a>)</li> <li>Changed <code>ProtocolError</code> to be more verbose on incomplete reads with excess content. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3261">#3261</a>)</li> </ul> <h2>2.2.0</h2> <h2>🖥️ urllib3 now works in the browser</h2> <p>:tada: <strong>This release adds experimental support for <a href="https://urllib3.readthedocs.io/en/stable/reference/contrib/emscripten.html">using urllib3 in the browser with Pyodide</a>!</strong> 🎉</p> <p>Thanks to Joe Marshall (<a href="https://github.com/joemarshall"><code>@joemarshall</code></a>) for contributing this feature. This change was possible thanks to work done in urllib3 v2.0 to detach our API from <code>http.client</code>. Please report all bugs to the <a href="https://github.com/urllib3/urllib3/issues">urllib3 issue tracker</a>.</p> <h2>🚀 urllib3 is fundraising for HTTP/2 support</h2> <p><a href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3 is raising ~$40,000 USD</a> to release HTTP/2 support and ensure long-term sustainable maintenance of the project after a sharp decline in financial support for 2023. If your company or organization uses Python and would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and thousands of other projects <a href="https://opencollective.com/urllib3">please consider contributing financially</a> to ensure HTTP/2 support is developed sustainably and maintained for the long-haul.</p> <p>Thank you for your support.</p> <h2>Changes</h2> <ul> <li>Added support for <a href="https://urllib3.readthedocs.io/en/latest/reference/contrib/emscripten.html">Emscripten and Pyodide</a>, including streaming support in cross-origin isolated browser environments where threading is enabled. (<a href="https://redirect.github.com/urllib3/urllib3/issues/2951">#2951</a>)</li> <li>Added support for <code>HTTPResponse.read1()</code> method. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3186">#3186</a>)</li> <li>Added rudimentary support for HTTP/2. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3284">#3284</a>)</li> <li>Fixed issue where requests against urls with trailing dots were failing due to SSL errors when using proxy. (<a href="https://redirect.github.com/urllib3/urllib3/issues/2244">#2244</a>)</li> <li>Fixed <code>HTTPConnection.proxy_is_verified</code> and <code>HTTPSConnection.proxy_is_verified</code> to be always set to a boolean after connecting to a proxy. It could be <code>None</code> in some cases previously. (<a href="https://redirect.github.com/urllib3/urllib3/issues/3130">#3130</a>)</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/urllib3/urllib3/blob/main/CHANGES.rst">urllib3's changelog</a>.</em></p> <blockquote> <h1>2.2.2 (2024-06-17)</h1> <ul> <li>Added the <code>Proxy-Authorization</code> header to the list of headers to strip from requests when redirecting to a different host. As before, different headers can be set via <code>Retry.remove_headers_on_redirect</code>.</li> <li>Allowed passing negative integers as <code>amt</code> to read methods of <code>http.client.HTTPResponse</code> as an alternative to <code>None</code>. (<code>[#3122](urllib3/urllib3#3122) <https://github.com/urllib3/urllib3/issues/3122></code>__)</li> <li>Fixed return types representing copying actions to use <code>typing.Self</code>. (<code>[#3363](urllib3/urllib3#3363) <https://github.com/urllib3/urllib3/issues/3363></code>__)</li> </ul> <h1>2.2.1 (2024-02-16)</h1> <ul> <li>Fixed issue where <code>InsecureRequestWarning</code> was emitted for HTTPS connections when using Emscripten. (<code>[#3331](urllib3/urllib3#3331) <https://github.com/urllib3/urllib3/issues/3331></code>__)</li> <li>Fixed <code>HTTPConnectionPool.urlopen</code> to stop automatically casting non-proxy headers to <code>HTTPHeaderDict</code>. This change was premature as it did not apply to proxy headers and <code>HTTPHeaderDict</code> does not handle byte header values correctly yet. (<code>[#3343](urllib3/urllib3#3343) <https://github.com/urllib3/urllib3/issues/3343></code>__)</li> <li>Changed <code>InvalidChunkLength</code> to <code>ProtocolError</code> when response terminates before the chunk length is sent. (<code>[#2860](urllib3/urllib3#2860) <https://github.com/urllib3/urllib3/issues/2860></code>__)</li> <li>Changed <code>ProtocolError</code> to be more verbose on incomplete reads with excess content. (<code>[#3261](urllib3/urllib3#3261) <https://github.com/urllib3/urllib3/issues/3261></code>__)</li> </ul> <h1>2.2.0 (2024-01-30)</h1> <ul> <li>Added support for <code>Emscripten and Pyodide <https://urllib3.readthedocs.io/en/latest/reference/contrib/emscripten.html></code><strong>, including streaming support in cross-origin isolated browser environments where threading is enabled. (<code>[#2951](urllib3/urllib3#2951) <https://github.com/urllib3/urllib3/issues/2951></code></strong>)</li> <li>Added support for <code>HTTPResponse.read1()</code> method. (<code>[#3186](urllib3/urllib3#3186) <https://github.com/urllib3/urllib3/issues/3186></code>__)</li> <li>Added rudimentary support for HTTP/2. (<code>[#3284](urllib3/urllib3#3284) <https://github.com/urllib3/urllib3/issues/3284></code>__)</li> <li>Fixed issue where requests against urls with trailing dots were failing due to SSL errors when using proxy. (<code>[#2244](urllib3/urllib3#2244) <https://github.com/urllib3/urllib3/issues/2244></code>__)</li> <li>Fixed <code>HTTPConnection.proxy_is_verified</code> and <code>HTTPSConnection.proxy_is_verified</code> to be always set to a boolean after connecting to a proxy. It could be <code>None</code> in some cases previously. (<code>[#3130](urllib3/urllib3#3130) <https://github.com/urllib3/urllib3/issues/3130></code>__)</li> <li>Fixed an issue where <code>headers</code> passed in a request with <code>json=</code> would be mutated (<code>[#3203](urllib3/urllib3#3203) <https://github.com/urllib3/urllib3/issues/3203></code>__)</li> <li>Fixed <code>HTTPSConnection.is_verified</code> to be set to <code>False</code> when connecting from a HTTPS proxy to an HTTP target. It was set to <code>True</code> previously. (<code>[#3267](urllib3/urllib3#3267) <https://github.com/urllib3/urllib3/issues/3267></code>__)</li> <li>Fixed handling of new error message from OpenSSL 3.2.0 when configuring an HTTP proxy as HTTPS (<code>[#3268](urllib3/urllib3#3268) <https://github.com/urllib3/urllib3/issues/3268></code>__)</li> <li>Fixed TLS 1.3 post-handshake auth when the server certificate validation is disabled (<code>[#3325](urllib3/urllib3#3325) <https://github.com/urllib3/urllib3/issues/3325></code>__)</li> <li>Note for downstream distributors: To run integration tests, you now need to run the tests a second time with the <code>--integration</code> pytest flag. (<code>[#3181](urllib3/urllib3#3181) <https://github.com/urllib3/urllib3/issues/3181></code>__)</li> </ul> <h1>2.1.0 (2023-11-13)</h1> <ul> <li>Removed support for the deprecated urllib3[secure] extra. (<code>[#2680](urllib3/urllib3#2680) <https://github.com/urllib3/urllib3/issues/2680></code>__)</li> <li>Removed support for the deprecated SecureTransport TLS implementation. (<code>[#2681](urllib3/urllib3#2681) <https://github.com/urllib3/urllib3/issues/2681></code>__)</li> <li>Removed support for the end-of-life Python 3.7. (<code>[#3143](urllib3/urllib3#3143) <https://github.com/urllib3/urllib3/issues/3143></code>__)</li> <li>Allowed loading CA certificates from memory for proxies. (<code>[#3065](urllib3/urllib3#3065) <https://github.com/urllib3/urllib3/issues/3065></code>__)</li> <li>Fixed decoding Gzip-encoded responses which specified <code>x-gzip</code> content-encoding. (<code>[#3174](urllib3/urllib3#3174) <https://github.com/urllib3/urllib3/issues/3174></code>__)</li> </ul> <h1>2.0.7 (2023-10-17)</h1> <ul> <li>Made body stripped from HTTP requests changing the request method to GET after HTTP 303 "See Other" redirect responses.</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/urllib3/urllib3/commit/27e2a5c5a7ab6a517252cc8dcef3ffa6ffb8f61a"><code>27e2a5c</code></a> Release 2.2.2 (<a href="https://redirect.github.com/urllib3/urllib3/issues/3406">#3406</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/accff72ecc2f6cf5a76d9570198a93ac7c90270e"><code>accff72</code></a> Merge pull request from GHSA-34jh-p97f-mpxf</li> <li><a href="https://github.com/urllib3/urllib3/commit/34be4a57e59eb7365bcc37d52e9f8271b5b8d0d3"><code>34be4a5</code></a> Pin CFFI to a new release candidate instead of a Git commit (<a href="https://redirect.github.com/urllib3/urllib3/issues/3398">#3398</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/da410581b6b3df73da976b5ce5eb20a4bd030437"><code>da41058</code></a> Bump browser-actions/setup-chrome from 1.6.0 to 1.7.1 (<a href="https://redirect.github.com/urllib3/urllib3/issues/3399">#3399</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/b07a669bd970d69847801148286b726f0570b625"><code>b07a669</code></a> Bump github/codeql-action from 2.13.4 to 3.25.6 (<a href="https://redirect.github.com/urllib3/urllib3/issues/3396">#3396</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/b8589ec9f8c4da91511e601b632ac06af7e7c10e"><code>b8589ec</code></a> Measure coverage with v4 of artifact actions (<a href="https://redirect.github.com/urllib3/urllib3/issues/3394">#3394</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/f3bdc5585111429e22c81b5fb26c3ec164d98b81"><code>f3bdc55</code></a> Allow triggering CI manually (<a href="https://redirect.github.com/urllib3/urllib3/issues/3391">#3391</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/52392654b30183129cf3ec06010306f517d9c146"><code>5239265</code></a> Fix HTTP version in debug log (<a href="https://redirect.github.com/urllib3/urllib3/issues/3316">#3316</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/b34619f94ece0c40e691a5aaf1304953d88089de"><code>b34619f</code></a> Bump actions/checkout to 4.1.4 (<a href="https://redirect.github.com/urllib3/urllib3/issues/3387">#3387</a>)</li> <li><a href="https://github.com/urllib3/urllib3/commit/9961d14de7c920091d42d42ed76d5d479b80064d"><code>9961d14</code></a> Bump browser-actions/setup-chrome from 1.5.0 to 1.6.0 (<a href="https://redirect.github.com/urllib3/urllib3/issues/3386">#3386</a>)</li> <li>Additional commits viewable in <a href="https://github.com/urllib3/urllib3/compare/1.26.12...2.2.2">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=urllib3&package-manager=pip&previous-version=1.26.12&new-version=2.2.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) You can trigger a rebase of this PR by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/getsentry/snuba/network/alerts). </details> > **Note** > Automatic rebases have been disabled on this pull request as it has been open for over 30 days. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Onkar Deshpande <onkar.deshpande@sentry.io>
context:
We have some ClickHouse migrations tooling that currently used to
run
andreverse
migrations in addition tolist
-ing the current status of migrations. This is great for existing datasets where we already have the clusters set up how we want. However, if we want to add a replica or a shard to an existing cluster we cannot easy do so.add_node
today:There is some prior art which is the current add_node command. This basically runs all the migrations in order for a specific storage set on a new node. The end result is hopefully that the node is now in the same state as all the other nodes in the clusters. This works only in the case where the initial
CREATE TABLE
statement doesn't differ from what the current schema is.why it doesn't quite work:
ClickHouse requires the table schema to be same in order to replicate any data. If we have a table that has gone through many schema iterations, the moment we try to run the
0001_add_table...
migration, the table on the new node is in a different state than the other nodes in the cluster. When we try to grab parts from ZooKeeper to replicate data we will get some sort of error.For example, in the case below, the first migration for the
generic_metrics
actually has multiple SQL operations, firstCREATE TABLE
, thenADD COLUMN
(and then some other add columns/add indexes). But right after we run the firstCREATE TABLE
, we get an error about the_indexed_tags_hash
missing, which makes sense because it hasn't been added yet (that was the next SQL statement).solution
We don't really care about running the migrations in order, since as far as the cluster in concerned, the migrations are up to date (its just the new node that's not). Instead we can copy the current table statement, using
SHOW CREATE TABLE
, and the use that to create the tables on the new node. This way the schema should be identical to all the other nodes in the cluster, preventing any errors of missing columns, missing indexes etc.In order to do this we need:
SHOW CREATE TABLE
commandIn this PR I've added the option to either pass in the list of table names, or for the command to do that automatically by running the
SHOW TABLE
command on the node we are copying the table statements from.