Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(migrations): Add copy_tables command #3174

Closed
wants to merge 2 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
66 changes: 66 additions & 0 deletions snuba/migrations/runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -434,3 +434,69 @@ def add_node(
elif isinstance(migration, CodeMigration):
for python_op in migration.forwards_global():
python_op.execute_new_node(storage_sets)

@classmethod
def copy_tables(
self,
node_type: ClickhouseNodeType,
host_name: str,
port: int,
user: str,
password: str,
database: str,
new_port: int,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for ?

table_names: Optional[str],
dry_run: bool = False,
) -> None:
"""
Create tables on a new node in a cluster by using the existing create table
statement from an existing node in a cluster. This can be done for a new shard
or new replica.
"""
client_settings = ClickhouseClientSettings.QUERY.value
clickhouse = ClickhousePool(
host_name,
port,
user,
password,
database,
client_settings=client_settings.settings,
send_receive_timeout=client_settings.timeout,
)

# TODO: assert that the new node is not in the system.clusters definitions yet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we discuss this before? Not sure why we want to enforce this.
Did you discuss this process with @nikhars and @mwarkentin ?
We want to make the addition of a new node quick, that means having as few manual steps as possible. adding the tables before adding the node to the cluster implies at least three steps: provision node, add tables, then add the node to the cluster.

I think we should ensure we are on the same page on the quickest process to add a node.


if table_names:
ch_table_names = table_names.split(",")
else:
ch_table_names = [
result[0] for result in clickhouse.execute("SHOW TABLES").results
]
Comment on lines +472 to +474
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going to be a bit fragile. If we trust the content of the tables on another node we should at least validate they are the same you have in the storage set you are trying to deploy on this node.
More in general, how do you expect this script to be used? Is the user going to tell "please duplicate this node" or is the user going to tell "Please create all the tables you should have on this node according to the storage sets on this cluster" ?


show_table_statments = {}

for name in ch_table_names:
((curr_create_table_statement,),) = clickhouse.execute(
f"SHOW CREATE TABLE {database}.{name}"
).results
show_table_statments[name] = curr_create_table_statement

new_client_settings = ClickhouseClientSettings.MIGRATE.value
new_clickhouse_node = ClickhousePool(
host_name,
new_port,
Comment on lines +486 to +487
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we assuming the host name is going to be the same? In most cases this is not going to be true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happened to be true for testing, but yes I should update to have a new_host_name param as well

user,
password,
database,
client_settings=new_client_settings.settings,
send_receive_timeout=new_client_settings.timeout,
)

# TODO: print out the system.macros for the new node

for table_name, statement in show_table_statments.items():
print(f"creating {table_name}...")
if dry_run:
print(f"create table statement: \n {statement}")
return
new_clickhouse_node.execute(statement)