Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: missing caller-id flag on some vtctldclient commands #17138

Open
notfelineit opened this issue Nov 5, 2024 · 4 comments
Open

Bug Report: missing caller-id flag on some vtctldclient commands #17138

notfelineit opened this issue Nov 5, 2024 · 4 comments

Comments

@notfelineit
Copy link
Contributor

notfelineit commented Nov 5, 2024

Overview of the Issue

Using vtctldclient OnlineDDL without the --caller-id flag results in:

> vtctldclient --server :15999 OnlineDDL cancel <KEYSPACE> <UUID>
E1029 17:24:39.288500      65 main.go:56] rpc error: code = Unknown desc = schema change failed, ExecuteResult: {
  "FailedShards": [
    {
      "Shard": "-",
      "Err": "rpc error: code = Unknown desc = TabletManager.ExecuteQuery on <CELL> error: rpc error: code = Unauthenticated desc = missing caller id: rpc error: code = Unauthenticated desc = missing caller id"
    }
  ],
  "SuccessShards": [],
  "CurSQLIndex": 0,
  "Sqls": [
    "alter vitess_migration <UUID> cancel"
  ],
  "UUIDs": null,
  "ExecutorErr": "",
  "TotalTimeSpent": 24522555
....
}

This works however!

vtctldclient --server localhost:15999 ApplySchema --caller-id <CALLER-ID> --ddl-strategy "online" --sql "alter vitess_migration '<UUID>' cancel" <KEYSPACE>

Reproduction Steps

  1. Start an OnlineDDL vitess migration of any kind
  2. Try to cancel the vitess migration with just vtctldclient --server :15999 OnlineDDL cancel <KEYSPACE> <UUID>
  3. Should error

Binary Version

v18

Operating System and Environment details

> cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
> uname -sr
Linux 5.15.0-1071-aws
> uname -m
x86_64


### Log Fragments

_No response_
@notfelineit notfelineit added Type: Bug Needs Triage This issue needs to be correctly labelled and triaged go Component: vtctldclient labels Nov 5, 2024
@mattlord mattlord added Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) Component: Query Serving and removed Needs Triage This issue needs to be correctly labelled and triaged labels Nov 5, 2024
@mattlord
Copy link
Contributor

mattlord commented Nov 5, 2024

Thanks, @notfelineit ! The ApplySchema command has a --caller-id flag and it sounds like we might need to make that a top level command flag so that it can be used when needed for any sub-command when strict table ACLs are used. That or at least add it to other sub-commands such as OnlineDDL that are affected by strict table ACLs.

@mattlord
Copy link
Contributor

mattlord commented Nov 5, 2024

@shlomi-noach I think this may be OnlineDDL specific as the command ends up using TabletManager.ExecuteQuery directly from vtctld:

go/vt/schemamanager/tablet_executor.go:		result, err = exec.tmc.ExecuteQuery(ctx, tablet, &tabletmanagerdatapb.ExecuteQueryRequest{

So that's the exact same path that ApplySchema uses and thus the strict table ACLs come into play here as well. I don't see anything else that uses this tabletmanager RPC in this way (other than ApplySchema and OnlineDDL via that).

@mattlord
Copy link
Contributor

mattlord commented Nov 5, 2024

AFAICT, this is what we need to do:

diff --git a/go/cmd/vtctldclient/command/onlineddl.go b/go/cmd/vtctldclient/command/onlineddl.go
index cec60ddd98..eb18be93f7 100644
--- a/go/cmd/vtctldclient/command/onlineddl.go
+++ b/go/cmd/vtctldclient/command/onlineddl.go
@@ -153,6 +153,7 @@ func commandOnlineDDLCancel(cmd *cobra.Command, args []string) error {
        resp, err := client.CancelSchemaMigration(commandCtx, &vtctldatapb.CancelSchemaMigrationRequest{
                Keyspace: keyspace,
                Uuid:     uuid,
+               CallerId: applySchemaOptions.CallerID,
        })
        if err != nil {
                return err
@@ -417,6 +418,8 @@ func commandOnlineDDLShow(cmd *cobra.Command, args []string) error {
 }

 func init() {
+       OnlineDDL.Flags().StringVar(&applySchemaOptions.CallerID, "caller-id", "", "Effective caller ID used for the operation and should map to an ACL name which grants this identity the necessary permissions to perform the operation (this is only necessary when strict table ACLs are used).")
+
        OnlineDDL.AddCommand(OnlineDDLCancel)
        OnlineDDL.AddCommand(OnlineDDLCleanup)
        OnlineDDL.AddCommand(OnlineDDLComplete)
diff --git a/go/vt/vtctl/grpcvtctldserver/server.go b/go/vt/vtctl/grpcvtctldserver/server.go
index aaf13fb864..65c8d8ceef 100644
--- a/go/vt/vtctl/grpcvtctldserver/server.go
+++ b/go/vt/vtctl/grpcvtctldserver/server.go
@@ -575,6 +575,7 @@ func (s *VtctldServer) CancelSchemaMigration(ctx context.Context, req *vtctldata
                Keyspace:            req.Keyspace,
                Sql:                 []string{query},
                WaitReplicasTimeout: protoutil.DurationToProto(DefaultWaitReplicasTimeout),
+               CallerId:            req.CallerId,
        })
        if err != nil {
                return nil, err

Adding caller_id to all of the request proto types so that it gets passed on to the ApplySchema path such as for Cancel here:

// CancelSchemaMigration is part of the vtctlservicepb.VtctldServer interface.
func (s *VtctldServer) CancelSchemaMigration(ctx context.Context, req *vtctldatapb.CancelSchemaMigrationRequest) (resp *vtctldatapb.CancelSchemaMigrationResponse, err error) {
span, ctx := trace.NewSpan(ctx, "VtctldServer.CancelSchemaMigration")
defer span.Finish()
defer panicHandler(&err)
span.Annotate("keyspace", req.Keyspace)
span.Annotate("uuid", req.Uuid)
query, err := alterSchemaMigrationQuery("cancel", req.Uuid)
if err != nil {
return nil, err
}
log.Info("Calling ApplySchema to cancel migration")
qr, err := s.ApplySchema(ctx, &vtctldatapb.ApplySchemaRequest{
Keyspace: req.Keyspace,
Sql: []string{query},
WaitReplicasTimeout: protoutil.DurationToProto(DefaultWaitReplicasTimeout),
})
if err != nil {
return nil, err
}
resp = &vtctldatapb.CancelSchemaMigrationResponse{
RowsAffectedByShard: qr.RowsAffectedByShard,
}
return resp, nil
}

@shlomi-noach
Copy link
Contributor

Thank you @notfelineit for detecting and @mattlord for analyzing. You're both correct here, and we should make that change.

Side note: OnlineDDL command was meant as a handy, cosmetic and syntactic sugar coating on top of otherwise more complex syntax. But it is superfluous and just adds one more layer to always support. I regret adding it, but I suspect it caught high traction. This required change here (thank you) just makes it even less attractive to maintain & use.

@shlomi-noach shlomi-noach self-assigned this Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants