Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Exit before_command_run if telemetry consent is passed through kedro run #681

Closed
wants to merge 8 commits into from

Conversation

lrcouto
Copy link
Contributor

@lrcouto lrcouto commented May 17, 2024

Description

Works together with kedro-org/kedro#3876.

Modifies the before_command_run hook in kedro-telemetry to exit if the user has passed the telemetry consent through the flag on kedro run.

Development notes

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes

@lrcouto lrcouto changed the title Exit before_command_run if telemetry consent is passed through kedro run feat: Exit before_command_run if telemetry consent is passed through kedro run May 17, 2024
@lrcouto lrcouto marked this pull request as ready for review May 17, 2024 22:12
@@ -107,6 +107,10 @@ def before_command_run(
if not project_metadata: # in package mode
return

for arg in command_args:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean though that using the --telemetry flag is the same as opting out? If we return at this point, no data will be sent even if the user provided "yes" to the flag.

Copy link
Contributor Author

@lrcouto lrcouto May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagined the person would only use the flag once and then on subsequent runs that would be handled, but you're right. I'm going to look over this one again.

Signed-off-by: lrcouto <laurarccouto@gmail.com>
Signed-off-by: lrcouto <laurarccouto@gmail.com>
@lrcouto
Copy link
Contributor Author

lrcouto commented May 21, 2024

Changed the code to bypass the _check_for_telemetry_consent function if the --telemetry flag is present, instead picking up the consent response directly from the CLI flag. The only issue I personally see with this approach is that we validate the response twice, once here in the hook, and once in the run function itself.

@DimedS
Copy link
Contributor

DimedS commented May 22, 2024

Why, then, did we create other PRs for the Kedro CLI? Perhaps that task can be handled within that hook for all commands?

@lrcouto
Copy link
Contributor Author

lrcouto commented May 22, 2024

Why, then, did we create other PRs for the Kedro CLI? Perhaps that task can be handled within that hook for all commands?

We'd still need at the very least to have a PR that introduces the "--telemetry" flag to the run function to people to use in the CLI, so it needs to be changed regardless.

@DimedS
Copy link
Contributor

DimedS commented May 22, 2024

Why, then, did we create other PRs for the Kedro CLI? Perhaps that task can be handled within that hook for all commands?

We'd still need at the very least to have a PR that introduces the "--telemetry" flag to the run function to people to use in the CLI, so it needs to be changed regardless.

In that case, I believe it would be better to put all the logic for validating the answer and writing to the file into that hook (since it was already there previously). We can then add just a few lines to the Kedro CLI to initialise that command. This approach avoids duplication and ensures the same logic is used for all commands. Do you know why that hook doesn't work on kedro new command?

@lrcouto
Copy link
Contributor Author

lrcouto commented May 22, 2024

Why, then, did we create other PRs for the Kedro CLI? Perhaps that task can be handled within that hook for all commands?

We'd still need at the very least to have a PR that introduces the "--telemetry" flag to the run function to people to use in the CLI, so it needs to be changed regardless.

In that case, I believe it would be better to put all the logic for validating the answer and writing to the file into that hook (since it was already there previously). We can then add just a few lines to the Kedro CLI to initialise that command. This approach avoids duplication and ensures the same logic is used for all commands. Do you know why that hook doesn't work on kedro new command?

The hook does get called on the "new" command, but the prompt to consent is called the first time a Kedro CLI command is used inside the project directory. You can test this by creating a new project, using a command like "kedro --version" outside the project directory, and then inside. The consent prompt will be shown when you do it inside the project directory. Because of that, if you set the .telemetry file to already have a response on project creation, it'll skip the prompt.

We could do all of the validation and writing for the kedro run --telemetry case on the hook itself so we don't have to do it twice, I think that would work. I'm gonna give it a try.

@ankatiyar
Copy link
Contributor

It seems like in kedro-org/kedro#3876, for a kedro run, the .telemetry file is updated according to the user response. So if a user does kedro run --telemetry=no then for all subsequent runs the telemetry will not be collected unless kedro run --telemetry=yes is run again. I think manipulating .telemetry file for kedro new makes sense but not for kedro run

In my head, if kedro run --telemetry=no is used, telemetry should not be collected for that particular run? Basically Kedro should pass along the command to kedro-telemetry and if we encounter --telemetry=no we should exit without sending anything to heap.

It seems strange that one run command decides the telemetry collection for all subsequent commands. We could add --telemetry flag to all subcommands and have the logic in kedro-telemetry to not send data if --telemetry flag exists in the command itself. So this PR seems fine to me, but I think the logic in kedro-org/kedro#3876 should simply be adding the flag. Otherwise, if we keep kedro-org/kedro#3876 the way it is, it will set the .telemetry consent to false so then we don't need to be doing any checking for the --telemetry flag because that is unnecessary. What do you think @lrcouto @merelcht @DimedS

@DimedS
Copy link
Contributor

DimedS commented May 23, 2024

What do you think @lrcouto @merelcht @DimedS

I agree with your point, but for me, it's more about naming. We can change it to set_telemetry=on/off and keep the concept as it is. I believe it's useful to provide this option to users. Additionally, I think the core issue is that we need to run the hook before the kedro-new command. This way, we can ask users if they want to participate in telemetry when they are creating a project, rather than during the first operation in the project. This approach avoids potential confusion, especially if that first operation occurs in a production environment.

@lrcouto
Copy link
Contributor Author

lrcouto commented May 23, 2024

What do you think @lrcouto @merelcht @DimedS

In a vacuum, a do agree with this notion that you should be able to use the command to decide if telemetry will be collected on a run by run basis, but the idea of this specific task was to introduce a CLI option that reproduces the behavior of the prompt the user gets when running the CLI on a project for the first time, and that writes on the .telemetry file. So the idea would be using the flag only if you do want to write on the file and change consent. This is also the behavior you see on the flag in the new function.

If we were to do that, I'd change the name of the flag so it doesn't become confusing, since we have another one with the same name that does change the .telemetry file.

Changing the prompt to consent or not to telemetry to project creation instead of on the first CLI run would need to change when the plugin first becomes available, and it's a bit out of the scope of this specific task. But we could discuss that and maybe open another issue.

@lrcouto
Copy link
Contributor Author

lrcouto commented May 23, 2024

Also, if we do want to rethink how this is going to work, should we revert kedro-org/kedro#3876? With only that one merged and not this one, it will allow the user to set consent through a flag but the prompt will override it.

@ankatiyar
Copy link
Contributor

ankatiyar commented May 23, 2024

I agree with your point, but for me, it's more about naming. We can change it to set_telemetry=on/off and keep the concept as it is. I believe it's useful to provide this option to users.

Yeah, it might be useful but it's a strange thing to do as an option for the kedro run. Could be a separate kedro-telemetry command maybe?

Additionally, I think the core issue is that we need to run the hook before the kedro-new command. This way, we can ask users if they want to participate in telemetry when they are creating a project, rather than during the first operation in the project. This approach avoids potential confusion, especially if that first operation occurs in a production environment.

That I think could be a separate issue. Out of scope for this discussion

For now I think -

  • when a user creates a project with kedro new, telemetry wouldn't prompt them for permission at that point because data is only collected from inside a project. So if you do kedro new --telemetry=no it creates a project with a .telemetry file with consent=false so any subsequent kedro commands for that project (from within the project directory), telemetry will not be collected. This seems fine to me.
  • For when you're running Kedro commands within a project, kedro run or kedro catalog list etc - No telemetry should be collected for that specific command. (This will require adding --telemetry to all project subcommands)
  • If a user wants to turn telemetry for a project off at any point, they can update the .telemetry file themselves. OR maybe we can add a CLI command to kedro-telemetry to turn change consent to false kedro telemetry off or somthing?

@astrojuanlu astrojuanlu self-requested a review May 23, 2024 13:13
@DimedS
Copy link
Contributor

DimedS commented May 23, 2024

I agree with @ankatiyar proposals that setting telemetry on kedro new should apply to the entire project (we should also consider asking users about telemetry at that step, rather than just allowing them to specify --telemetry). I agree that from one side it will be clearer if using the --telemetry option with other commands applies only to that particular run, but from other side - the same flag will behave differently. However, I believe we need to gather more opinions on this matter.

I also concur with @lrcouto that we should finalise the feature design before proceeding with the coding. The PR, which has already been merged, is not currently properly functioning. It seems that we will need the next sprint to agree on a new solution and implement it. Therefore, I support reverting the previous PR and rethinking the approach to avoid blocking the current release.

@noklam
Copy link
Contributor

noklam commented May 23, 2024

Just left a comment here before everyone jump in, I asked @astrojuanlu to clarify the requirements here. I think both solutions are correct but the main question here is what the desired user journey.

@merelcht
Copy link
Member

Thanks all for your thoughts on this. I've looked at the original tickets and comments and I agree that the feature ask actually wasn't clear here. I interpreted it the same way as @lrcouto did, thinking that kedro new --telemetry and kedro run --telemetry should behave the same. The main reason we introduced this feature is for automated processes (test in CI), so I didn't actually think about the setting of telemetry for a specific run. But @ankatiyar brings some very thoughtful points.

I still think that the way @lrcouto built it is my ideal approach, but would be good to get a product view from @astrojuanlu

@lrcouto
Copy link
Contributor Author

lrcouto commented May 23, 2024

PR for reversion here in case we want it: kedro-org/kedro#3888

We could go either way, proceed as it is or revert and go back to the drawing table.

@astrojuanlu
Copy link
Member

astrojuanlu commented May 23, 2024

To recap on how we got here:

The original user problem was described in kedro-org/kedro#1640:

When any kedro command is executed for the first time or in a clean environment (which is often the case in CI/CD) the telemetry prompt gets run. The user then has to answer Yes or No to running telemetry. If no user is involved, e.g. in an automated CI/CD workflow a hack needs to be put in place to programatically add a .telemetry file.

Then in kedro-org/kedro#2867 a solution was proposed, but it fixated in kedro new.

And in the middle of the implementation kedro-org/kedro#3701, we realised that adding that flag to kedro new didn't actually solve the original problem kedro-org/kedro#3701 (review) because it's kedro run the command that is most likely to run on an unattended environment and therefore get stuck.

So in kedro-org/kedro#3701 (comment) we agreed to add "a similar flag" to kedro run.

But indeed, we never fully clarified what said flag was supposed to do.

I do agree with @ankatiyar that it's weird for kedro run --telemetry to set the telemetry consent for the whole project, rather than for the specific run. Essentially what she described in #681 (comment). Responding to a couple of things:

(This will require adding --telemetry to all project subcommands)

I suggested that we take this to kedro-telemetry on kedro-org/kedro#3701 (comment) but it was not clear how feasible it was, plus we wanted to be practical and cover just kedro new and kedro run.

OR maybe we can add a CLI command to kedro-telemetry to turn change consent to false kedro telemetry off or somthing?

This is an idea we hadn't considered yet and maybe worth giving a second thought. At least it decouples kedro commands from kedro-telemetry stuff, which is really neat and clean.

As a user, going to my CI runs and changing kedro run to kedro run --skip-telemetry or doing kedro telemetry off && kedro run looks mostly the same (?).

@noklam
Copy link
Contributor

noklam commented May 23, 2024

As a user, going to my CI runs and changing kedro run to kedro run --skip-telemetry or doing kedro telemetry off && kedro run looks mostly the same (?).

Conceptually this de-coupling is cleaner, but I don't know if it's really useful. This would work equally well:

pip uninstall kedro-telemetry && kedro run

At the end, the only one cares about telemetry is us, users certainly won't switch telemetry on if it's off already.

What if the flag simply setKEDRO_TELEMETRY to True and kedro-telemetry would pick this up? Downside is clearly this is coupling the two libraries.

@lrcouto
Copy link
Contributor Author

lrcouto commented May 23, 2024

From my understanding, the main idea of this flag would be avoiding an automated run to stop because of the consent prompt, in a CI situation or similar. Our own e2e tests, for example, are manually creating the .telemetry file to prevent that from happening while they run.

@astrojuanlu
Copy link
Member

After discussing this a bit more internally, we decided to revert this PR.

I'll give a proper rationale in kedro-org/kedro#1640 but essentially, the way we will solve the problem of "telemetry prompt blocking unattended environments" is actually removing the prompt, rather than adding a --telemetry flag to more kedro CLI commands.

Thanks everybody for the thoughtful discussion here, the requirements weren't clear and @lrcouto you did the right thing 🙏🏼 Let's close this one and go with kedro-org/kedro#3888.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants