feat: Exit before_command_run if telemetry consent is passed through kedro run #681

lrcouto · 2024-05-17T17:51:42Z

Description

Works together with kedro-org/kedro#3876.

Modifies the before_command_run hook in kedro-telemetry to exit if the user has passed the telemetry consent through the flag on kedro run.

Development notes

Checklist

Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added a description of this change in the relevant RELEASE.md file
Added tests to cover my changes

Signed-off-by: lrcouto <laurarccouto@gmail.com>

merelcht · 2024-05-20T15:34:48Z

kedro-telemetry/kedro_telemetry/plugin.py

@@ -107,6 +107,10 @@ def before_command_run(
            if not project_metadata:  # in package mode
                return

+            for arg in command_args:


Doesn't this mean though that using the --telemetry flag is the same as opting out? If we return at this point, no data will be sent even if the user provided "yes" to the flag.

I imagined the person would only use the flag once and then on subsequent runs that would be handled, but you're right. I'm going to look over this one again.

Signed-off-by: lrcouto <laurarccouto@gmail.com>

lrcouto · 2024-05-21T05:08:54Z

Changed the code to bypass the _check_for_telemetry_consent function if the --telemetry flag is present, instead picking up the consent response directly from the CLI flag. The only issue I personally see with this approach is that we validate the response twice, once here in the hook, and once in the run function itself.

DimedS · 2024-05-22T15:36:43Z

Why, then, did we create other PRs for the Kedro CLI? Perhaps that task can be handled within that hook for all commands?

lrcouto · 2024-05-22T16:01:37Z

Why, then, did we create other PRs for the Kedro CLI? Perhaps that task can be handled within that hook for all commands?

We'd still need at the very least to have a PR that introduces the "--telemetry" flag to the run function to people to use in the CLI, so it needs to be changed regardless.

DimedS · 2024-05-22T17:05:34Z

Why, then, did we create other PRs for the Kedro CLI? Perhaps that task can be handled within that hook for all commands?

We'd still need at the very least to have a PR that introduces the "--telemetry" flag to the run function to people to use in the CLI, so it needs to be changed regardless.

In that case, I believe it would be better to put all the logic for validating the answer and writing to the file into that hook (since it was already there previously). We can then add just a few lines to the Kedro CLI to initialise that command. This approach avoids duplication and ensures the same logic is used for all commands. Do you know why that hook doesn't work on kedro new command?

lrcouto · 2024-05-22T18:15:24Z

Why, then, did we create other PRs for the Kedro CLI? Perhaps that task can be handled within that hook for all commands?

We'd still need at the very least to have a PR that introduces the "--telemetry" flag to the run function to people to use in the CLI, so it needs to be changed regardless.

In that case, I believe it would be better to put all the logic for validating the answer and writing to the file into that hook (since it was already there previously). We can then add just a few lines to the Kedro CLI to initialise that command. This approach avoids duplication and ensures the same logic is used for all commands. Do you know why that hook doesn't work on kedro new command?

The hook does get called on the "new" command, but the prompt to consent is called the first time a Kedro CLI command is used inside the project directory. You can test this by creating a new project, using a command like "kedro --version" outside the project directory, and then inside. The consent prompt will be shown when you do it inside the project directory. Because of that, if you set the .telemetry file to already have a response on project creation, it'll skip the prompt.

We could do all of the validation and writing for the kedro run --telemetry case on the hook itself so we don't have to do it twice, I think that would work. I'm gonna give it a try.

ankatiyar · 2024-05-23T11:00:24Z

It seems like in kedro-org/kedro#3876, for a kedro run, the .telemetry file is updated according to the user response. So if a user does kedro run --telemetry=no then for all subsequent runs the telemetry will not be collected unless kedro run --telemetry=yes is run again. I think manipulating .telemetry file for kedro new makes sense but not for kedro run

In my head, if kedro run --telemetry=no is used, telemetry should not be collected for that particular run? Basically Kedro should pass along the command to kedro-telemetry and if we encounter --telemetry=no we should exit without sending anything to heap.

It seems strange that one run command decides the telemetry collection for all subsequent commands. We could add --telemetry flag to all subcommands and have the logic in kedro-telemetry to not send data if --telemetry flag exists in the command itself. So this PR seems fine to me, but I think the logic in kedro-org/kedro#3876 should simply be adding the flag. Otherwise, if we keep kedro-org/kedro#3876 the way it is, it will set the .telemetry consent to false so then we don't need to be doing any checking for the --telemetry flag because that is unnecessary. What do you think @lrcouto @merelcht @DimedS

DimedS · 2024-05-23T11:20:47Z

What do you think @lrcouto @merelcht @DimedS

I agree with your point, but for me, it's more about naming. We can change it to set_telemetry=on/off and keep the concept as it is. I believe it's useful to provide this option to users. Additionally, I think the core issue is that we need to run the hook before the kedro-new command. This way, we can ask users if they want to participate in telemetry when they are creating a project, rather than during the first operation in the project. This approach avoids potential confusion, especially if that first operation occurs in a production environment.

lrcouto · 2024-05-23T12:23:08Z

What do you think @lrcouto @merelcht @DimedS

In a vacuum, a do agree with this notion that you should be able to use the command to decide if telemetry will be collected on a run by run basis, but the idea of this specific task was to introduce a CLI option that reproduces the behavior of the prompt the user gets when running the CLI on a project for the first time, and that writes on the .telemetry file. So the idea would be using the flag only if you do want to write on the file and change consent. This is also the behavior you see on the flag in the new function.

If we were to do that, I'd change the name of the flag so it doesn't become confusing, since we have another one with the same name that does change the .telemetry file.

Changing the prompt to consent or not to telemetry to project creation instead of on the first CLI run would need to change when the plugin first becomes available, and it's a bit out of the scope of this specific task. But we could discuss that and maybe open another issue.

lrcouto · 2024-05-23T12:41:52Z

Also, if we do want to rethink how this is going to work, should we revert kedro-org/kedro#3876? With only that one merged and not this one, it will allow the user to set consent through a flag but the prompt will override it.

ankatiyar · 2024-05-23T12:46:13Z

I agree with your point, but for me, it's more about naming. We can change it to set_telemetry=on/off and keep the concept as it is. I believe it's useful to provide this option to users.

Yeah, it might be useful but it's a strange thing to do as an option for the kedro run. Could be a separate kedro-telemetry command maybe?

Additionally, I think the core issue is that we need to run the hook before the kedro-new command. This way, we can ask users if they want to participate in telemetry when they are creating a project, rather than during the first operation in the project. This approach avoids potential confusion, especially if that first operation occurs in a production environment.

That I think could be a separate issue. Out of scope for this discussion

For now I think -

when a user creates a project with kedro new, telemetry wouldn't prompt them for permission at that point because data is only collected from inside a project. So if you do kedro new --telemetry=no it creates a project with a .telemetry file with consent=false so any subsequent kedro commands for that project (from within the project directory), telemetry will not be collected. This seems fine to me.
For when you're running Kedro commands within a project, kedro run or kedro catalog list etc - No telemetry should be collected for that specific command. (This will require adding --telemetry to all project subcommands)
If a user wants to turn telemetry for a project off at any point, they can update the .telemetry file themselves. OR maybe we can add a CLI command to kedro-telemetry to turn change consent to false kedro telemetry off or somthing?

DimedS · 2024-05-23T13:17:44Z

I agree with @ankatiyar proposals that setting telemetry on kedro new should apply to the entire project (we should also consider asking users about telemetry at that step, rather than just allowing them to specify --telemetry). I agree that from one side it will be clearer if using the --telemetry option with other commands applies only to that particular run, but from other side - the same flag will behave differently. However, I believe we need to gather more opinions on this matter.

I also concur with @lrcouto that we should finalise the feature design before proceeding with the coding. The PR, which has already been merged, is not currently properly functioning. It seems that we will need the next sprint to agree on a new solution and implement it. Therefore, I support reverting the previous PR and rethinking the approach to avoid blocking the current release.

noklam · 2024-05-23T13:36:38Z

Just left a comment here before everyone jump in, I asked @astrojuanlu to clarify the requirements here. I think both solutions are correct but the main question here is what the desired user journey.

merelcht · 2024-05-23T13:39:30Z

Thanks all for your thoughts on this. I've looked at the original tickets and comments and I agree that the feature ask actually wasn't clear here. I interpreted it the same way as @lrcouto did, thinking that kedro new --telemetry and kedro run --telemetry should behave the same. The main reason we introduced this feature is for automated processes (test in CI), so I didn't actually think about the setting of telemetry for a specific run. But @ankatiyar brings some very thoughtful points.

I still think that the way @lrcouto built it is my ideal approach, but would be good to get a product view from @astrojuanlu

lrcouto · 2024-05-23T13:53:45Z

PR for reversion here in case we want it: kedro-org/kedro#3888

We could go either way, proceed as it is or revert and go back to the drawing table.

astrojuanlu · 2024-05-23T14:01:54Z

To recap on how we got here:

The original user problem was described in kedro-org/kedro#1640:

When any kedro command is executed for the first time or in a clean environment (which is often the case in CI/CD) the telemetry prompt gets run. The user then has to answer Yes or No to running telemetry. If no user is involved, e.g. in an automated CI/CD workflow a hack needs to be put in place to programatically add a .telemetry file.

Then in kedro-org/kedro#2867 a solution was proposed, but it fixated in kedro new.

And in the middle of the implementation kedro-org/kedro#3701, we realised that adding that flag to kedro new didn't actually solve the original problem kedro-org/kedro#3701 (review) because it's kedro run the command that is most likely to run on an unattended environment and therefore get stuck.

So in kedro-org/kedro#3701 (comment) we agreed to add "a similar flag" to kedro run.

But indeed, we never fully clarified what said flag was supposed to do.

I do agree with @ankatiyar that it's weird for kedro run --telemetry to set the telemetry consent for the whole project, rather than for the specific run. Essentially what she described in #681 (comment). Responding to a couple of things:

(This will require adding --telemetry to all project subcommands)

I suggested that we take this to kedro-telemetry on kedro-org/kedro#3701 (comment) but it was not clear how feasible it was, plus we wanted to be practical and cover just kedro new and kedro run.

OR maybe we can add a CLI command to kedro-telemetry to turn change consent to false kedro telemetry off or somthing?

This is an idea we hadn't considered yet and maybe worth giving a second thought. At least it decouples kedro commands from kedro-telemetry stuff, which is really neat and clean.

As a user, going to my CI runs and changing kedro run to kedro run --skip-telemetry or doing kedro telemetry off && kedro run looks mostly the same (?).

noklam · 2024-05-23T16:12:32Z

As a user, going to my CI runs and changing kedro run to kedro run --skip-telemetry or doing kedro telemetry off && kedro run looks mostly the same (?).

Conceptually this de-coupling is cleaner, but I don't know if it's really useful. This would work equally well:

pip uninstall kedro-telemetry && kedro run

At the end, the only one cares about telemetry is us, users certainly won't switch telemetry on if it's off already.

What if the flag simply setKEDRO_TELEMETRY to True and kedro-telemetry would pick this up? Downside is clearly this is coupling the two libraries.

lrcouto · 2024-05-23T20:07:27Z

From my understanding, the main idea of this flag would be avoiding an automated run to stop because of the consent prompt, in a CI situation or similar. Our own e2e tests, for example, are manually creating the .telemetry file to prevent that from happening while they run.

astrojuanlu · 2024-05-24T10:38:59Z

After discussing this a bit more internally, we decided to revert this PR.

I'll give a proper rationale in kedro-org/kedro#1640 but essentially, the way we will solve the problem of "telemetry prompt blocking unattended environments" is actually removing the prompt, rather than adding a --telemetry flag to more kedro CLI commands.

Thanks everybody for the thoughtful discussion here, the requirements weren't clear and @lrcouto you did the right thing 🙏🏼 Let's close this one and go with kedro-org/kedro#3888.

lrcouto and others added 2 commits May 17, 2024 14:47

Exit before_command_run if telemetry consent is passed through kedro run

b18713a

Signed-off-by: lrcouto <laurarccouto@gmail.com>

Merge branch 'main' into modify-hook-to-skip-consent-prompt

e910141

lrcouto changed the title ~~Exit before_command_run if telemetry consent is passed through kedro run~~ feat: Exit before_command_run if telemetry consent is passed through kedro run May 17, 2024

lrcouto mentioned this pull request May 17, 2024

Add implementation of telemetry consent flag for kedro run kedro-org/kedro#3876

Merged

7 tasks

lrcouto marked this pull request as ready for review May 17, 2024 22:12

merelcht reviewed May 20, 2024

View reviewed changes

lrcouto added 2 commits May 21, 2024 00:58

bypass consent verification if passed through telemetry flag

828c2fa

Signed-off-by: lrcouto <laurarccouto@gmail.com>

Add tests for before_command_run hook when the telemetry flag is used

9f2b0f5

Signed-off-by: lrcouto <laurarccouto@gmail.com>

lrcouto added 2 commits May 21, 2024 13:12

Merge branch 'main' into modify-hook-to-skip-consent-prompt

b2efc85

Merge branch 'main' into modify-hook-to-skip-consent-prompt

f5b0c67

lrcouto requested review from noklam, DimedS, SajidAlamQB and ankatiyar May 22, 2024 13:07

lrcouto mentioned this pull request May 22, 2024

Add an option for kedro new and kedro run to skip telemetry (e.g. for CI/CD environment) kedro-org/kedro#2867

Closed

Merge branch 'main' into modify-hook-to-skip-consent-prompt

47d8cc4

astrojuanlu self-requested a review May 23, 2024 13:13

lrcouto mentioned this pull request May 23, 2024

Revert "Add implementation of telemetry consent flag for kedro run" kedro-org/kedro#3888

Merged

Merge branch 'main' into modify-hook-to-skip-consent-prompt

2e4c9a7

astrojuanlu closed this May 24, 2024

DimedS mentioned this pull request Jun 15, 2024

kedro-telemetry: Introduce environment variables for disabling telemetry and revise disabling methods #728

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Exit before_command_run if telemetry consent is passed through kedro run #681

feat: Exit before_command_run if telemetry consent is passed through kedro run #681

lrcouto commented May 17, 2024 •

edited

Loading

merelcht May 20, 2024

lrcouto May 20, 2024 •

edited

Loading

lrcouto commented May 21, 2024

DimedS commented May 22, 2024

lrcouto commented May 22, 2024

DimedS commented May 22, 2024

lrcouto commented May 22, 2024

ankatiyar commented May 23, 2024

DimedS commented May 23, 2024 •

edited

Loading

lrcouto commented May 23, 2024

lrcouto commented May 23, 2024

ankatiyar commented May 23, 2024 •

edited

Loading

DimedS commented May 23, 2024

noklam commented May 23, 2024

merelcht commented May 23, 2024

lrcouto commented May 23, 2024

astrojuanlu commented May 23, 2024 •

edited

Loading

noklam commented May 23, 2024 •

edited

Loading

lrcouto commented May 23, 2024

astrojuanlu commented May 24, 2024

feat: Exit before_command_run if telemetry consent is passed through kedro run #681

feat: Exit before_command_run if telemetry consent is passed through kedro run #681

Conversation

lrcouto commented May 17, 2024 • edited Loading

Description

Development notes

Checklist

merelcht May 20, 2024

Choose a reason for hiding this comment

lrcouto May 20, 2024 • edited Loading

Choose a reason for hiding this comment

lrcouto commented May 21, 2024

DimedS commented May 22, 2024

lrcouto commented May 22, 2024

DimedS commented May 22, 2024

lrcouto commented May 22, 2024

ankatiyar commented May 23, 2024

DimedS commented May 23, 2024 • edited Loading

lrcouto commented May 23, 2024

lrcouto commented May 23, 2024

ankatiyar commented May 23, 2024 • edited Loading

DimedS commented May 23, 2024

noklam commented May 23, 2024

merelcht commented May 23, 2024

lrcouto commented May 23, 2024

astrojuanlu commented May 23, 2024 • edited Loading

noklam commented May 23, 2024 • edited Loading

lrcouto commented May 23, 2024

astrojuanlu commented May 24, 2024

lrcouto commented May 17, 2024 •

edited

Loading

lrcouto May 20, 2024 •

edited

Loading

DimedS commented May 23, 2024 •

edited

Loading

ankatiyar commented May 23, 2024 •

edited

Loading

astrojuanlu commented May 23, 2024 •

edited

Loading

noklam commented May 23, 2024 •

edited

Loading