Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KED-3025] Simplify kedro new workflow into one question #1361

Closed
antonymilne opened this issue Mar 23, 2022 · 21 comments · Fixed by #1643
Closed

[KED-3025] Simplify kedro new workflow into one question #1361

antonymilne opened this issue Mar 23, 2022 · 21 comments · Fixed by #1643
Assignees
Labels
Component: CLI Issue/PR that addresses the CLI for Kedro Issue: Feature Request New feature or improvement to existing feature

Comments

@antonymilne
Copy link
Contributor

Transferred from Jira ticket by @datajoely

Both internal and external survey say that they don't need to fine control of 3 separate questions
image
image

@antonymilne antonymilne added Issue: Feature Request New feature or improvement to existing feature Component: CLI Issue/PR that addresses the CLI for Kedro labels Mar 23, 2022
@antonymilne
Copy link
Contributor Author

antonymilne commented Mar 23, 2022

My opinion here is very much in favour of getting rid of the separate questions for exactly the same reason as @WaylonWalker gives here - having three different things just creates more confusion. When I first did kedro new I didn't really know what the difference between the three different things were and it just made using kedro feel more complicated.

image

@datajoely
Copy link
Contributor

So the only thing that's worth aligning on is the pesky - vs _ debate

According to PEP8 'use of underscores is discouraged' this is why we pip install kedro-viz but if you wanted to use the library you write from kedro_viz import api .

Possible solutions:

  1. In theory most users wouldn't care/know if we aggressively made everything underscores since its syntactically valid in every case, even if it breaks convention.
  2. We dynamically coerce the name of the package to use dashes and module to use underscores. This would follow convention, but I'm unsure if it will introduce unintended side effects.

@antonymilne
Copy link
Contributor Author

Just a couple of clarifications to add to this - I spent too long reading about package names a while ago and it is pretty confusing so let me share the benefits of what I learnt here:

  • strictly speaking, the name used in pip install is the distribution name rather than the package name, and PEP8 doesn't say anything about the use of underscores in this (see https://stackoverflow.com/questions/54597212/using-hyphen-dash-in-python-repository-name-and-package-name)
  • the PEP8 convention is speaking about package name as in import .... Here underscores are discouraged, but given that dashes are impossible, underscores are realistically the only option if you have a package name with multiple words (or we could just take the first word in the project name, or we could take the first letter of every word in the project name... but probably just makes things more confusing)
  • the importable package name doesn't even need to be the same as the distribution name (as in pip install scikit-learn vs. import sklearn example)
  • pip install kedro_viz actually works fine as it will magically be converted to kedro-viz (as per https://stackoverflow.com/questions/19097057/pip-e-no-magic-underscore-to-dash-replacement)

All this is obviously very confusing and irrelevant to most users... so I'd probably just just make everything underscores, and then setuptools will convert that to a dash by itself when needed.

@datajoely
Copy link
Contributor

datajoely commented Mar 23, 2022

I would also like to suggest we introduce --name as a parameter here if we're only asking one question

@datajoely
Copy link
Contributor

Just found this exists https://packaging.pypa.io/en/latest/utils.html#packaging.utils.canonicalize_name

@deepyaman
Copy link
Member

Even though I do sometimes specify different values for each prompt (and not just press enter 3 times and let the genie do it automatically), I agree that there's limited value/unnecessary complexity. cookiecutter.repo_name and cookiecutter.project_name are used in very few locations through the codebase, and those are easy to override manually as and when needed.

@datajoely
Copy link
Contributor

I would also provide a way of doing this and giving the telemetry acknowledgement in one command. I know you can do some of that with yes | kedro new --starter=spaceflights --config config.yml but it's not clean at all!

@antonymilne
Copy link
Contributor Author

Since kedro-telemetry is a separate package and generally won't be called until after kedro new is complete I don't think there's a good way of doing that. I think it's ok to do kedro new --starter=spaceflights --config config.yml and then something for telemetry afterwards like echo "consent: true" > .telemetry.

@datajoely
Copy link
Contributor

Can't build a consent workflow directly into kedro since we know what's needed .telemetry? I know a lot of CI processes have to do this annoying hack.

@merelcht merelcht added this to the Cleanup CLI commands milestone Jun 20, 2022
@yetudada
Copy link
Contributor

I would like to propose that the user journey looks like:

(my-virtual-environment) ➜  ~ kedro new

Project Name:
=============
Please enter a human readable name for your new project.
Spaces and punctuation are allowed.
 [New Kedro Project]: Example Project
Change directory to the project generated in /Users/yetunde_dada/example-project by entering `cd /Users/yetunde_dada/example-project`

The project name "Example Project" has been applied to:
- The project title in /Users/yetunde_dada/example-project/README.md
- The folder created for your project in /Users/yetunde_dada/example-project
- The project's python package in /Users/yetunde_dada/example-project/src/example_project

A best-practice setup includes initialising git and creating a virtual environment, refer to the documentation: https://kedro.readthedocs.io/

This workflow has the following characteristics:

  • It uses the default names that we provide
  • And it also tells users what we have done

@merelcht merelcht moved this to To Do in Kedro Framework Jun 20, 2022
@antonymilne
Copy link
Contributor Author

I like this, although I'd put the "Change directory to the project generated in /Users/yetunde_dada/example-project by entering `cd /Users/yetunde_dada/example-project" last and maybe in a special colour because it's the most important part and a bit hidden at the moment.

@datajoely
Copy link
Contributor

Two thoughts

  • What is the benefit of allowing the space and punctuation?
  • Can we still expose the cookiecutter arguments for expert users?

@datajoely
Copy link
Contributor

One extra thought, can we introduce kedro new --name so that automated processes could do this easily

@antonymilne
Copy link
Contributor Author

Just my personal opinions here:

What is the benefit of allowing the space and punctuation?

Maybe not much, but what would be the benefit of not allowing them? It is used in the human-readable README.md so I don't see any reason to not allow any characters.

Can we still expose the cookiecutter arguments for expert users?

I presume you mean expose them in the config_path option? Interesting idea and I have no strong feelings on whether or not we should do this. It would make it more of a non-breaking change certainly. We'd need to look into exactly how prompts.yml and cookiecutter.json are used to see if this is easy to do.

One extra thought, can we introduce kedro new --name so that automated processes could do this easily

I think just using yes for this is ok, and we also have the -c/--config_path option which does exactly this if you've written your project name to a file. If this is really needed then a better solution might be just to extend -c so it can take in more than just a file, e.g. kedro new -c "project_name: ...". This way we don't need to add a special --name argument and keep the generic nature of a config file which can handle many possible cookiecutter variables.

@datajoely
Copy link
Contributor

@AntonyMilneQB yeah I think that makes sense, I'm just pushing that we value the workflow of the 'CI/CD configurer' user as well as the 'person in front of their computer' user

@antonymilne
Copy link
Contributor Author

Yeah, it's a good point I think but should not be affected by this really. Probably we should add something to our documentation to make it clearer how to do this in CI/CD (especially given the .telemetry point).

@merelcht merelcht self-assigned this Jun 21, 2022
@merelcht merelcht moved this from To Do to In Progress in Kedro Framework Jun 21, 2022
@datajoely
Copy link
Contributor

Adding evidence the telemetry workflow on CI is a common trap:
image

@nikos-kal
Copy link

Run into the above with Github Actions - solved with piping "yes" to the kedro test command but it was confusing at first for sure!

@merelcht
Copy link
Member

I completely agree we should improve the flow with telemetry on automated workflows, but it's a completely separate issue so I'd prefer to not add it into this task and create a new ticket for it.

On this @datajoely :

Can we still expose the cookiecutter arguments for expert users?

Why would we still want to keep those? Do expert users want to have different values for the repo name and python package instead of the default?

@datajoely
Copy link
Contributor

I completely agree we should improve the flow with telemetry on automated workflows, but it's a completely separate issue so I'd prefer to not add it into this task and create a new ticket for it.

Nice 💪 let's do that

Why would we still want to keep those? Do expert users want to have different values for the repo name and python package instead of the default?

I guess I just need to understand what the automatic approach will be? In the new solution what would happen if a user provides the name "anomaly detection"

I think I would expect Kedro to define the following as per PEP-423 and PEP-503:

  • Package canonical name anomaly-detection
  • Module name anomaly_detection

If this is the case, Kedro is making implicit assumptions and expert users should have want to explicit control over this.

@merelcht
Copy link
Member

New ticket here: #1640 feel free to add more info!

I think I would expect Kedro to define the following as per PEP-423 and PEP-503:

  • Package canonical name anomaly-detection
  • Module name anomaly_detection

If this is the case, Kedro is making implicit assumptions and expert users should have want to explicit control over this.

Yes that's how I'm thinking about this. I get your point and will look a bit more how easy it is to maintain that functionality while getting rid of the prompts.

@merelcht merelcht mentioned this issue Jun 22, 2022
5 tasks
@merelcht merelcht linked a pull request Jun 22, 2022 that will close this issue
5 tasks
@merelcht merelcht moved this from In Progress to In Review in Kedro Framework Jun 22, 2022
@merelcht merelcht moved this from In Review to In Progress in Kedro Framework Jun 23, 2022
@merelcht merelcht moved this from In Progress to In Review in Kedro Framework Jun 23, 2022
Repository owner moved this from In Review to Done in Kedro Framework Jul 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: CLI Issue/PR that addresses the CLI for Kedro Issue: Feature Request New feature or improvement to existing feature
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants