Add spaceflights-pyspark starter #147
Signed-off-by: Merel Theisen <merel.theisen@quantumblack.com>
spaceflights-pyspark/README.md
```bash
pip install kedro
kedro new --starter=spaceflights-pyspark
cd <my-project-name>  # change directory into newly created project directory
```
Do we want to keep this explanation or remove this from the new starters? @amandakys
…labs/kedro-starters into create-spaceflights-pyspark
Will this be introduced before 0.19?
spaceflights-pyspark/{{ cookiecutter.repo_name }}/conf/README.md
    return companies


def load_shuttles_to_csv(shuttles: pd.DataFrame) -> pd.DataFrame:
Can we use transcoding?
The original data is in Excel format. If I don't add this node, how would it be turned into CSV?
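A minimal sketch of the pass-through idea discussed here, not the PR's actual code. The diff shows only the signature of `load_shuttles_to_csv`, so the body below is an assumption: the node returns its input unchanged and leaves the format change to whatever loads the input and saves the output. The sample columns and the in-memory buffer are hypothetical stand-ins for the Excel source and the CSV target.

```python
import io

import pandas as pd


def load_shuttles_to_csv(shuttles: pd.DataFrame) -> pd.DataFrame:
    """Return the input unchanged (assumed body; the diff shows only the signature)."""
    return shuttles


# Stand-in for the DataFrame an Excel dataset would load; the values are made up.
shuttles = pd.DataFrame({"id": [1, 2], "shuttle_type": ["Type V5", "Type F5"]})

result = load_shuttles_to_csv(shuttles)

# Stand-in for the save step a CSV dataset would perform on the node output.
buffer = io.StringIO()
result.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Under this assumption the node does no transformation of its own. It exists so that the pipeline has a step whose input is read as Excel and whose output is written as CSV.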
class TestDataScienceNodes:
    def test_split_data(self, dummy_data, dummy_parameters):
        X_train, X_test, y_train, y_test = split_data(dummy_data, dummy_parameters["model_options"])
        assert len(X_train) == 2
        assert len(X_test) == 1
        assert len(y_train) == 2
        assert len(y_test) == 1
⭐️ I like that we are adding an example for tests.
Do you think the test is good enough or should it be more meaningful and also test spark stuff?
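For context, here is a rough sketch of inputs that would satisfy the assertions in `test_split_data`. Everything in it is an assumption: the PR excerpt does not show `dummy_data`, `dummy_parameters`, or the body of `split_data`, so the column names, parameter keys, and the pandas-only `split_data` stand-in below are hypothetical. In a pytest suite the two dummy objects would normally be fixtures.

```python
import pandas as pd


def split_data(data: pd.DataFrame, parameters: dict):
    """Hypothetical stand-in for the starter's split_data node."""
    X = data[parameters["features"]]
    y = data["price"]
    # Sample the test rows, then treat the remaining rows as the training set.
    X_test = X.sample(
        frac=parameters["test_size"], random_state=parameters["random_state"]
    )
    X_train = X.drop(X_test.index)
    return X_train, X_test, y.drop(X_test.index), y.loc[X_test.index]


# Three made-up rows: one lands in the test set, two in the training set.
dummy_data = pd.DataFrame(
    {
        "engines": [1, 2, 3],
        "crew": [4, 5, 6],
        "passenger_capacity": [5, 6, 7],
        "price": [120, 290, 30],
    }
)

# Assumed parameter layout, mirroring the "model_options" key used in the test.
dummy_parameters = {
    "model_options": {
        "test_size": 0.2,
        "random_state": 3,
        "features": ["engines", "passenger_capacity", "crew"],
    }
}

X_train, X_test, y_train, y_test = split_data(
    dummy_data, dummy_parameters["model_options"]
)
```

A test like this only checks the shape of the split. It does not exercise any Spark code, which is the gap the question above is pointing at.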
LGTM from a docs perspective
Co-authored-by: Jo Stichbury <jo_stichbury@mckinsey.com>
Co-authored-by: Nok Lam Chan <nok.lam.chan@quantumblack.com>
I took a quick look and it's fine. I don't have time to test it out, so it would be best to have another engineer approve this, maybe @AhdraMeraliQB?
Tested it manually and all looks good. Thank you @merelcht! 👍
spaceflights-pyspark/{{ cookiecutter.repo_name }}/conf/base/parameters_data_science.yml
Co-authored-by: Sajid Alam <90610031+SajidAlamQB@users.noreply.github.com>
Motivation and Context
kedro-org/kedro#2984 subtask of kedro-org/kedro#2838 which in turn is part of the new project creation flow.
How has this been tested?
Created a project locally using:
kedro new --starter=/Users/merel_theisen/Projects/kedro-starters/spaceflights-pyspark/
and then ran kedro run.
Things to note:
- Moved logging.yml out of the base folder to the top-level conf/ folder.
- Added TestDataScienceNodes to demonstrate how you can write a unit test for a node.

Questions:
- kedro new --starter=spaceflights-pyspark ?

Checklist