Review deployment pages and consider how to integrate GetInData plugins docs and improve them overall #2435

Closed
stichbury opened this issue Mar 17, 2023 · 16 comments

@stichbury
Contributor

stichbury commented Mar 17, 2023

Following discussion with the GetInData team, we should look to include more official documentation about Kedro plugins for deployment within our official guides.

One option is to add some docs on our side and point through to the plugin docs, e.g. https://kedro-azureml.readthedocs.io/en/0.3.6/ for Azure ML (which should probably be the first one, as it’s the most battle-tested and feature-complete one).

An alternative is to bring those plugin docs inside our docs entirely. This has the benefit that the user stays in one location and has one style of docs to read, but it also adds to the content load, which is already heavy.

I didn't have a ticket about this so have created one for discussion. Tagging in https://github.com/marrrcin

@stichbury stichbury added the Component: Documentation 📄 Issue/PR for markdown and API documentation label Mar 17, 2023
@stichbury stichbury changed the title <Title> Add more about plugins for deployment (whole docs from GetInData or some more detail and some pointers to them) Mar 17, 2023
@marrrcin
Contributor

@stichbury
We've agreed that we should include something like a "quickstart" or "tutorial" in the Kedro docs and then put a reference to more in-depth documentation (ours) at the end. This way it will make our plugins' development cycles uninterrupted and not dependent on the Kedro docs release lifecycle.

How can we proceed on that?

@stichbury
Contributor Author

@marrrcin We are still looking at changes to the information architecture, so this is difficult to pin down at present. In the current table of contents, what would you propose? A section in the Kedro plugins page? Or a new section about plugins with tutorials listed? You probably have some great ideas on how to position these in the current layout, which we can take forward as we think about the new one as part of #1866

@marrrcin
Contributor

marrrcin commented Mar 21, 2023

There is already a section called "Deployment", which is a good fit for our plugins. Actually, some of the parts that are currently included there (e.g. SageMaker) can be replaced with the plugin-based approach.

@astrojuanlu
Member

cc @deepyaman should we raise the priority of this one?

@stichbury
Contributor Author

This is in the current sprint w/c 17-04

@stichbury stichbury changed the title Add more about plugins for deployment (whole docs from GetInData or some more detail and some pointers to them) Review deployment pages and consider how to integrate GetInData plugins docs and improve them overall Apr 26, 2023
@stichbury
Contributor Author

stichbury commented Apr 26, 2023

I've done a little bit of reorganisation on the table of contents in the docs recently, which is unreleased at present, but should go out soon (you can see it in the latest docs). Let's consider how to make some changes to what we have in the set of deployment docs.

  1. I think each "How to deploy a Kedro project to X" page should have a set of subsections, something along the lines of Introduction, Prerequisites, Deployment process, and Summary. Within those sections, subsections are completely freeform, but it would be good to keep a consistent layout at the top level.

  2. Each of the pages should have a note on when it was last tested (and against which version of Kedro + other prerequisite tools), or at least some indicator of how confident we are in the content.

  3. Where there are two options (e.g. use what we describe or use the Get In Data plugin) we should explain the circumstances in which you would prefer one over the other. If there's no difference, I'm not sure: do we need both?

Turning to the deployment targets, I have these so far:

| Deployment target/action | Notes | Technical reviewer input |
|---|---|---|
| Airflow | • Existing Airflow docs for QB-supported Airflow plugin. These docs are well-structured but I can't speak for correctness<br>• Get In Data's kedro-airflow-k8s plugin -- documentation suggests not to use for versions of Kedro > 17.0 ?? | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? Do we need both or should we just point through to GetInData docs? |
| Argo | Kedro docs are for use without a plugin but also mention/link to an unsupported 3rd party plugin, last updated in summer 2020 | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? |
| AWS Batch | Kedro docs are comprehensive | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? |
| AWS EMR | Written up (as a blog post) but as yet unpublished | This will stay as a blog post for now unless I'm persuaded otherwise, since it's nice to have the technical content. I will make a ticket to expand it and convert to docs though, if this makes sense to reviewers? |
| AWS SageMaker | • Existing SageMaker docs<br>• GetInData have a kedro-sagemaker plugin | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? Do we need both or should we just point through to GetInData docs? |
| Azure | Battle-tested kedro-azureml plugin from GetInData | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? |
| Dask | Existing Dask docs | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? |
| Kubeflow | • Existing Kubeflow Workflows docs<br>• kedro-kubeflow plugin from GetInData. | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? Do we need both or should we just point through to GetInData docs? |
| Prefect | Existing Prefect docs have not been tested with Prefect 2.0 | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? |
| VertexAI | kedro-vertexai plugin from GetInData. | Reviewer: Is the documentation complete and up to date? How confident are you (green, amber or red)? |

@stichbury
Contributor Author

OK, I've got a little table going up in the previous comment, to track our confidence and the completeness of various deployment pages.

Please could I ask for some technical help from the usual suspects: @deepyaman @noklam @merelcht @marrrcin @astrojuanlu to answer the 3 questions above and noted in the table:

  1. Are the docs for a target complete?
  2. Are we confident in them?
  3. Do we need both the existing text and a pointer to the Get In Data plugin, or should we phase out our docs?

Feel free to either drop a comment below for anything you want to comment on, or edit the table directly above if you're brave enough/foolish enough to want to wrangle a markdown table.

From your input, I'll build a set of tickets to plan out updates to the deployment content (if not the location in the docs).


Also, another question. Are there any missing targets? We don't have Databricks in this section, for example, but should provide a link to the docs stored elsewhere (and reconsider the distribution of Databricks docs in due course).

@stichbury stichbury self-assigned this Apr 28, 2023
@merelcht
Member

merelcht commented May 2, 2023

My thoughts on the deployment targets listed above (fyi I haven't recently tried any of this so I'm totally guessing if these recommendations still work):

  1. Airflow: these docs are indeed in good shape, but haven't been changed since 2021. Without trying the steps I'd probably give it an amber 🟠 rating. It seems like our recommendation, Airflow with Astronomer, is slightly different from the GetInData one, which uses k8s. I'm not enough of an Airflow expert to say which approach is better, so for the time being I'd keep both.
  2. Argo: I know nothing about Argo. These docs are pretty old and the team member who wrote them isn't on the team anymore. I'd give it a red 🔴 rating.
  3. AWS Batch: these look good, but also haven't been changed in a long time. I'm not personally confident that this would still work without trying it so would give it a red 🔴 .
  4. AWS SageMaker: similar to Argo, these are old docs, written by a member who isn't at QB anymore. I'd probably recommend the GetInData plugin instead.
  5. Azure: very happy to recommend the GetInData plugin here.
  6. Dask: These are fairly recent and added by Deepyaman, so I'm more confident these are in a good state and would rate them green 🟢
  7. Kubeflow: same as for AWS SageMaker: I'd recommend the GetInData plugin instead of our old docs.
  8. Prefect: the code in these docs has been updated by someone from QB fairly recently, so I'd be happy to keep them and rate them green 🟢
  9. VertexAI: again very happy to recommend the GetInData plugin.

@stichbury
Contributor Author

Thanks @merelcht that is amazingly useful.

Given that you're unsure about Airflow, Argo and Batch, I'll ask @deepyaman for a second opinion on those, but TBH, I'm happy to just slate those for an update when there's opportunity (and look at usage logs to see which to prioritise)

@marrrcin
Contributor

marrrcin commented May 4, 2023

My two cents:

  • Airflow - GetInData's plugin currently (as of 2023-05-04) supports kedro<0.18. We're planning to upgrade it, but there is no specific timeline for that yet.
  • Azure - do you want to have a small write-up on it inside of the official docs + link at the end OR just directly point out to the plugin docs? I think a small paragraph would be nice.

@noklam
Contributor

noklam commented May 4, 2023

I mostly agree with Merel, with some minor comments.

  • I would give Airflow the same rating as Prefect: it works fine but hasn't been updated for the latest version of the Airflow API, so 💚 for me
  • AWS Step Functions is missing from the table? I know nothing about it, but I remember "Error to deploy spaceflights tutorial in AWS Steps Function" (#1006), so this should still work. I also think we can suggest which AWS service to start with if the user doesn't have a strong preference.

@stichbury
Contributor Author

Thanks @marrrcin, that's very useful. I'll take your input on Airflow on board, and likewise for Azure. I plan to add some text for that as you suggest.

And to @noklam also, thank you 🙏 I have no idea how I missed AWS Steps. I'll add it to my list, and add it to the flowchart.

Also, we don't have any copy about "Which AWS to use?" but that would be very useful. Let me get that on my list too.

@marrrcin
Contributor

marrrcin commented May 5, 2023

I've revamped the quickstart guide for AzureML here: https://kedro-azureml.readthedocs.io/en/0.4.1/source/03_quickstart.html

@astrojuanlu
Member

I'm a bit late to the party, but regarding Prefect, note that the docs are written for 1.x, and Prefect 2.0 changed a few things (#2431), so I'd give those an amber rating too 🟠

@stichbury
Contributor Author

I will create a pair of tickets for updating the Prefect docs and Airflow/Astronomer docs to the latest versions, and note the version used in the docs so readers are aware.
