
Investigate if micropkg spec in pyproject.toml is used #2123

Closed
merelcht opened this issue Dec 14, 2022 · 5 comments
Labels
Stage: User Research 🔬 Ticket needs to undergo user research before implementation

Comments

@merelcht
Member

merelcht commented Dec 14, 2022

Description

In #2119 it was discovered that the feature for adding micropkg specs to pyproject.toml has not worked since March 2022, and not a single user reported this.

There are still some open issues regarding the micropkg functionality in pyproject.toml:

Before we spend more time implementing and expanding this functionality we need to investigate if micropkg spec in pyproject.toml is used at all.

Context

  • It would also be worth talking to the Alloy team and hearing how they do packaging and if they find the micro-packaging functionality in Kedro useful or not.
  • Look at the micropkg usage statistics in telemetry.
@merelcht added the Stage: User Research 🔬 label Dec 14, 2022
@lamhm

lamhm commented Jan 10, 2023

I am currently affected by bug #2119. I hope the fix will be released soon so that I can convince my team to switch to micropkg instead of using our in-house method (which is only a half-baked solution).

Here is why I am only now starting to try micropkg, and not earlier. I have been using Kedro since late 2020, starting with one-person projects. In 2021, when I was assigned to a new team of data scientists, I realised that all my co-workers coded only in Jupyter notebooks, which made it very hard to replicate their work outside their local environments. So I adopted Kedro for the whole team. Since then, Kedro has played a very important role in our workflow.

If I recall correctly, I first heard about micropkg in late 2021. But I did not pay much attention to it, since the standard method (wheel) was good enough for us: my team at the time only developed and trained ML models. When our models reached an acceptable quality, we packaged them and sent them to the Software Engineering Team, who took care of the deployment. Therefore:

  • Packaging was not a bottleneck, as we did not do it very frequently, and we only needed to package whole projects (not parts of them).
  • It was not easy for us to change our packaging method as we would need an agreement with the Engineering Team.

But now I am working in an R&D team that develops ML solutions from start to finish, i.e. we are in charge of not only developing and training models but also deploying them as software applications and continuously upgrading them. So we have total control over the packaging and deployment methods for our ML models/pipelines. I can see an opportunity to streamline our model deployment (which now happens very frequently) by using micropkg, which is why I want to convince my team to switch to it. But I have to postpone this plan because of the bug above (I know how to work around it, but I cannot convince my team to use a broken tool).

Thank you very much for giving us a great tool like Kedro. I appreciate your support very much.

@lamhm

lamhm commented Jan 10, 2023

By the way, I have noticed that Kedro allows specifying the destination for each package in [tool.kedro.micropkg.package] (in pyproject.toml). However, the given destinations are always interpreted as local directories. In my team's use case, it would be great if the destination argument of micropkg also accepted an S3 bucket, i.e. the kedro micropkg package command would push the packages to S3 (with the access token specified the same way as in micropkg pull, via fs-args). Right now, I have to manually call aws s3 sync after every kedro micropkg package run to push my packages to S3.

Should I create a feature request for this?

Thank you very much for your help.

@merelcht
Member Author

Thanks so much for describing your use case and needs for the micropkg functionality, @lamhm! This functionality hasn't been a priority for a while, but I think the team needs to focus on it again and look at how we can improve it. You're more than welcome to create feature requests for this, and we'll discuss them with the maintainer team.

@merelcht added this to the Micropackaging milestone Feb 6, 2023
@astrojuanlu
Member

Possibly relevant:

The error message for --all is not very helpful; I am pretty lost when it talks about the manifest and

(kedro_core) pattern main % kedro micropkg package
Please specify a micro-package name or add '--all' to package all micro-packages in the 'pyproject.toml' package manifest section.
(kedro_core) pattern main % kedro micropkg package --all
Nothing to package. Please update the 'pyproject.toml' package manifest section.

Originally posted by @noklam in #2761 (review)

@astrojuanlu
Member

We decided in #3750 to deprecate kedro micropkg so we will not do this.

@astrojuanlu closed this as not planned May 6, 2024
Projects
Archived in project
Development

No branches or pull requests

3 participants