Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core feature] UX improvement: add --copy flag to pyflyte run for more targeted file copying during fast registration #5343

Open
2 tasks done
cosmicBboy opened this issue May 9, 2024 · 11 comments
Labels
enhancement New feature or request

Comments

@cosmicBboy
Copy link
Contributor

cosmicBboy commented May 9, 2024

Motivation: Why do you think this is important?

Say I have the following flytekit project folder structure:

/my_project
    /src
        __init__.py
        module_a.py
        ... # etc
    tasks.py
    workflows.py
    setup.py

Where src is a pure python internal library that I can independently run/test. Then I want to import modules from src into my flytekit tasks.py and workflows.py files.

When I build my ImageSpec, I currently have to role my own solution to install src is a pip package (maybe via github).

Then, when I want to quickly iterate on it with pyflyte run --remote, I use the --copy-all flag to make sure src is in the PYTHONPATH of the container running my tasks.

Goal: What should the final outcome look like, ideally?

Add a --copy flag, which can be specified multiple times for more targeted inclusion of files during fast registration

pyflyte run --copy dir1 --copy my/dir2

Describe alternatives you've considered

The current workarounds for this are --copy-all at iteration time and pip installing via github at ImageSpec build time (there are probably more workarounds here).

Propose: Link/Inline OR Additional context

This is part of a broader pain point of the flyte-maintainer-approved way of building a flytekit project. The tutorials/guides in our docs sort of assume you're working on a single file, or some simpler structure that doesn't include having a python project that's separate from flyte task/workflow modules.

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@cosmicBboy cosmicBboy added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels May 9, 2024
@thomasjpfan
Copy link
Member

Having a ImageSpec(local_packages=...) is nice once the local pip installable package is set.

For rapid iteration, I think building an image every time code changes is a bit more overhead. I'm thinking of this user story:

  1. They already have a container_image set to a predefined string
  2. They have a local pip installable python library locally they are iterating on.
  3. They want their local python library installed in their task environment.

What do you think of having a new --install-local that installs the local python library before the task executes?

pyflyte run --remote --install-local

@fg91
Copy link
Member

fg91 commented May 9, 2024

Notes from discussion in contribs' sync:

This could be incorporated into fast registration. A user would specify e.g. pyflyte run --local-package /some/path1/package1 --local-package /some/path2/package2 .... These packages would then be copied into the code tarball (or separate ones). The pod entrypoint pyflyte-fast-execute would create a directory called e.g. /home/flyte/additional-packages, add this directory to the pythonpath, and finally unpack the additional package tarballs into this directory.

The benefit of this mechanism over ImageSpec would be that it can be used regardless of how images are built. Some organizations build images purely in CI instead of using ImageSpec.

@thomasjpfan
Copy link
Member

@fg91 For multiple local-packages, does the Python library have code that needs to be complied? For example, a Polars extension plugin that requires Rust to build.

In any case, I think there is enough Python only packages out there where --local-package ... is a significant improvement.

@cosmicBboy cosmicBboy changed the title [Core feature] UX improvement: support local pip-installable packages in ImageSpec and pyflyte run --remote [Core feature] UX improvement: add --copy flag to pyflyte run for more targeted file copying during fast registration Jun 6, 2024
@cosmicBboy
Copy link
Contributor Author

Note: another simple solution proposed by @thomasjpfan to this is a --copy <dir> flag

@eapolinario
Copy link
Contributor

Can we discuss the differences between --copy and --local-package?

@cosmicBboy
Copy link
Contributor Author

--local-package implies that it'll actually be installed in the environment, whereas in reality it's just a directory in PYTHONPATH.

@thomasjpfan
Copy link
Member

After discussing with @eapolinario , I like going with --include that can include directories or files.

With flyteorg/flytekit#2663, the automatic copy behavior copies over everything that is imported by the workflow file that is also in the same directory as the workflow file. This means --include would be useful for data files or compiled files.

@fg91
Copy link
Member

fg91 commented Aug 13, 2024

After discussing with @eapolinario , I like going with --include that can include directories or files.

With flyteorg/flytekit#2663, the automatic copy behavior copies over everything that is imported by the workflow file that is also in the same directory as the workflow file. This means --include would be useful for data files or compiled files.

If the user copies entire python packages they are iterating on into the tarball with this flag, are they expected to adapt the python path in the underlying image or do we add a convenience flag for this as well? Maybe --include-package? (Not saying we should, just interested in your opinion.)

@thomasjpfan
Copy link
Member

thomasjpfan commented Aug 14, 2024

If the user copies entire python packages they are iterating on into the tarball with this flag, are they expected to adapt the python path

If we can get away with having less options, I would prefer not to have a way to configure PYTHONPATH. Note that for pure python package, then they would not need --include at all, since flyteorg/flytekit#2663 should copy over everything that was imported by their workflow.

If a user still wants to run --include to include other files, then, by default, it'll copy the files into the working directory of the image with the same folder structure as the local execution. In this case, I am assuming that the working directory is in the PYTHONPATH and that --dest-dir is "." (which means working directory).


Note, even without copying other files, there is an existing issue if --dest-dir is not the working directory. If we want to have a nicer DevX, I would add the resolved --dest-dir into the PYTHONPATH after downloading the distribution.

Edit: Looks like we are already going to do this here: flyteorg/flytekit#2673

@thomasjpfan thomasjpfan removed the untriaged This issues has not yet been looked at by the Maintainers label Aug 14, 2024
@fg91
Copy link
Member

fg91 commented Aug 14, 2024

If we can get away with having less options, I would prefer not to have a way to configure PYTHONPATH.

Ok, ok for me 👍

Note that for pure python package, then they would not need --include at all, since flyteorg/flytekit#2663 should copy over everything that was imported by their workflow.

Uh this is very nice, this means the number of use cases for --include will be drastically reduced anyways.

@davidmirror-ops
Copy link
Contributor

davidmirror-ops commented Aug 16, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants