
Description
Originally reported by: spookylukey (Bitbucket: spookylukey, GitHub: spookylukey)
The approach taken by egg_info is to list all the files in the source directory, recursively, and then apply rules (via manifest_maker and FileList.findall).
If you have a lot of files, this can take a really long time. It combines especially badly with tools like 'tox', which put lots of virtualenvs in the current dir, putting many thousands of files there.
On one small project I maintain, running tox from a cold start has a 5-minute startup time (because it runs setup.py, which runs egg_info or something similar), which is a real pain when the tests themselves run far more quickly.
Would you consider a patch to improve this? Attempting a full fix is very difficult, due to the way that MANIFEST rules are essentially an imperative language in which order matters, as files are added and removed.
One strategy that might work would be to optimise only a very small set of MANIFEST commands, for example just 'prune'. That is, a custom FileList.findall implementation would understand the 'prune' command and avoid recursing into that directory, but only if the prune command comes after any commands that would add files. From my understanding of how MANIFEST is processed (https://docs.python.org/2/distutils/sourcedist.html), I believe this would preserve correctness.
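To make the idea concrete, here is a rough sketch of a directory walk that never descends into pruned directories. It is not the existing setuptools or distutils code: the function name `findall_pruned` and its `pruned_dirs` parameter are hypothetical, and it assumes the pruned directories are given as paths relative to the source root.

```python
import os


def findall_pruned(root, pruned_dirs):
    """List files under root, skipping directories named in pruned_dirs.

    Hypothetical helper for illustration only; pruned_dirs holds paths
    relative to root, e.g. ".tox" or ".git".
    """
    pruned = {os.path.normpath(d) for d in pruned_dirs}
    found = []
    for base, dirs, files in os.walk(root):
        rel_base = os.path.relpath(base, root)
        # Editing dirs in place stops os.walk from descending into them,
        # so the files inside a pruned directory are never listed at all.
        dirs[:] = [
            d for d in dirs
            if os.path.normpath(os.path.join(rel_base, d)) not in pruned
        ]
        for name in files:
            found.append(os.path.normpath(os.path.join(rel_base, name)))
    return sorted(found)
```

The saving comes from skipping the recursion itself, rather than listing everything and filtering afterwards.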
This would not be much of an optimisation out of the box, but it would allow you to add "prune .tox", "prune .git", etc. at the end of your MANIFEST.in to get big speedups.
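As a sketch of how the "only trailing prunes" rule could be detected, the snippet below collects the directories from 'prune' lines that come after the last file-adding command in a MANIFEST.in. The function name `trailing_prunes` is hypothetical, the set of adding commands is my reading of the distutils template documentation, and line continuations are ignored for brevity.

```python
# Template commands that can add files to the list (assumed from the docs).
ADDING_COMMANDS = {"include", "recursive-include", "global-include", "graft"}


def trailing_prunes(manifest_text):
    """Return directories from 'prune' lines that follow the last adding command."""
    commands = []
    for line in manifest_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        commands.append(line.split())

    # Index of the last command that could add files; -1 if there is none.
    last_add = -1
    for index, words in enumerate(commands):
        if words[0] in ADDING_COMMANDS:
            last_add = index

    # Only prunes after that point are safe to apply during the walk.
    return [
        words[1]
        for words in commands[last_add + 1:]
        if words[0] == "prune" and len(words) == 2
    ]
```

For example, given "include README", "prune build", "recursive-include src *.py", "prune .tox", "prune .git" in that order, only .tox and .git would be skipped during the walk; "prune build" comes before a later include, so it would still go through the normal rule processing.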
Would this approach be accepted if I worked on it?