
egg_info command is very slow if there are lots of non source files in directory #450

Closed
@ghost

Description

Originally reported by: spookylukey (Bitbucket: spookylukey, GitHub: spookylukey)


The approach taken by egg_info is to list all the files in the source directory, recursively, and only then apply the MANIFEST.in rules (via manifest_maker and FileList.findall).
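To illustrate why this is expensive, here is a simplified sketch of the "list everything, then filter" pattern. It is not the actual distutils/setuptools code; `findall_eager` and `apply_prune` are hypothetical names used only for illustration.

```python
import os


def findall_eager(top):
    """List every file under 'top' as a path relative to 'top'.

    Hypothetical stand-in for the eager listing step: the whole tree is
    walked, including directories that a later rule will throw away.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(top):
        rel_dir = os.path.relpath(dirpath, top)
        for name in filenames:
            found.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(found)


def apply_prune(files, directory):
    """Drop every file that lives under 'directory' (relative path)."""
    prefix = os.path.normpath(directory) + os.sep
    return [f for f in files if not f.startswith(prefix)]
```

With this pattern, `apply_prune(findall_eager("."), ".tox")` gives the right answer, but only after every file inside `.tox` has already been visited.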

If you have a lot of files, this can take a really long time. It combines especially badly with tools like 'tox', which create lots of virtualenvs in the current dir, adding many thousands of files there.

On one small project I maintain, running tox from a cold start has a 5 minute start up time (because it runs setup.py, which runs egg_info or something similar), which is a real pain when the tests themselves run far more quickly.

Would you consider a patch to improve this? Attempting a full fix is very difficult, because MANIFEST.in rules are essentially an imperative language in which order matters, as files are added and removed.

One strategy that might work would be to optimise only a very few MANIFEST.in commands, for example only 'prune'. That is, a custom FileList.findall would understand the 'prune' command and avoid recursing into that directory, but only if the prune command comes after any commands that would add files. From my understanding of how MANIFEST.in is processed (https://docs.python.org/2/distutils/sourcedist.html), I believe this would preserve correctness.
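A rough sketch of the idea, under the assumptions stated here: it treats a 'prune' as safe only when nothing but other 'prune' commands follow it (a conservative reading of "after any commands that would add files"), and it handles only plain directory names, not patterns. `trailing_prunes` and `find_files` are hypothetical helpers, not existing FileList methods.

```python
import os


def trailing_prunes(manifest_lines):
    """Collect directories from 'prune' commands at the end of MANIFEST.in.

    Scans backwards and stops at the first command that is not a simple
    'prune <dir>', since anything earlier might interact with added files.
    """
    pruned = set()
    for line in reversed(manifest_lines):
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        if words[0] != "prune" or len(words) != 2:
            break  # some other command: stop collecting
        pruned.add(os.path.normpath(words[1]))
    return pruned


def find_files(top, pruned_dirs):
    """Walk 'top' without descending into any directory in 'pruned_dirs'.

    'pruned_dirs' holds paths relative to 'top'.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        rel_dir = os.path.relpath(dirpath, top)
        # Editing dirnames in place tells os.walk not to recurse into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normpath(os.path.join(rel_dir, d)) not in pruned_dirs
        ]
        for name in filenames:
            found.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(found)
```

The remaining MANIFEST.in rules would then be applied to the shorter list exactly as before; the trailing prunes become no-ops because those files were never listed.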

This would not be much of an optimisation out of the box, but it would allow you to add "prune .tox", "prune .git", etc. at the end of your MANIFEST.in to get big speedups.

Would this approach be accepted if I worked on it?

