
Support incremental complexity and build steps for packages #248

Open
@jaraco

Description


At the most recent PyPA sprint in NYC (thanks Bloomberg), I finally was able to put my finger on what was bothering me about Python Packaging. The following describes what I envision as a simpler, more intuitive system for dealing with the bi-modal nature of a package under development.

Problem

The issue stems from the disparity between a built/installed package and a package under development. Even today, pip relies on setuptools to perform an "editable" install (a package under development installed into an environment). Every package must be built, even the most basic hello-world package: a directory containing one file with one function. Before the project is "built", it has no (supported) metadata. It has no name, and it doesn't even appear to be installed.

To create this metadata, one needs to select a build tool (setuptools, flit, etc.), author the metadata in some source format for that tool, and then run that tool to generate the metadata. Persisting the generated metadata with the project (such as in the SCM repo) is generally neither possible nor desirable. Instead, the best recommendation is that the project enact the various steps to publish the package and expect usable metadata only from that published location (often PyPI). Those steps include:

  1. Author the metadata in some source format.
  2. Run build tools to translate the source format to the publishable format.
  3. Publish the built package.

Solution - Incremental and Inferred Metadata

Imagine instead a world where simple packages could author the metadata directly. Users would create files in something akin to myproject.dist-info, files presenting the static or default metadata for the project, which a build tool would copy directly and extend. To avoid mutating source files, this metadata directory would allow metadata to be supplied by multiple files, similar to the conf.d concept in Debian (among others). Part of the metadata could record which build steps the project requires and which ones have already been run, so that a build tool could determine which steps remain.
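To make the conf.d idea a little more concrete, here is a minimal sketch of how a tool might merge such a directory. It assumes, hypothetically, that each fragment is a `*.conf` file of `Key: value` lines in the style of core metadata, that fragments are applied in filename order, that later fragments override earlier ones for single-valued keys, and that multiple-use keys (such as Classifier) accumulate. The fragment names and the `merge_fragments`/`load_metadata_dir` helpers are illustrative only, not part of any existing tool.

```python
from pathlib import Path

# Hypothetical: keys that may appear many times accumulate instead of overriding.
MULTI_USE = {"Classifier", "Requires-Dist", "Project-URL"}


def parse_fragment(text: str) -> list[tuple[str, str]]:
    """Parse 'Key: value' lines, skipping blank lines and '#' comments."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain ':' (e.g. '::').
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed metadata line: {line!r}")
        pairs.append((key.strip(), value.strip()))
    return pairs


def merge_fragments(fragments: dict[str, str]) -> dict[str, object]:
    """Merge fragments in filename order; later single-use keys win."""
    merged: dict[str, object] = {}
    for name in sorted(fragments):
        for key, value in parse_fragment(fragments[name]):
            if key in MULTI_USE:
                merged.setdefault(key, []).append(value)
            else:
                merged[key] = value
    return merged


def load_metadata_dir(path: Path) -> dict[str, object]:
    """Read every *.conf fragment from a (hypothetical) metadata directory."""
    return merge_fragments({p.name: p.read_text() for p in path.glob("*.conf")})


# Usage: a base fragment authored by hand, plus one layered on top of it.
merged = merge_fragments({
    "00-base.conf": "Name: myproject\nVersion: 0.0.0\n"
                    "Classifier: Programming Language :: Python",
    "50-release.conf": "Version: 1.2.0\n"
                       "Classifier: License :: OSI Approved :: MIT License",
})
```

Because a build tool only ever adds new fragments (for example a higher-numbered file), the hand-authored ones are never rewritten, which is the property the conf.d layout is meant to provide.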

Many projects would have no build steps - a git checkout might produce a viable package with metadata.

Furthermore, such a system could also define some inferred values, such that useful metadata could be derived from the source code itself. Imagine for example that if a project has no "name" defined, it could infer the project name from the containing folder or the basename of the SCM URL. The version could have a sane default like 0.0.0 but also honor SCM metadata (tags) if present.
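The inference rules above could look something like the following sketch. It is illustrative only: the helper names are hypothetical, and the SCM tag is passed in explicitly rather than queried (a real tool might obtain it with `git describe --tags --abbrev=0`, and existing projects such as setuptools_scm already handle far more cases). Explicitly declared values always win; inference only fills gaps.

```python
import re
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

DEFAULT_VERSION = "0.0.0"


def name_from_scm_url(url: str) -> str:
    """Basename of an SCM URL, e.g. '.../org/myproject.git' -> 'myproject'."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1].removesuffix(".git")


def version_from_tag(tag: Optional[str]) -> str:
    """Honor an SCM tag such as 'v1.2.3' if present, else the sane default."""
    if not tag:
        return DEFAULT_VERSION
    # Strip a conventional leading 'v' only when a digit follows it.
    return tag[1:] if re.match(r"v\d", tag) else tag


def infer_metadata(project_dir, declared: dict,
                   scm_url: Optional[str] = None,
                   scm_tag: Optional[str] = None) -> dict:
    """Fill in Name and Version only where the project did not declare them."""
    meta = dict(declared)
    if "Name" not in meta:
        meta["Name"] = (name_from_scm_url(scm_url) if scm_url
                        else Path(project_dir).resolve().name)
    if "Version" not in meta:
        meta["Version"] = version_from_tag(scm_tag)
    return meta
```

With no declared metadata at all, `infer_metadata("/tmp/hello-world", {})` yields `{"Name": "hello-world", "Version": "0.0.0"}`, the minimal dirname-0.0.0 package described next.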

In such a world, it may not be necessary to "build" anything to have a viable (distribution) package for a project. If one creates a directory and puts a Python module in it (or even without a module, really), it will already represent a minimal package (dirname-0.0.0). From there, the developer can add modules, Python packages, and other metadata--incrementally increasing the sophistication, complexity, and build requirements for the package.

Even projects with unmet build steps would have some viable metadata. Except for builds that can't happen in place (for whatever reason), an editable install would then be basically a no-op: ensure the project is on sys.path.
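One existing way to "ensure the project is on sys.path" is a `.pth` file, which the standard-library `site` module already processes: each line naming an existing directory is appended to `sys.path`. The sketch below assumes that mechanism; the `editable_install` helper is hypothetical, and a real installer would also need to expose the project's metadata so that the package actually appears installed.

```python
import site
import sys
import tempfile
from pathlib import Path


def editable_install(project_dir: Path, site_dir: Path, name: str) -> Path:
    """Write '<name>.pth' whose single line is the project's absolute path."""
    pth = site_dir / f"{name}.pth"
    pth.write_text(str(project_dir.resolve()) + "\n")
    return pth


# Usage: simulate a project checkout and a site-packages directory.
with tempfile.TemporaryDirectory() as proj, tempfile.TemporaryDirectory() as sp:
    pth = editable_install(Path(proj), Path(sp), "hello_world")
    # addsitedir processes .pth files, as site does for site-packages at startup.
    site.addsitedir(sp)
    on_path = str(Path(proj).resolve()) in sys.path

print(on_path)  # True
```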

Execution

To support this model, several changes would need to take place. The main change would be to create a new metadata format, one that supported the model described above. It would need to be both user-friendly and machine-readable. It should probably be flexible and extensible to support unforeseen use-cases.

A model for the build steps would need to be devised. I imagine these to be arbitrary callables, but they should have constraints on what artifacts they produce and where. I imagine that the output from some build steps might be the input for subsequent build steps.
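A minimal sketch of such a model, under assumptions that are entirely hypothetical: a step is a callable with declared named inputs and outputs, in-memory artifacts stand in for files, a step runs once its inputs exist, and a step that produces anything it did not declare is rejected. The returned set of completed step names is what a tool could record back into the metadata as "steps that have been run".

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Artifacts = dict[str, object]


@dataclass
class BuildStep:
    """Hypothetical build step: a callable with declared inputs and outputs."""
    name: str
    func: Callable[[Artifacts], Artifacts]
    inputs: tuple[str, ...] = ()
    outputs: tuple[str, ...] = ()


def run_pending(steps: Iterable[BuildStep], completed: set[str],
                artifacts: Artifacts) -> set[str]:
    """Run every step not yet in `completed` once its inputs are available.

    `artifacts` must already hold the outputs of previously completed steps.
    Returns the updated set of completed step names.
    """
    done = set(completed)
    pending = [s for s in steps if s.name not in done]
    while pending:
        ready = [s for s in pending if all(i in artifacts for i in s.inputs)]
        if not ready:
            names = [s.name for s in pending]
            raise RuntimeError(f"inputs never become available for: {names}")
        for step in ready:
            # Each step sees only the artifacts it declared as inputs.
            produced = step.func({k: artifacts[k] for k in step.inputs})
            undeclared = set(produced) - set(step.outputs)
            if undeclared:
                raise ValueError(f"{step.name} produced undeclared {undeclared}")
            artifacts.update(produced)
            done.add(step.name)
            pending.remove(step)
    return done


# Usage: steps listed out of order; "link" consumes what "compile" produces.
steps = [
    BuildStep("link", lambda a: {"lib": a["obj"] + "!"},
              inputs=("obj",), outputs=("lib",)),
    BuildStep("compile", lambda a: {"obj": a["src"].upper()},
              inputs=("src",), outputs=("obj",)),
]
artifacts = {"src": "x"}
done = run_pending(steps, set(), artifacts)
```

Ordering falls out of data availability rather than list position, which is one way to let the output of some steps feed later ones without a separately maintained dependency graph.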

In addition to build steps, there may be install steps. The basic, implicit install step is one that copies the manifest of the project's files to a site-packages directory, but perhaps separate install steps could define console scripts that get created or copy arbitrary data to other directories based on platform rules.
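As an illustration only, the implicit copy step and a console-script step might be as small as the following. Both helpers are hypothetical; the `module:function` spelling mirrors the common entry-point convention, and this sketch handles only a plain function name, not a dotted attribute. A real implementation would likely take its target directories from `sysconfig.get_paths()` (keys such as `purelib` and `scripts`) rather than receive them as arguments.

```python
import shutil
import tempfile
from pathlib import Path


def copy_manifest(project_dir: Path, manifest: list[str],
                  target: Path) -> list[Path]:
    """The implicit install step: copy each manifest entry, keeping layout."""
    installed = []
    for rel in manifest:
        dest = target / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(project_dir / rel, dest)
        installed.append(dest)
    return installed


def console_script_source(entry_point: str) -> str:
    """Wrapper-script source for 'module:function' (hypothetical step)."""
    module, _, func = entry_point.partition(":")
    return (
        "import sys\n"
        f"from {module} import {func}\n"
        f"sys.exit({func}())\n"
    )


# Usage: install a one-module project into a temporary "site-packages".
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    (Path(src) / "hello.py").write_text("def main():\n    return 0\n")
    installed = copy_manifest(Path(src), ["hello.py"], Path(dst))
    copied_ok = installed[0].read_text() == (Path(src) / "hello.py").read_text()

script = console_script_source("hello:main")
```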

Tools that read metadata (pip, importlib_metadata, maybe pkg_resources) would need to develop support for this new format.
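For a sense of the starting point: the standard library's `importlib.metadata` (Python 3.8+) can already read a hand-authored metadata directory, as long as it uses today's single-file `METADATA` format. The sketch below shows that; the new work would be teaching such readers to merge a multi-file directory.

```python
import tempfile
from importlib.metadata import Distribution
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A metadata directory written by hand, with no build step involved.
    info = Path(tmp) / "myproject-0.0.0.dist-info"
    info.mkdir()
    (info / "METADATA").write_text(
        "Metadata-Version: 2.1\nName: myproject\nVersion: 0.0.0\n"
    )
    dist = Distribution.at(info)
    name, version = dist.metadata["Name"], dist.version

print(name, version)  # myproject 0.0.0
```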

Discussion

I believe this proposal is largely compatible with and independent of the work done on PEP 517 and 518. It would build on and eventually supersede the work done in prior metadata specs such as PEP 566 and its predecessors. It would also supersede setup.py, setup.cfg, and possibly some of pyproject.toml as the recommended way for a packager to supply metadata.

Perhaps run-time compatibility could be provided with an optional install step that converts the metadata into one of the older formats.
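Such a compatibility step could be little more than serialization. The sketch below assumes the merged metadata is a dict whose list values are multiple-use fields, and emits core-metadata `Key: value` lines (Metadata-Version 2.1 being the version PEP 566 defines), repeating multiple-use fields such as Classifier. The `to_core_metadata` helper is hypothetical, and it ignores details such as multi-line descriptions.

```python
def to_core_metadata(meta: dict, metadata_version: str = "2.1") -> str:
    """Serialize merged metadata to core-metadata 'Key: value' lines."""
    lines = [f"Metadata-Version: {metadata_version}"]
    for key, value in meta.items():
        if key == "Metadata-Version":
            continue  # already emitted first
        # Multiple-use fields are stored as lists and emitted once per value.
        values = value if isinstance(value, list) else [value]
        lines.extend(f"{key}: {v}" for v in values)
    return "\n".join(lines) + "\n"


text = to_core_metadata({
    "Name": "myproject",
    "Version": "1.2.0",
    "Classifier": ["Programming Language :: Python", "Typing :: Typed"],
})
```

An install step could then write `text` to a `METADATA` file inside a conventional `.dist-info` directory for tools that only understand the older format.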

I've posted this description here to gather feedback and to serve as a location to reference the concept and proposal. Most likely a PEP would be in order to formalize and refine the proposal. I'm happy to embark on that process after gaining some tacit consent and clearing any initial hurdles/concerns.
