[Docs] Clarify usage of include_package_data/package_data/exclude_package_data on package data files #4643

Merged: 21 commits, Sep 26, 2024
docs/userguide/datafiles.rst: 20 changes (14 additions, 6 deletions)

@@ -2,12 +2,20 @@
Data Files Support
====================

Old packaging installation methods in the Python ecosystem have
traditionally allowed the inclusion of "data files" (files beyond
:ref:`the default set <manifest>` ), which are placed in a platform-specific
location. However, the most common use case for data files distributed
with a package is for use *by* the package, usually by including the
data files **inside the package directory**.
In the Python ecosystem, the term "data files" is used in various complex scenarios
and can have nuanced meanings.
For the purposes of this documentation, we define "data files" as non-Python files
that are installed alongside Python modules and packages on the user's machine
when they install a :term:`distribution <Distribution Package>` from PyPI
or via a ``.whl`` file.

@DanielYang59 (Contributor Author), Sep 17, 2024:

I will tag and explain each change I want to make (I tend to lose track of what I originally intended if I overthink it), so please feel free to comment.

I believe the source (PyPI vs. a file) is not what matters here; what matters is whether we are installing from a binary distribution (.whl) or from a source distribution.

@abravalheri (Contributor), Sep 17, 2024:

We are probably better off if we don't define data files in terms of source distributions or source trees, because then we enter muddy waters (source distributions and source trees may contain any kind of file).

Contributor Author:

> We are probably better off if we don't define data files in terms of source distributions or source trees

Sorry, it looks like I may have caused some confusion here. I was suggesting:

- When they install a distribution from PyPI or via a `.whl` file
+ When they install a distribution from source distribution or via a `.whl` file

Installing from PyPI is not the relevant distinction here: installing from PyPI could mean installing either from a .whl file (if a wheel is built for that particular platform) or from a source distribution. I hope I didn't misunderstand anything?

Contributor:

I think the former phrasing is better for the definition than the latter. Alternatively, the following on its own would also be fine:

- When they install a distribution from source distribution or via a `.whl` file
+ when they install a distribution via a `.whl` file

I believe we should not define data files in terms of the source distribution. A Python package always needs to be transformed into a wheel, so the files inside the wheel are what actually matter.

In the previous version, "install from PyPI" is used as an equivalent of `pip install package`, which at the end of the day (preferentially) downloads a .whl file.

These files are typically intended for use at runtime by the package itself or
to influence the behavior of other packages or systems.
They may also be referred to as "resource files."
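To make the definition concrete: a ``.whl`` file is a ZIP archive, so the data files it would install can be listed by skipping Python modules and the ``*.dist-info`` metadata directory. The sketch below builds a tiny in-memory archive so it runs on its own; the package name ``demo_pkg``, its file names and the ``list_data_files`` helper are hypothetical, and the filtering rule is a simplification for illustration, not something setuptools itself provides.

```python
import io
import zipfile


def list_data_files(wheel: zipfile.ZipFile) -> list[str]:
    """Return wheel members that are neither Python modules nor wheel metadata.

    Simplified, hypothetical filter: anything under ``*.dist-info/`` is treated
    as metadata, and ``.py``/``.pyc`` files are treated as Python code.
    """
    data_files = []
    for name in wheel.namelist():
        if name.endswith("/"):
            continue  # skip explicit directory entries
        top_level = name.split("/", 1)[0]
        if top_level.endswith(".dist-info"):
            continue  # skip wheel metadata (METADATA, RECORD, WHEEL, ...)
        if name.endswith((".py", ".pyc")):
            continue  # skip Python modules
        data_files.append(name)
    return sorted(data_files)


# Build a tiny wheel-like archive in memory (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo_pkg/__init__.py", "")
    archive.writestr("demo_pkg/templates/base.html", "<html></html>")
    archive.writestr("demo_pkg/data.json", "{}")
    archive.writestr("demo_pkg-1.0.dist-info/METADATA", "Metadata-Version: 2.1\n")

with zipfile.ZipFile(buffer) as wheel:
    print(list_data_files(wheel))
# → ['demo_pkg/data.json', 'demo_pkg/templates/base.html']
```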
Old packaging installation methods in the Python ecosystem
have traditionally allowed installation of "data files", which
are placed in a platform-specific location. However, the most common use case
for data files distributed with a package is for use *by* the package, usually
by including the data files **inside the package directory**.
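Data files kept inside the package directory are typically read at runtime through the standard library's ``importlib.resources`` API (``files()`` is available from Python 3.9). The sketch below creates a throwaway package on disk only so that it runs on its own; the package name ``demo_resources_pkg``, the file ``greeting.txt`` and the ``read_greeting`` helper are hypothetical stand-ins for a real installed package and its data file.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

# Setup only: build a hypothetical package on disk with one data file inside it.
# In a real project, this would be your installed package and its data file.
tmp_root = Path(tempfile.mkdtemp())
pkg_dir = tmp_root / "demo_resources_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "greeting.txt").write_text("hello from a data file\n", encoding="utf-8")
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()


def read_greeting() -> str:
    """Read a data file that lives inside the package directory."""
    # files() imports the package by name and returns a Traversable for it.
    return files("demo_resources_pkg").joinpath("greeting.txt").read_text(encoding="utf-8")


print(read_greeting())
```

``files()`` returns a ``Traversable`` rather than a plain filesystem path, which is what lets the same call work when a package is not laid out as ordinary files on disk (for example, when it is imported from a zip archive).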

Setuptools focuses on this most common type of data files and offers three ways
of specifying which files should be included in your packages, as described in