Python packaging guide

From ALT Linux Wiki

This guide is intended to cover the packaging of Python projects: source trees having legacy setup.py or modern pyproject.toml (or both) files.

Packaging a Python distribution involves two mandatory steps:

  • build - produce a build artifact, which can be either a source distribution (bare source files) or a binary distribution, also known as a wheel
  • install - install the produced build artifact

There are optional steps as well:

  • check


Build

Python projects can be one of two types:

  • pure Python distributions, which don't require an actual build process and can simply be copied to the destination as is
  • platform-dependent distributions, which require a special build process and can't simply be copied to the destination as is

Sources

There are two supported options for sources:

  • source tree (snapshot of VCS)
  • source distribution, also known as sdist

VCS and gear

A source tree produced by gear is not a valid VCS checkout because the SCM's internal data (e.g. the .git directory and its content) is missing.
Some packaging plugins may rely on SCM. For example,

setuptools_scm implements a file_finders entry point which returns all files tracked by your SCM. This eliminates the need for a manually constructed MANIFEST.in in most cases where this would be required when not using setuptools_scm, namely:

  • To ensure all relevant files are packaged when running the sdist command.
  • When using include_package_data to include package data as part of the build or bdist_wheel.

Thus, an actual SCM source tree is required in this case; otherwise some expected data (everything besides Python packages and modules) may not be packaged.
For example, a basic git configuration may be done in the RPM specfile with:

if [ ! -d .git ]; then
    git init
    git config user.email author@example.com
    git config user.name author
    git add .
    git commit -m 'release'
    git tag '%version'
fi

pypi.org

Though it's possible to use wheels from pypi.org as a source for installation (skipping the build step), in general it's a bad idea to just unpack such a wheel, for reasons such as:

  • wheels may be platform-specific
  • wheels are prebuilt distributions whose builders may not follow the distro's build rules and policies, may link against whatever they want, and so on
  • wheels are usually stripped (there are no tests, docs, etc.)

Thus, only a source distribution from pypi.org can be used for packaging, though it's recommended to use the source the sdist was made from, e.g. code fetched from a code hosting service.
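
Whether a wheel is platform-specific can be read from its filename. The following is a minimal sketch, assuming the PEP427 filename convention {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl; the function names and the example filenames are illustrative and not part of any ALT tooling.

```python
def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python tag, abi tag, platform tag) parsed from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name-version-python-abi-platform, with an optional build tag after version
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def is_platform_specific(filename: str) -> bool:
    """A wheel whose platform tag is 'any' is not tied to a platform."""
    return wheel_tags(filename)[2] != "any"


# Hypothetical filenames used only for illustration.
for example in (
    "my_distribution-1.0-py3-none-any.whl",
    "my_distribution-1.0-cp310-cp310-linux_x86_64.whl",
):
    print(example, is_platform_specific(example))
```

A check like this only tells what the wheel claims about itself; it does not make a prebuilt wheel acceptable as a packaging source.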

Build roles

With PEP517 accepted, the build process is more standardized and is split into two roles:

  • build frontend takes an arbitrary source tree and calls the build backend on it. A user can use one frontend to build any PEP517-aware project.
  • build backend actually builds the sdist or wheel. The backend is defined per project.
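
The split between the two roles can be sketched with a toy example. ToyBackend and frontend_build are hypothetical names; only the build_wheel hook name and its signature come from PEP517. The archive produced here is not a valid wheel (it has no dist-info), it only shows who calls whom.

```python
import tempfile
import zipfile
from pathlib import Path


class ToyBackend:
    """Hypothetical backend exposing the mandatory PEP517 build_wheel hook."""

    def build_wheel(self, wheel_directory, config_settings=None, metadata_directory=None):
        name = "toy-1.0-py3-none-any.whl"
        with zipfile.ZipFile(Path(wheel_directory) / name, "w") as whl:
            whl.writestr("toy.py", "VALUE = 1\n")
        # The hook returns the basename of the file it created.
        return name


def frontend_build(backend, out_dir: Path) -> Path:
    """Frontend side: call the backend hook and return the artifact path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / backend.build_wheel(str(out_dir))


with tempfile.TemporaryDirectory() as tmp:
    wheel = frontend_build(ToyBackend(), Path(tmp) / "dist")
    built_name = wheel.name
    built_ok = wheel.is_file()

print(built_name, built_ok)
```

The frontend knows nothing about how the artifact is produced, which is why a single frontend can serve any PEP517-aware project.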

Build dependencies

With PEP518 accepted, the format of static build dependencies is standardized: they are declared in TOML in the pyproject.toml file.

For the vast majority of projects the PEP517/518 configuration will be:

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

  • the distro packager is responsible for specifying these requirements in the RPM specfile. These dependencies may not be sufficient, since a backend may also have dynamic ones. See get-requires-for-build-wheel for details
  • the distro packager may add any other required build dependency but must follow the "necessary and sufficient" rule
  • the presence of %pyproject_build or %pyproject_install in the RPM specfile automatically adds a build dependency on pyproject-installer. If that distribution is used for anything other than these macros, the corresponding requirement should be set explicitly
  • the presence of %tox_check or %tox_check_pyproject in the RPM specfile automatically adds a build dependency on tox and its required plugins. If that distribution is used for anything other than these macros, the corresponding requirement should be set explicitly
  • dependencies required for check or for building docs must be RPM-conditional, i.e. it must be possible to opt out of them by RPM means
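
How static and dynamic build requirements combine can be sketched as follows. The two backend classes and collect_build_requires are hypothetical; only the optional get_requires_for_build_wheel hook and its default (an empty list when the hook is absent) are taken from PEP517.

```python
class BackendWithDynamicRequires:
    """Hypothetical backend that reports one extra requirement at build time."""

    def get_requires_for_build_wheel(self, config_settings=None):
        return ["wheel"]


class BackendWithoutHook:
    """Hypothetical backend that does not implement the optional hook."""


def collect_build_requires(static_requires, backend, config_settings=None):
    """Merge pyproject.toml requirements with the ones reported by the backend."""
    hook = getattr(backend, "get_requires_for_build_wheel", None)
    # PEP517: a missing hook is equivalent to returning an empty list.
    dynamic = hook(config_settings) if hook is not None else []
    merged = list(static_requires)
    for requirement in dynamic:
        if requirement not in merged:
            merged.append(requirement)
    return merged


print(collect_build_requires(["setuptools"], BackendWithDynamicRequires()))
print(collect_build_requires(["setuptools"], BackendWithoutHook()))
```

The merged list is what the packager has to translate into BuildRequires by hand, since the dynamic part cannot be read from pyproject.toml.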

Build with RPM

rpm-build-python3 provides the RPM macro %pyproject_build as a CLI for the build frontend (see pyproject-installer for details).

  • the build should be done with this macro unless doing otherwise is absolutely necessary

Implementation details

  • the built wheel will be placed into the {source directory}/dist/ directory


Install

A wheel is a regular zip archive with a specially formatted name and the suffix ".whl". Thus, an installer mostly just verifies the content of a wheel, unpacks it and lays out the unpacked files where necessary.
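
Because a wheel is a plain zip archive, it can be inspected with the standard library alone. The sketch below builds a tiny in-memory archive laid out like a wheel (the distribution name and file contents are made up for illustration) and lists its members; it does not perform the verification a real installer does.

```python
import io
import zipfile


def build_toy_wheel() -> bytes:
    """Create a tiny in-memory archive laid out like a wheel (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as whl:
        whl.writestr("my_distribution/__init__.py", "")
        whl.writestr(
            "my_distribution-1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: my_distribution\nVersion: 1.0\n",
        )
        whl.writestr("my_distribution-1.0.dist-info/WHEEL", "Wheel-Version: 1.0\n")
        whl.writestr("my_distribution-1.0.dist-info/RECORD", "")
    return buf.getvalue()


def inspect_wheel(data: bytes) -> tuple[bool, list[str]]:
    """Return (is_zip, sorted member names) for the given wheel bytes."""
    stream = io.BytesIO(data)
    if not zipfile.is_zipfile(stream):
        return False, []
    with zipfile.ZipFile(stream) as whl:
        return True, sorted(whl.namelist())


is_zip, members = inspect_wheel(build_toy_wheel())
print(is_zip)
for name in members:
    print(name)
```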

Install with RPM

rpm-build-python3 provides the RPM macro %pyproject_install as a CLI for the installer (see pyproject-installer for details).

  • installation should be done with this macro unless doing otherwise is absolutely necessary

Implementation details

  • only the latest wheel built with %pyproject_build will be installed by %pyproject_install

Bytecompilation

PEP427 says that wheel installers should compile any installed .py to .pyc (see compiled-python-files and pyc directories for details). But rpm-build-python3 already does this by default with the help of the python3.compileall.py script, which compiles Python modules with the given optimization level: disabled, -O (removes assert statements) or -OO (removes both assert statements and __doc__ strings).
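
The effect of the three optimization levels on the produced files can be observed with the standard library. This sketch uses stdlib py_compile on a throwaway module; it is not the python3.compileall.py script mentioned above, and the module content is made up.

```python
import py_compile
import tempfile
from pathlib import Path


def compile_at_all_levels(source_text: str) -> dict:
    """Byte-compile one module at levels 0, 1 (-O) and 2 (-OO).

    Returns a mapping of optimization level to the .pyc file name.
    """
    names = {}
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "mod.py"
        src.write_text(source_text)
        for level in (0, 1, 2):
            pyc = py_compile.compile(str(src), optimize=level, doraise=True)
            names[level] = Path(pyc).name
    return names


pyc_names = compile_at_all_levels('"""Docstring."""\nassert True\n')
for level, name in pyc_names.items():
    print(level, name)
```

Each level gets its own file in __pycache__, distinguished by an opt-1 or opt-2 marker in the name, so the variants can coexist in one package.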


Check

Testing is vital in the Python ecosystem. Never skip or disable tests unless it's absolutely necessary.

For security reasons, hasher in ALT's build farm is a network-isolated environment, so testing in it can be very tricky or not possible at all. Still, most unit tests require nothing special to run, so check should be enabled by default. In contrast, integration tests usually require Internet access and should therefore be skipped. Nowadays the vast majority of projects use pytest or the stdlib's unittest as the test runner and tox to automate testing.

Check with RPM

It's often required to run a project's tests during downstream packaging in a globally isolated environment. For example, the user of such an environment is unprivileged (can't install distributions into the global site-packages) and has no access to the Internet (can't install distributions from package indexes). On the other hand, built Python distributions should not be unintentionally installed into the user site-packages, to avoid interfering with any further distributions being tested.

Though some tests can be run in the current Python environment without any change (e.g. flat layout and pure Python package), some may require setting the PYTHONPATH environment variable (e.g. src layout and pure Python package). Things may be more complex for arch-dependent Python packages, .pth hooks or entry_points plugins, where PYTHONPATH can't help. This is one of the reasons why venv (stdlib) and virtualenv (third-party) exist. tox is a really nice tool for automating the testing process, but it's overkill for the aforementioned task, for example:

  • by default tox downloads and installs the dependencies of the test env and of the package (though this can be overcome with options and external plugins)
  • tox has many runtime dependencies

Check with pyproject_installer

The recommended way to run tests is to use one of the RPM macros based on pyproject_installer's run command:

  • %pyproject_run (i.e. python3 -m pyproject_installer run) executes an arbitrary command within a non-isolated venv-based Python virtual environment:
    • create a minimal virtual environment with the help of the stdlib's venv that:
      • has no pip or setuptools installed
      • has access to system and user site packages
    • generate console scripts of system and user site packages to access them within the venv
    • install the built distribution into the venv without any dependencies
    • spawn the command in a subprocess within the venv, preserving current environment variables
    • set the NO_INTERNET environment variable, which can be used by the test runner to make tests requiring the Internet conditional
  • %pyproject_run_unittest (i.e. python3 -m pyproject_installer run -- python3 -m unittest) and %pyproject_run_pytest (i.e. python3 -m pyproject_installer run -- python3 -m pytest) are the most commonly used test commands (unittest and pytest respectively)
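
The first step of the list above, a minimal venv without pip that still sees system site packages, can be approximated with the stdlib venv module. This is only a sketch of that single step under the assumption that it maps onto venv.EnvBuilder options; it is not pyproject_installer's actual implementation and does not generate console scripts or install anything.

```python
import tempfile
import venv
from pathlib import Path


def create_minimal_venv(env_dir: Path) -> dict:
    """Create a venv without pip that can see system site packages.

    Returns the key/value pairs written to pyvenv.cfg.
    """
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(env_dir)
    config = {}
    for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


with tempfile.TemporaryDirectory() as tmp:
    cfg = create_minimal_venv(Path(tmp) / "venv")

print(cfg["include-system-site-packages"])
```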
Check with tox

rpm-build-python3 provides several RPM macros as a CLI interface for tox:

  • %tox_check:
    • set required tox's and pip's env variables for running in network-isolated env
    • pass down the NO_INTERNET env variable to the tox env. This variable can be used by the test runner to make tests requiring the Internet conditional
    • run tox with system site distributions
    • relax all tox env's dependencies
    • generate console scripts for every system site distribution defining them
  • %tox_check_pyproject:
    • same as %tox_check but tox will use a wheel built by %pyproject_build
  • %tox_create_default_config:
    • creates a default tox configuration (tox.ini) in the current working directory with the following content:
      [testenv]
      commands =
          pytest {posargs:-vra}
      
      This may be helpful if you want to run tests under tox with pytest as the test runner but upstream doesn't provide such a config, or you want to override it with your own.
      Override note: today's tox searches its config in the following order: pyproject.toml, tox.ini, setup.cfg (tox-configuration-discovery).
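
The NO_INTERNET variable set by %pyproject_run and %tox_check can be consumed by a test suite roughly as follows. This is a sketch with stdlib unittest; the test class is made up, and treating any non-empty value as "no network" is an assumption, since the value the macros assign is not specified here.

```python
import io
import os
import unittest

# Simulate the environment prepared by the macros (for this demo only).
os.environ["NO_INTERNET"] = "1"

# Assumption: any non-empty value means the network is unavailable.
NO_INTERNET = bool(os.environ.get("NO_INTERNET"))


class ExampleTests(unittest.TestCase):
    def test_offline_logic(self):
        self.assertEqual("a-b".replace("-", "_"), "a_b")

    @unittest.skipIf(NO_INTERNET, "requires Internet access")
    def test_needs_network(self):
        # A real test would contact a remote service here.
        self.fail("must not run without Internet access")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(len(result.skipped), result.wasSuccessful())
```

With pytest the same condition is usually expressed with pytest.mark.skipif on the affected tests.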

Packaging

Subpackages

  • if docs or tests are required by something or somebody else they must be subpackaged into their own RPM packages, otherwise they must not be shipped.

Metadata

If a project is meant to be reusable, it must publish the corresponding meta information. Generating such data is the job of the build backend. But there are projects intended only for internal usage that don't want to produce any meta information (usually such projects are not published on an index like pypi.org).

  • installed metadata (the content of the dist-info directory) shouldn't be stripped any more than it already is. Nowadays, only METADATA and entry_points.txt are actually used by metadata parsers (see importlib.metadata for details). This data is crucial for many aspects of the Python ecosystem.
  • the %pyproject_distinfo RPM macro can help if finer-grained control over the packaged metadata is desired or required. This macro normalizes a given name according to PEP427 rules. Note: not every build backend follows these rules; some produce wheels whose filename has a . (dot character) and/or upper-case characters in the distribution-name part. See https://discuss.python.org/t/amending-pep-427-and-pep-625-on-package-normalization-rules/17226 for details. Therefore it's not always possible to predict the produced wheel filename.
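
The kind of name normalization discussed above can be sketched in a few lines. The rule used here (collapse runs of -, _ and . into one separator, lowercase, then use _ as the separator) is the one from the amended binary distribution format specification; whether %pyproject_distinfo applies exactly this rule is an assumption, and normalize_wheel_name is a hypothetical helper.

```python
import re


def normalize_wheel_name(name: str) -> str:
    """Normalize a distribution name for use in a wheel filename.

    Assumed rule: runs of '-', '_' and '.' collapse into a single
    separator, the result is lowercased and '_' is used as the separator.
    """
    return re.sub(r"[-_.]+", "-", name).lower().replace("-", "_")


# Hypothetical distribution names used only for illustration.
for raw in ("My.Distribution-Name", "my_distribution", "zope.interface"):
    print(raw, "->", normalize_wheel_name(raw))
```

A backend that skips this step would keep the dot or the upper-case letters in the filename, which is the mismatch the note above warns about.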

Migration to PEP517 in RPM specfile

More and more Python projects are migrating away from legacy setup.py packaging and removing this file from their tree completely. This makes it impossible to package such projects with the old Python RPM macros, which unconditionally rely on an existing setup.py.

New RPM macros

In most cases the legacy RPM macros for building or installing a Python project can be (and must be) replaced with the modern ones, e.g.:

legacy macro            modern macro
python3_build           pyproject_build
python3_build_debug     pyproject_build
python3_install         pyproject_install
python3_build_install   pyproject_build and pyproject_install

New build dependencies

As mentioned above, the distro packager is responsible for specifying build requirements in the RPM specfile.

This can be done manually or automatically. This section describes only the manual way.

All the static requirements for building a project are given in pyproject.toml.

For example,

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

setuptools in turn requires the wheel distribution to build a wheel. Therefore, the following build dependencies must be specified in the RPM specfile explicitly:

BuildRequires: python3-module-setuptools
BuildRequires: python3-module-wheel

Note: Building an sdist or wheel may require additional distributions, the names of which can only be obtained from the build backend.


Legacy setup.py-formatted projects

Even if a project still uses the legacy packaging format, it's possible to build it with the PEP517/518-aware RPM macros.

The following default configuration

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta:__legacy__"

will be used if:

  • pyproject.toml file is missing
  • 'build-system' table is missing


Additionally, the legacy setuptools.build_meta:__legacy__ backend will be used if:

  • 'build-backend' key of 'build-system' table is missing


Note: these default build dependencies must be specified in the RPM specfile explicitly:
BuildRequires: python3-module-setuptools
BuildRequires: python3-module-wheel
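
The fallback rules described in this section can be written down as a small function. This is a sketch of the rules as stated here, not the code of any particular frontend; resolve_build_system is a hypothetical name, and its argument is the already parsed pyproject.toml (None when the file is missing).

```python
from typing import Optional

DEFAULT_REQUIRES = ["setuptools", "wheel"]
LEGACY_BACKEND = "setuptools.build_meta:__legacy__"


def resolve_build_system(pyproject: Optional[dict]) -> dict:
    """Apply the legacy defaults to a parsed pyproject.toml (or None)."""
    if pyproject is None or "build-system" not in pyproject:
        # Missing file or missing table: the whole default configuration applies.
        return {"requires": list(DEFAULT_REQUIRES), "build-backend": LEGACY_BACKEND}
    table = pyproject["build-system"]
    return {
        "requires": list(table.get("requires", [])),
        # Missing key: only the backend falls back to the legacy one.
        "build-backend": table.get("build-backend", LEGACY_BACKEND),
    }


print(resolve_build_system(None))
print(resolve_build_system({"build-system": {"requires": ["setuptools"]}}))
```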


Installed metadata

The new metadata format (dist-info) will be generated and packaged after migrating to the PEP517-aware RPM macros.
For example, egg-info (the previous format) vs dist-info:

-  my_distribution-1.0-py3.10.egg-info
-  my_distribution-1.0-py3.10.egg-info/dependency_links.txt
-  my_distribution-1.0-py3.10.egg-info/entry_points.txt
-  my_distribution-1.0-py3.10.egg-info/not-zip-safe
-  my_distribution-1.0-py3.10.egg-info/PKG-INFO
-  my_distribution-1.0-py3.10.egg-info/requires.txt
-  my_distribution-1.0-py3.10.egg-info/SOURCES.txt
-  my_distribution-1.0-py3.10.egg-info/top_level.txt
+  my_distribution-1.0.dist-info
+  my_distribution-1.0.dist-info/entry_points.txt
+  my_distribution-1.0.dist-info/METADATA

setuptools

setuptools is a mandatory dependency of the legacy RPM macros, but it's only an optional build backend for the new ones. Thus, setuptools is no longer pulled into the build environment merely by using the modern macros and should be specified explicitly if it's still needed.
