Preparing a consistent Python environment

24 Mar 2023

Building QEMU is a complex task, split across several programs. the configure script finds the host and cross compilers that are needed to build emulators and firmware; Meson prepares the build environment for the emulators; finally, Make and Ninja actually perform the build, and in some cases they run tests as well.

In addition to compiling C code, many build steps run tools and scripts which are mostly written in the Python language. These include processing the emulator configuration, code generators for tracepoints and QAPI, extensions for the Sphinx documentation tool, and the Avocado testing framework. The Meson build system itself is written in Python, too.

Some of these tools are run through the python3 executable, while others are invoked directly as sphinx-build or meson, and this can create inconsistencies. For example, QEMU’s configure script checks for a minimum version of Python and rejects too-old interpreters. However, what would happen if code run by Sphinx used a different version?

This situation has been largely hypothetical until recently; QEMU’s Python code is already tested with a wide range of versions of the interpreter, and it would not be a huge issue if Sphinx used a different version of Python as long as both of them were supported. This will change in version 8.1 of QEMU, which will bump the minimum supported version of Python from 3.6 to 3.8. While all the distros that QEMU supports have a recent-enough interpreter, the default on RHEL8 and SLES15 is still version 3.6, and that is what all binaries in /usr/bin use unconditionally.

As of QEMU 8.0, even if configure is told to use /usr/bin/python3.8 for the build, QEMU’s custom Sphinx extensions would still run under Python 3.6. configure does separately check that Sphinx is executing with a new enough Python version, but it would be nice if there were a more generic way to prepare a consistent Python environment.

This post will explain how QEMU 8.1 will ensure that a single interpreter is used for the whole of the build process. Getting there will require some familiarity with Python packaging, so let’s start with virtual environments.

Virtual environments

It is surprisingly hard to find what Python interpreter a given script will use. You can try to parse the first line of the script, which will be something like #! /usr/bin/python3, but there is no guarantee of success. For example, on some version of Homebrew /usr/bin/meson will be a wrapper script like:

#!/bin/bash
PYTHONPATH="/usr/local/Cellar/meson/0.55.0/lib/python3.8/site-packages" \
  exec "/usr/local/Cellar/meson/0.55.0/libexec/bin/meson" "$@"

The file with the Python shebang line will be hidden somewhere in /usr/local/Cellar. Therefore, performing some kind of check on the files in /usr/bin is ruled out. QEMU needs to set up a consistent environment on its own.

If a user who is building QEMU wanted to do so, the simplest way would be to use Python virtual environments. A virtual environment takes an existing Python installation but gives it a local set of Python packages. It also has its own bin directory; place it at the beginning of your PATH and you will be able to control the Python interpreter for scripts that begin with #! /usr/bin/env python3.

Furthermore, when packages are installed into the virtual environment with pip, they always refer to the Python interpreter that was used to create the environment. Virtual environments mostly solve the consistency problem at the cost of an extra pip install step to put QEMU’s build dependencies into the environment.

Unfortunately, this extra step has a substantial downside. Even though the virtual environment can optionally refer to the base installation’s installed packages, pip will always install packages from scratch into the virtual environment. For all Linux distributions except RHEL8 and SLES15 this is unnecessary, and users would be happy to build QEMU using the versions of Meson and Sphinx included in the distribution.

Even worse, pip install will access the Python package index (PyPI) over the Internet, which is often impossible on build machines that are sealed from the outside world. Automated installation of PyPI dependencies may actually be a welcome feature, but it must also remain strictly optional.

In other words, the ideal solution would use a non-isolated virtual environment, to be able to use system packages provided by Linux distributions; but it would also ensure that scripts (sphinx-build, meson, avocado) are placed into bin just like pip install does.

Distribution packages

When it comes to packages, Python surely makes an effort to be confusing. The fundamental unit for importing code into a Python program is called a package; for example os and sys are two examples of a package. However, a program or library that is distributed on PyPI consists of many such “import packages”: that’s because while pip is usually said to be a “package installer” for Python, more precisely it installs “distribution packages”.

To add to the confusion, the term “distribution package” is often shortened to either “package” or “distribution”. And finally, the metadata of the distribution package remains available even after installation, so “distributions” include things that are already installed (and are not being distributed anywhere).

All this matters because distribution metadata will be the key to building the perfect virtual environment. If you look at the content of bin/meson in a virtual environment, after installing the package with pip, this is what you find:

#!/home/pbonzini/my-venv/bin/python3
# -*- coding: utf-8 -*-
import re
import sys
from mesonbuild.mesonmain import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

This looks a lot like automatically generated code, and in fact it is; the only parts that vary are the from mesonbuild.mesonmain import main import, and the invocation of the main() function on the last line. pip creates this invocation script based on the setup.cfg file in Meson’s source code, more specifically based on the following stanza:

[options.entry_points]
console_scripts =
  meson = mesonbuild.mesonmain:main

Similar declarations exist in Sphinx, Avocado and so on, and accessing their content is easy via importlib.metadata (available in Python 3.8+):

$ python3
>>> from importlib.metadata import distribution
>>> distribution('meson').entry_points
[EntryPoint(name='meson', value='mesonbuild.mesonmain:main', group='console_scripts')]

importlib looks up the metadata in the running Python interpreter’s search path; if Meson is installed under another interpreter’s site-packages directory, it will not be found:

$ python3.8
>>> from importlib.metadata import distribution
>>> distribution('meson').entry_points
Traceback (most recent call last):
...
importlib.metadata.PackageNotFoundError: meson

So finally we have a plan! configure can build a non-isolated virtual environment, use importlib to check that the required packages exist in the base installation, and create scripts in bin that point to the right Python interpreter. Then, it can optionally use pip install to install the missing packages.

While this process includes a certain amount of specialized logic, Python provides a customizable venv module to create virtual environments. The custom steps can be performed by subclassing venv.EnvBuilder.

This will provide the same experience as QEMU 8.0, except that there will be no need for the --meson and --sphinx-build options to the configure script. The path to the Python interpreter is enough to set up all Python programs used during the build.

There is only one thing left to fix…

Nesting virtual environments

Remember how we started with a user that creates her own virtual environment before building QEMU? Well, this would not work anymore, because virtual environments cannot be nested. As soon as configure creates its own virtual environment, the packages installed by the user are not available anymore.

Fortunately, the “appearance” of a nested virtual environment is easy to emulate. Detecting whether python3 runs in a virtual environment is as easy as checking sys.prefix != sys.base_prefix; if it is, we need to retrieve the parent virtual environments site-packages directory:

>>> import sysconfig
>>> sysconfig.get_path('purelib')
'/home/pbonzini/my-venv/lib/python3.11/site-packages'

and write it to a .pth file in the lib directory of the new virtual environment. The following demo shows how a distribution package in the parent virtual environment will be available in the child as well:

A small detail is that configure’s new virtual environment should mirror the isolation setting of the parent. An isolated venv can be detected because sys.base_prefix in site.PREFIXES is false.

Conclusion

Right now, QEMU only makes a minimal attempt at ensuring consistency of the Python environment; Meson is always run using the interpreter that was passed to the configure script with --python or $PYTHON, but that’s it. Once the above technique will be implemented in QEMU 8.1, there will be no difference in the build experience, but configuration will be easier and a wider set of invalid build environments will be detected. We will merge these checks before dropping support for Python 3.6, so that users on older enterprise distributions will have a smooth transition.