Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coverage measurement of Cython code in meson-based projects #1837

Open
oscarbenjamin opened this issue Aug 25, 2024 · 0 comments
Open

Coverage measurement of Cython code in meson-based projects #1837

oscarbenjamin opened this issue Aug 25, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@oscarbenjamin
Copy link

oscarbenjamin commented Aug 25, 2024

I apologise after having written this that it is long. I love coverage and I use it a lot so many thanks!

I have been having issues using coverage with meson and Cython though and have spent a long time wrestling with the plugin interface. I am writing this partly because I think that there are some things in coverage that can be improved but also partly to see if there can be any advice for how to handle these issues better...

Since PEP 517 a number of Python projects with C etc code have switched from using setuptools as a build system to using either meson (meson-python) or cmake (scikit-build). For example numpy, scipy and python-flint (which I work on) all now use meson as their build system and many projects also spin as the development front-end so that developers can do e.g. spin test --coverage to run tests with coverage measurement which will under the hood invoke meson to build the project and ultimately call pytest --cov=... using pytest-cov and coverage.py.

Coverage measurement of Cython code does not work in meson or cmake-based projects because a number of tools (cython, coverage.py, cython's coverage plugin, pytest, ...) make assumptions based on the current working directory that only really hold somewhat inconsistently for setuptools-based projects. Some of the problems relate to src-layout i.e. having the code under a directory called src rather than directly in project root. Other problems relate to the CWD not being the project root at either built time or when running the tests or when reporting coverage. Other problems relate to the files having been moved or copied at trace/report time e.g. when measuring coverage after the code has been installed to site-packages or otherwise placed on sys.path somewhere away from the CWD.

Similar issues can also arise for setuptools projects and the reason that Cython's coverage plugin is such a mess is because of the heuristics that have been added to try to make it sort of work usually:
https://github.com/cython/cython/blob/1d7100d867a8b64b6f661009ec2ed45a4d8d5dba/Cython/Coverage.py#L68-L108
https://github.com/cython/cython/blob/1d7100d867a8b64b6f661009ec2ed45a4d8d5dba/Cython/Coverage.py#L186-L251
I would like to come up with something much cleaner and more robust but I think that there are also some limitations imposed by coveragepy here.

Previous issues and discussions in spin and cython can be found at scientific-python/spin#182, cython/cython#6186. I have created a simple demo repo that shows what it looks like to have a meson-based project that uses Cython code and tries to measure coverage: https://github.com/oscarbenjamin/cython_coverage_demo

I recently made a change in cython (cython/cython#6341) which makes it possible to get a Cython coverage plugin working for a meson-based project and I have implemented the plugin in flintlib/python-flint#188. I want to upstream the plugin to Cython but there are a number of problems with the way that it works and really I think that the "proper" way to make this work requires at least some changes in coverage.py which I would like to discuss here.

I need to refer to a number of different directories and types of path to explain this so I'm going to define some terms first using a hypothetical project called proj. The project is checked out in git at /home/oscar/project which is the "project root directory". Inside project root the file layout looks approximately like this (simplified for demonstration):

$ tree -r .
.
├── src
│   └── pkg
│       ├── mod.pyx
│       └── __init_.py
├── pyproject.toml
├── meson.build
├── build-install
│   └── site-packages
│       └── pkg
│           ├── mod.cpython3.12.so
│           └── __init_.py
└── build
    └── src
        ├── pkg_mod.pyx.c
        └── pkg_mod.cpython3.12.so

The project provides an extension module pkg.mod which is generated from the source file at src/pkg/mod.pyx relative to project root. We say that src/pkg/mod.pyx is the "root-relative" path for the source file. Often commands like coverage etc are run with the CWD as the root directory in which case we can also say that the CWD-relative path is src/pkg/mod.pyx. The "import-relative" path for the source file is pkg/mod.pyx. This is relative to the src directory and also corresponds to the imported module name pkg.mod.

Not all projects have a src directory and it is common to run commands with CWD at project root which leads to much confusion between three different things that often coincide but that should really be kept as firmly distinct concepts:

  • The root-relative path to the source file.
  • The import-relative path to the source file.
  • The CWD-relative path to the source file.

The confusion comes from the fact that if we don't have a src-layout and we never change the working directory and we build and run everything in-tree then there is an equivalence that root-relative, CWD-relative and import-relative paths all coincide.

It is usually better to identify the source file with its "import-relative" path pkg/mod.pyx. The import-relative path makes most sense given that the files/modules can be moved around and can appear in different places at different times or simultaneously. The import-relative path also does not depend on CWD. The one thing that is consistent is that when we actually run the code the files are somewhere relative to a directory in sys.path and that relative path is the import-relative path regardless of where that is in the file system and what CWD is.

I think that the correct way to identify our source files in coverage tracing is by knowing that our project builds a package called pkg which will be in sys.path somewhere. The files under pkg have import-relative paths that can always be computed at runtime even if the runtime gives us an absolute path because we know the module name and the sys.path entries. Tools that will do this need to be told that a src layout is being used, that we are building a package called pkg and that import-relative paths for modules/files under pkg should be identified as being relative to the src directory when we want to identify them with the original source files in the codebase. In other words the following should be configured for coverage or for Cython's coverage plugin somehow:

[tool.coverage]
source_dir = 'src'
packages = ['pkg']

That is sufficient information to identify which of the paths being traced correspond to the source files in proj and also to identify their locations under the src directory which is itself a path relative to project root which is where this configuration file (pyproject.toml) would live. It is important though that we should require this information to be provided explicitly rather than attempting to infer it. We can allow a default value for source_dir to be the project root (i.e. . relative to project root).

None of this needs to depend on CWD apart from the default location of the pyproject.toml but it should also be possible to provide that as an absolute path e.g. --rcfile=/home/oscar/proj/pyproject.toml making the runtime CWD irrelevant to the configuration/tracing/reporting process.

All of this gets wrapped up by the spin frontend but what happens in the build and test cycle for a meson-based project with the layout shown above is that we first run:

$ meson setup build --various-config-arguments

This initialises the build directory and sets our build configuration. Then we build with

$ meson compile -C build

This builds all of the intermediate and end-product files that are needed for an install in the build directory. Then we "install" this into the build-install directory with:

$ meson install --only-changed -C build --destdir ../build-install

This generates the importable package inside build-install/site-packages where the built .so files and the .py files are combined into their installed layout.

Then when we run the tests there are many different ways of doing it. One way is:

$ export PYTHONPATH="/home/oscar/proj/build-install/site-packages"
$ pytest --pyargs pkg --cov=pkg

Another way is to actually change the working directory like:

$ cd build-install/site-packages
$ pytest --pyargs pkg --cov=pkg

We could also run coverage directly like:

$ export PYTHONPATH="/home/oscar/proj/build-install/site-packages"
$ coverage run test/tests.py

And of course there are many other ways of running the tests like by building and installing a wheel into a temporary venv with tools like nox etc.

In any case there are lots of different ways of doing this but the point is that various assumptions don't hold:

  1. The CWD is not necessarily the project root either during the build steps or when running the tests or when reporting coverage.
  2. The CWD for running the tests may not be the same as the CWD at coverage report time.
  3. The files used at runtime may be copies of the source files that appear in otherwise ephemeral directories or may even have been installed to actual site-packages but we want to report coverage on the actual source files located in the code base (in the src directory).

The way that the Cython coverage plugin works is that firstly the Cython-generated C code has comments in it like:

    /* "flint/types/nmod.pyx":150
 *         cdef mp_limb_t val
 *         if any_as_nmod(&val, t, (<nmod>s).mod):
 *             r = nmod.__new__(nmod)             # <<<<<<<<<<<<<<
 *             r.mod = (<nmod>s).mod
 *             r.val = nmod_mul(val, (<nmod>s).val, r.mod)
*/
    __Pyx_TraceLine(150,20,0,__PYX_ERR(0, 150, __pyx_L1_error))

At runtime the __Pyx_TraceLine call gives the trace event but the comment "flint/types/nmod.pyx":150 maps this event to a line of the source file. The coverage plugin receives file_tracer calls for paths like flint/types/nmod.pyx and then searches around for the associated nmod.c file. It then parses the C code and extracts all of these comments to build a line-map data structure. Then the FileTracer.dynamic_source_filename method receives a frame object, extracts the path from frame.f_code.co_filename and returns the actual path of the source file. Lastly the FileReporter receives those paths at report time, searches again for the .c files and this time uses the line-map data structure to return the executable lines in its lines() method.

The way that this goes wrong specifically with cython and meson is as follows:

Firstly the meson build command runs ninja in the build directory like:

$ cd build && ninja build

Then ninja runs Cython like:

$ cython ../src/pkg/mod.pyx -o src/pkg_mod.pyx.c

Already now it goes wrong because cython inserts the wrong paths into the .c file when the .pyx file is not under CWD. The .c file ends up having a mixture of absolute paths like /home/oscar/proj/src/pkg/mod.pyx and just basenames like mod.pyx. This is bad because the .c files are supposed to be distributable so absolute paths should not be used. Also obviously just using basename is bad because there can be multiple files with the same name and we don't know where in the tree to find them anyway.

I changed this in cython/cython#6341 so that now when Cython (master branch) is run on a file not under CWD it will consistently always output the import-relative path pkg/mod.pyx with POSIX slashes. Still when the file is under CWD a mixture of CWD-relative and import-relative paths will be produced (I didn't change that because I expect that the current coverage plugin relies on it). For a meson/cmake-based project this means that we now know that trace events from cython-generated C code will reliably produce import-relative paths for the original source files regardless of where the extension module actually is at runtime.

The next step is to receive those paths back from coverage.py in the plugin's file_tracer method. The problem at this point is that coverage.py messes them up. I have carefully ensured that an import-relative path pkg/mod.pyx is in the trace event but coverage tries to turn it into an absolute path so what comes through to file_tracer is the non-existent path:

/home/oscar/proj/pkg/mod.pyx

My plugin works around this by removing the root prefix to get the import-relative path and then joining that with the absolute path of the src directory at report time:
https://github.com/flintlib/python-flint/blob/03a4bafb4d6d112225c29cad352096b7854bc8f7/coverage_plugin.py#L106-L124
Then we get back the proper absolute path:

/home/oscar/proj/src/pkg/mod.pyx

While it is possible to work around this I don't think that this is the right way: coverage.py should not generate the broken path in the first place. The problem here is that coverage.py is treating the import-relative path as a CWD-relative path and using that incorrect assumption when trying to make an absolute path. (Note that my plugin actually gets this wrong and strips off the root directory but checking now I see that it should strip off CWD.)

There should be some way to prevent coverage.py from doing that so that if I am in control of the paths generated in the trace events then I also know what paths my plugin will receive and the CWD should not play any part in this because CWD can be anything. I think that by default coverage.py should pass the unadulterated import-relative paths through or if not that then at least it should be possible to configure this in the plugin like:

class Plugin:
    mangle_paths = False

If the plugin can reliably assume that it is going to get import-relative paths then all it needs to do is filter them by package name, and then separate the .pyx and .py files. I don't know if this is possible but it would be much nicer in the file_tracer method if we could also get the module name like pkg.mod regardless of whether the paths are absolute or relative because the most likely first step to filter this is by checking for the pkg package. In fact if it is possible to get the module name then the ideal thing would be to set that in the plugin like:

class Plugin:
    packages = ['pkg']

Then coverage.py can do that filtering before calling Plugin.file_tracer.

All the same we can implement the file_tracer method and we can work around the broken path problem but what I have ended up with is this:

$ meson setup build -Dcoverage=true
$ meson compile -C build
$ meson install --only-changed -C build --destdir ../build-install
$ export PYTHONPATH="/home/oscar/current/active/python-flint/build-install/usr/lib/python3.12/site-packages"
$ coverage run -m flint.test
$ coverage report
Name                                                                             Stmts   Miss  Cover
----------------------------------------------------------------------------------------------------
build-install/usr/lib/python3.12/site-packages/flint/__init__.py                    37      0   100%
build-install/usr/lib/python3.12/site-packages/flint/flint_base/__init__.py          0      0   100%
build-install/usr/lib/python3.12/site-packages/flint/functions/__init__.py           0      0   100%
build-install/usr/lib/python3.12/site-packages/flint/test/__init__.py                0      0   100%
build-install/usr/lib/python3.12/site-packages/flint/test/__main__.py               86     36    58%
build-install/usr/lib/python3.12/site-packages/flint/test/test_all.py             3348   1282    62%
build-install/usr/lib/python3.12/site-packages/flint/types/__init__.py               0      0   100%
build-install/usr/lib/python3.12/site-packages/flint/utils/__init__.py               0      0   100%
build-install/usr/lib/python3.12/site-packages/flint/utils/flint_exceptions.py       6      0   100%
flint/flint_base/flint_base.pyx                                                    458     58    87%
flint/flint_base/flint_context.pxd                                                   8      4    50%
flint/flint_base/flint_context.pyx                                                  29      1    97%
flint/functions/showgood.pyx                                                        62      8    87%
flint/pyflint.pyx                                                                    3      0   100%
flint/types/acb.pyx                                                               1035    136    87%
flint/types/acb_mat.pyx                                                            391    106    73%
...

The first problem here is that while the coverage is actually being reported correctly for the Cython (.pyx and .pxd) files it is not being reported correctly for the .py files. In particular this shows test_all.py as having 62% coverage which is definitely not true. That file is literally the test code and the tests do run!

My plugin does not do anything with the .py files but the problem only appears when my plugin is in use. Somehow it prevents coverage.py from reporting the coverage of .py files correctly.

This snippet from coverage html shows clearly that it is bogus because it alternates between covered and uncovered statements in the same test function:

image

I assume that coverage is getting confused about the paths in this situation somehow when handling the trace events. Note that I am not doing anything funny with CWD here: I just use PYTHONPATH to make the "installed" package available and then run

$ coverage run -m flint.test

which is a module that imports all the test functions and calls them (no subprocesses or other funny business...):
https://github.com/flintlib/python-flint/blob/master/src/flint/test/__main__.py

I can also try doing it like this:

$ export PYTHONPATH=$(pwd)  # to find my coverage_plugin.py
$ cd build-install/usr/lib/python3.12/site-packages/
$ coverage run --rcfile=../../../../../.coveragerc -m flint.test -t
$ coverage --rcfile=../../../../../.coveragerc report

I still get an incomplete coverage for test_all.py this way though. Instead of using coverage directly here I can use pytest-cov:

$ export COVERAGE_RCFILE=$(pwd)/.coveragerc
$ export PYTHONPATH=$(pwd)
$ cd build-install/usr/lib/python3.12/site-packages/
$ pytest flint/test/ --cov=flint
======================================== test session starts ========================================
platform linux -- Python 3.12.0, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/oscar/current/active/python-flint
plugins: doctestplus-1.2.1, cov-4.1.0, hypothesis-6.88.4, xdist-3.4.0
collected 54 items                                                                                  

flint/test/test_all.py ......................................................                 [100%]

---------- coverage: platform linux, python 3.12.0-final-0 -----------
Name                              Stmts   Miss  Cover
-----------------------------------------------------
flint/__init__.py                    37      0   100%
flint/flint_base/__init__.py          0      0   100%
flint/functions/__init__.py           0      0   100%
flint/test/__init__.py                0      0   100%
flint/test/__main__.py               86     86     0%
flint/test/test_all.py             3348   1285    62%
flint/types/__init__.py               0      0   100%
flint/utils/__init__.py               0      0   100%
flint/utils/flint_exceptions.py       6      0   100%
-----------------------------------------------------
TOTAL                              3477   1371    61%


======================================== 54 passed in 7.23s =========================================

Again incomplete coverage of test_all.py. This time all the .pyx files are missing as well. I told pytest-cov I wanted to track the -m flint module but I assume that pytest-cov is filtering them out somehow because their "path" is not under CWD. Again using absolute paths and CWD is not the right way to think about this: rather we should think of import-relative paths and -m flint should just mean all source files contributing to the flint package regardless of where in the filesystem those files are and what the CWD is.

The second problem with the coverage report output is the way that the Cython files are shown as import-relative paths but the Python files are shown as root-relative paths to the temporary files in the build directory. I'm not sure what the ideal output here should be but it is one of:

  • All paths are root-relative paths to the files under src.
  • All paths are import-relative paths.

Aesthetically I prefer import-relative paths here but I assume that root-relative paths should be preferred if we want e.g. codecov to parse the .coverage dump (maybe it doesn't matter?). Either way we don't want paths to the temporary files under build-install and we do want the different types of files to have the same type of path.

I thought that I could redirect the .py paths in the Plugin but it doesn't seem to be possible without taking responsibility for tracing and reporting them. That is what Cython's coverage plugin does but ideally it would leave the .py files alone for coverage.py to handle (sometimes .py files get cythonised which is a special case here but I mean non-cythonised .py file...). What I wanted to do is something like this:

from coverage.somewhere import CoveragePythonTracer

class Plugin:
    def file_tracer(self, filename):
        if filename is in build-install:
            actual_filename = 'src' / import_relative_path(filename)
            return CoveragePythonTracer(actual_filename)

I can't immediately see something like that available though.

Really though a plugin should not be needed for this. Is there a way to tell coveragepy that the .py files under build-install are copies of the actual source code in the src directory?

This is what I meant above with:

[tool.coverage]
source_dir = 'src'
packages = ['pkg']

There should be some way to tell coverage.py that the package pkg is what we are measuring and that its source files are in src and each of its modules from sys.path defines an import-relative path that for the purpose of reporting identifies a path relative to src.

Maybe there already is a way to do this?

@oscarbenjamin oscarbenjamin added the enhancement New feature or request label Aug 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant