Coverage measurement of Cython code in meson-based projects

I apologise after having written this that it is long. I love coverage and I use it a lot so many thanks!

I have been having issues using coverage with meson and Cython though and have spent a long time wrestling with the plugin interface. I am writing this partly because I think that there are some things in coverage that can be improved but also partly to see if there can be any advice for how to handle these issues better...

Since PEP 517 a number of Python projects with C etc code have switched from using setuptools as a build system to using either meson (meson-python) or cmake (scikit-build). For example numpy, scipy and python-flint (which I work on) all now use meson as their build system and many projects also `spin` as the development front-end so that developers can do e.g. `spin test --coverage` to run tests with coverage measurement which will under the hood invoke meson to build the project and ultimately call `pytest --cov=...` using `pytest-cov` and `coverage.py`.

Coverage measurement of Cython code does not work in meson or cmake-based projects because a number of tools (cython, coverage.py, cython's coverage plugin, pytest, ...) make assumptions based on the current working directory that only really hold somewhat inconsistently for setuptools-based projects. Some of the problems relate to src-layout i.e. having the code under a directory called `src` rather than directly in project root. Other problems relate to the CWD not being the project root at either built time or when running the tests or when reporting coverage. Other problems relate to the files having been moved or copied at trace/report time e.g. when measuring coverage after the code has been installed to site-packages or otherwise placed on sys.path somewhere away from the CWD.

Similar issues can also arise for setuptools projects and the reason that Cython's coverage plugin is such a mess is because of the heuristics that have been added to try to make it sort of work usually:
https://github.com/cython/cython/blob/1d7100d867a8b64b6f661009ec2ed45a4d8d5dba/Cython/Coverage.py#L68-L108
https://github.com/cython/cython/blob/1d7100d867a8b64b6f661009ec2ed45a4d8d5dba/Cython/Coverage.py#L186-L251
I would like to come up with something much cleaner and more robust but I think that there are also some limitations imposed by coveragepy here.

Previous issues and discussions in spin and cython can be found at https://github.com/scientific-python/spin/issues/182, https://github.com/cython/cython/issues/6186. I have created a simple demo repo that shows what it looks like to have a meson-based project that uses Cython code and tries to measure coverage: https://github.com/oscarbenjamin/cython_coverage_demo

 I recently made a change in cython (https://github.com/cython/cython/pull/6341) which makes it possible to get a Cython coverage plugin working for a meson-based project and I have implemented the plugin in https://github.com/flintlib/python-flint/pull/188. I want to upstream the plugin to Cython but there are a number of problems with the way that it works and really I think that the "proper" way to make this work requires at least some changes in coverage.py which I would like to discuss here.

I need to refer to a number of different directories and types of path to explain this so I'm going to define some terms first using a hypothetical project called `proj`. The project is checked out in git at `/home/oscar/project` which is the "project root directory". Inside project root the file layout looks approximately like this (simplified for demonstration):
```console
$ tree -r .
.
├── src
│   └── pkg
│       ├── mod.pyx
│       └── __init_.py
├── pyproject.toml
├── meson.build
├── build-install
│   └── site-packages
│       └── pkg
│           ├── mod.cpython3.12.so
│           └── __init_.py
└── build
    └── src
        ├── pkg_mod.pyx.c
        └── pkg_mod.cpython3.12.so
```
The project provides an extension module `pkg.mod` which is generated from the source file at `src/pkg/mod.pyx` relative to project root. We say that `src/pkg/mod.pyx` is the "root-relative" path for the source file. Often commands like `coverage` etc are run with the CWD as the root directory in which case we can also say that the CWD-relative path is `src/pkg/mod.pyx`. The "import-relative" path for the source file is `pkg/mod.pyx`. This is relative to the `src` directory and also corresponds to the imported module name `pkg.mod`.

Not all projects have a `src` directory and it is common to run commands with CWD at project root which leads to much confusion between three different things that often coincide  but that should really be kept as firmly distinct concepts:

- The root-relative path to the source file.
- The import-relative path to the source file.
- The CWD-relative path to the source file.

The confusion comes from the fact that *if* we don't have a src-layout *and* we never change the working directory *and* we build and run everything in-tree then there is an equivalence that root-relative, CWD-relative and import-relative paths all coincide.

It is usually better to identify the source file with its "import-relative" path `pkg/mod.pyx`. The import-relative path makes most sense given that the files/modules can be moved around and can appear in different places at different times or simultaneously. The import-relative path also does not depend on CWD. The one thing that is consistent is that when we actually run the code the files are somewhere relative to a directory in `sys.path` and that relative path is the import-relative path regardless of where that is in the file system and what CWD is.

I think that the correct way to identify our source files in coverage tracing is by knowing that our project builds a package called `pkg` which will be in `sys.path` somewhere. The files under `pkg` have import-relative paths that can always be computed at runtime even if the runtime gives us an absolute path because we know the module name and the sys.path entries. Tools that will do this need to be told that a `src` layout is being used, that we are building a package called `pkg` and that import-relative paths for modules/files under `pkg` should be identified as being relative to the `src` directory when we want to identify them with the original source files in the codebase. In other words the following should be configured for coverage or for Cython's coverage plugin somehow:
```toml
[tool.coverage]
source_dir = 'src'
packages = ['pkg']
```
That is sufficient information to identify which of the paths being traced correspond to the source files in `proj` and also to identify their locations under the `src` directory which is itself a path relative to project root which is where this configuration file (`pyproject.toml`) would live. It is important though that we should require this information to be provided *explicitly* rather than attempting to infer it. We can allow a default value for `source_dir` to be the project root (i.e. `.` relative to project root).

None of this needs to depend on CWD apart from the *default* location of the `pyproject.toml` but it should also be possible to provide that as an absolute path e.g. `--rcfile=/home/oscar/proj/pyproject.toml` making the runtime CWD irrelevant to the configuration/tracing/reporting process.

All of this gets wrapped up by the `spin` frontend but what happens in the build and test cycle for a meson-based project with the layout shown above is that we first run:
```console
$ meson setup build --various-config-arguments
```
This initialises the build directory and sets our build configuration. Then we build with
```console
$ meson compile -C build
```
This builds all of the intermediate and end-product files that are needed for an install in the build directory. Then we "install" this into the `build-install` directory with:
```console
$ meson install --only-changed -C build --destdir ../build-install
```
This generates the importable package inside `build-install/site-packages` where the built `.so` files and the `.py` files are combined into their installed layout.

Then when we run the tests there are many different ways of doing it. One way is:
```console
$ export PYTHONPATH="/home/oscar/proj/build-install/site-packages"
$ pytest --pyargs pkg --cov=pkg
```
Another way is to actually change the working directory like:
```console
$ cd build-install/site-packages
$ pytest --pyargs pkg --cov=pkg
```
We could also run `coverage` directly like:
```console
$ export PYTHONPATH="/home/oscar/proj/build-install/site-packages"
$ coverage run test/tests.py
```
And of course there are many other ways of running the tests like by building and installing a wheel into a temporary venv with tools like nox etc.

In any case there are lots of different ways of doing this but the point is that various assumptions don't hold:

1. The CWD is not necessarily the project root either during the build steps or when running the tests or when reporting coverage.
2. The CWD for running the tests may not be the same as the CWD at coverage report time.
3. The files used at runtime may be copies of the source files that appear in otherwise ephemeral directories or may even have been installed to actual site-packages but we want to report coverage on the actual source files located in the code base (in the `src` directory).

The way that the Cython coverage plugin works is that firstly the Cython-generated C code has comments in it like:
```c
    /* "flint/types/nmod.pyx":150
 *         cdef mp_limb_t val
 *         if any_as_nmod(&val, t, (<nmod>s).mod):
 *             r = nmod.__new__(nmod)             # <<<<<<<<<<<<<<
 *             r.mod = (<nmod>s).mod
 *             r.val = nmod_mul(val, (<nmod>s).val, r.mod)
*/
    __Pyx_TraceLine(150,20,0,__PYX_ERR(0, 150, __pyx_L1_error))
```
At runtime the `__Pyx_TraceLine` call gives the trace event but the comment `"flint/types/nmod.pyx":150` maps this event to a line of the source file. The coverage plugin receives `file_tracer` calls for paths like `flint/types/nmod.pyx` and then searches around for the associated `nmod.c` file. It then parses the C code and extracts all of these comments to build a line-map data structure. Then the `FileTracer.dynamic_source_filename` method receives a frame object, extracts the path from `frame.f_code.co_filename` and returns the actual path of the source file. Lastly the FileReporter receives *those* paths at report time, searches again for the .c files and this time uses the line-map data structure to return the executable lines in its `lines()` method.

The way that this goes wrong specifically with cython and meson is as follows:

Firstly the meson build command runs ninja in the build directory like:
```console
$ cd build && ninja build
```
Then ninja runs Cython like:
```console
$ cython ../src/pkg/mod.pyx -o src/pkg_mod.pyx.c
```
Already now it goes wrong because cython inserts the wrong paths into the .c file when the .pyx file is not under CWD. The .c file ends up having a mixture of absolute paths like `/home/oscar/proj/src/pkg/mod.pyx` and just basenames like `mod.pyx`. This is bad because the .c files are supposed to be distributable so absolute paths should not be used. Also obviously just using basename is bad because there can be multiple files with the same name and we don't know where in the tree to find them anyway.

I changed this in https://github.com/cython/cython/pull/6341 so that now when Cython (master branch) is run on a file not under CWD it will consistently always output the import-relative path `pkg/mod.pyx` with POSIX slashes. Still when the file is under CWD a mixture of CWD-relative and import-relative paths will be produced (I didn't change that because I expect that the current coverage plugin relies on it). For a meson/cmake-based project this means that we now know that trace events from cython-generated C code will reliably produce import-relative paths for the original source files regardless of where the extension module actually is at runtime.

The next step is to receive those paths back from coverage.py in the plugin's `file_tracer` method. The problem at this point is that coverage.py messes them up. I have carefully ensured that an import-relative path `pkg/mod.pyx` is in the trace event but coverage tries to turn it into an absolute path so what comes through to `file_tracer` is the non-existent path:
```
/home/oscar/proj/pkg/mod.pyx
```
My plugin works around this by removing the root prefix to get the import-relative path and then joining that with the absolute path of the src directory at report time:
https://github.com/flintlib/python-flint/blob/03a4bafb4d6d112225c29cad352096b7854bc8f7/coverage_plugin.py#L106-L124
Then we get back the proper absolute path:
```
/home/oscar/proj/src/pkg/mod.pyx
```
While it is possible to work around this I don't think that this is the right way: coverage.py should not generate the broken path in the first place. The problem here is that coverage.py is treating the import-relative path as a CWD-relative path and using that incorrect assumption when trying to make an absolute path. (Note that my plugin actually gets this wrong and strips off the root directory but checking now I see that it should strip off CWD.)

There should be some way to prevent coverage.py from doing that so that if I am in control of the paths generated in the trace events then I also know what paths my plugin will receive and the CWD should not play any part in this because CWD can be anything. I think that by default coverage.py should pass the unadulterated import-relative paths through or if not that then at least it should be possible to configure this in the plugin like:
```python
class Plugin:
    mangle_paths = False
```
If the plugin can reliably assume that it is going to get import-relative paths then all it needs to do is filter them by package name, and then separate the .pyx and .py files. I don't know if this is possible but it would be much nicer in the `file_tracer` method if we could also get the module name like `pkg.mod` regardless of whether the paths are absolute or relative because the most likely first step to filter this is by checking for the `pkg` package. In fact if it is possible to get the module name then the ideal thing would be to set that in the plugin like:
```python
class Plugin:
    packages = ['pkg']
```
Then coverage.py can do that filtering before calling `Plugin.file_tracer`.

All the same we can implement the `file_tracer` method and we can work around the broken path problem but what I have ended up with is this:
```console
$ meson setup build -Dcoverage=true
$ meson compile -C build
$ meson install --only-changed -C build --destdir ../build-install
$ export PYTHONPATH="/home/oscar/current/active/python-flint/build-install/usr/lib/python3.12/site-packages"
$ coverage run -m flint.test
$ coverage report
Name                                                                             Stmts   Miss  Cover
----------------------------------------------------------------------------------------------------
build-install/usr/lib/python3.12/site-packages/flint/__init__.py                    37      0   100%
build-install/usr/lib/python3.12/site-packages/flint/flint_base/__init__.py          0      0   100%
build-install/usr/lib/python3.12/site-packages/flint/functions/__init__.py           0      0   100%
build-install/usr/lib/python3.12/site-packages/flint/test/__init__.py                0      0   100%
build-install/usr/lib/python3.12/site-packages/flint/test/__main__.py               86     36    58%
build-install/usr/lib/python3.12/site-packages/flint/test/test_all.py             3348   1282    62%
build-install/usr/lib/python3.12/site-packages/flint/types/__init__.py               0      0   100%
build-install/usr/lib/python3.12/site-packages/flint/utils/__init__.py               0      0   100%
build-install/usr/lib/python3.12/site-packages/flint/utils/flint_exceptions.py       6      0   100%
flint/flint_base/flint_base.pyx                                                    458     58    87%
flint/flint_base/flint_context.pxd                                                   8      4    50%
flint/flint_base/flint_context.pyx                                                  29      1    97%
flint/functions/showgood.pyx                                                        62      8    87%
flint/pyflint.pyx                                                                    3      0   100%
flint/types/acb.pyx                                                               1035    136    87%
flint/types/acb_mat.pyx                                                            391    106    73%
...
```
The first problem here is that while the coverage is actually being reported correctly for the Cython (`.pyx` and `.pxd`) files it is not being reported correctly for the `.py` files. In particular this shows `test_all.py` as having 62% coverage which is definitely not true. That file is literally the test code and the tests do run!

My plugin does not do anything with the .py files but the problem only appears when my plugin is in use. Somehow it prevents coverage.py from reporting the coverage of .py files correctly.

This snippet from `coverage html` shows clearly that it is bogus because it alternates between covered and uncovered statements in the same test function:

![image](https://github.com/user-attachments/assets/43b59e08-c839-4b9f-93c9-0f60d6e99bc4)

I assume that coverage is getting confused about the paths in this situation somehow when handling the trace events. Note that I am not doing anything funny with CWD here: I just use `PYTHONPATH` to make the "installed" package available and then run
```console
$ coverage run -m flint.test
```
which is a module that imports all the test functions and calls them (no subprocesses or other funny business...):
https://github.com/flintlib/python-flint/blob/master/src/flint/test/__main__.py

I can also try doing it like this:
```console
$ export PYTHONPATH=$(pwd)  # to find my coverage_plugin.py
$ cd build-install/usr/lib/python3.12/site-packages/
$ coverage run --rcfile=../../../../../.coveragerc -m flint.test -t
$ coverage --rcfile=../../../../../.coveragerc report
```
I still get an incomplete coverage for `test_all.py` this way though. Instead of using `coverage` directly here I can use `pytest-cov`:
```console
$ export COVERAGE_RCFILE=$(pwd)/.coveragerc
$ export PYTHONPATH=$(pwd)
$ cd build-install/usr/lib/python3.12/site-packages/
$ pytest flint/test/ --cov=flint
======================================== test session starts ========================================
platform linux -- Python 3.12.0, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/oscar/current/active/python-flint
plugins: doctestplus-1.2.1, cov-4.1.0, hypothesis-6.88.4, xdist-3.4.0
collected 54 items                                                                                  

flint/test/test_all.py ......................................................                 [100%]

---------- coverage: platform linux, python 3.12.0-final-0 -----------
Name                              Stmts   Miss  Cover
-----------------------------------------------------
flint/__init__.py                    37      0   100%
flint/flint_base/__init__.py          0      0   100%
flint/functions/__init__.py           0      0   100%
flint/test/__init__.py                0      0   100%
flint/test/__main__.py               86     86     0%
flint/test/test_all.py             3348   1285    62%
flint/types/__init__.py               0      0   100%
flint/utils/__init__.py               0      0   100%
flint/utils/flint_exceptions.py       6      0   100%
-----------------------------------------------------
TOTAL                              3477   1371    61%


======================================== 54 passed in 7.23s =========================================
```
Again incomplete coverage of `test_all.py`. This time all the .pyx files are missing as well. I told `pytest-cov` I wanted to track the `-m flint` module but I assume that `pytest-cov` is filtering them out somehow because their "path" is not under CWD. Again using absolute paths and CWD is not the right way to think about this: rather we should think of import-relative paths and `-m flint` should just mean all source files contributing to the flint package regardless of where in the filesystem those files are and what the CWD is.

The second problem with the coverage report output is the way that the Cython files are shown as import-relative paths but the Python files are shown as root-relative paths to the temporary files in the build directory. I'm not sure what the ideal output here should be but it is one of:

- All paths are root-relative paths to the files under `src`.
- All paths are import-relative paths.

Aesthetically I prefer import-relative paths here but I assume that root-relative paths should be preferred if we want e.g. codecov to parse the `.coverage` dump (maybe it doesn't matter?). Either way we don't want paths to the temporary files under `build-install` and we do want the different types of files to have the same type of path.

I thought that I could redirect the .py paths in the Plugin but it doesn't seem to be possible without taking responsibility for tracing and reporting them. That is what Cython's coverage plugin does but ideally it would leave the .py files alone for coverage.py to handle (sometimes .py files get cythonised which is a special case here but I mean non-cythonised .py file...). What I wanted to do is something like this:
```python
from coverage.somewhere import CoveragePythonTracer

class Plugin:
    def file_tracer(self, filename):
        if filename is in build-install:
            actual_filename = 'src' / import_relative_path(filename)
            return CoveragePythonTracer(actual_filename)
```
I can't immediately see something like that available though.

Really though a plugin should not be needed for this. Is there a way to tell coveragepy that the .py files under build-install are copies of the actual source code in the `src` directory?

This is what I meant above with:
```toml
[tool.coverage]
source_dir = 'src'
packages = ['pkg']
```
There should be some way to tell coverage.py that the package `pkg` is what we are measuring and that its source files are in `src` and each of its modules from `sys.path` defines an import-relative path that for the purpose of reporting identifies a path relative to `src`.

Maybe there already is a way to do this?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Coverage measurement of Cython code in meson-based projects #1837

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Coverage measurement of Cython code in meson-based projects #1837

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions