Design for program-defined metafunctions for cppfront

<a id="title"/>

# [D](#title)esign for program-defined metafunctions for cppfront

<a id="intro"/>

## [I](#intro)ntroduction

This write-up presents a design to extend cppfront to evaluate program-defined metafunctions.

<a id="conception"/>

## [C](#conception)onception

Support for metafunctions was first added by commit d8c1a50f22c1b171a50e87ccdb609fb05f41c021,
"First checkin of partial meta function support, with `interface` meta type function".
Its commit message also included the following sentence.
> - There is not yet a general Cpp2 interpreter that runs inside the cppfront compiler and lets users write meta functions like `interface` as external code outside the compiler.

After a lot of thinking, the idea of a "Cpp2 interpreter" seemed at odds with what cppfront is.
Cppfront takes Cpp2 and lowers it to Cpp1, just like
Cfront takes Cpp1 and lowers it to C.
Interpreting Cpp2 could then be taken to mean one of two things:
1. Building an interpreter that is a superset of the C++ abstract machine.
   This way, interpreted Cpp2 (i.e., metafunctions) is just as capable as normal Cpp2 code.
2. Building an interpreter that is a very constrained subset of Cpp2.
   This would be like `constexpr` in C++11, and would probably evolve similarly.

Interpretation 1 means changing what cppfront fundamentally is.
Interpretation 2 feels unsatisfactory.
It is very constrained and doesn't put the power of the whole language at your disposal.

I thus realized that there is an alternative to interpreted Cpp2.
That alternative is loading a metafunction compiled in a library during the execution of `cppfront`.
This model doesn't change what cppfront is.
Additionally, a metafunction is normal Cpp2 code, just like the implementations of built-in metafunctions.

<a id="counterpoints"/>

## [C](#counterpoints)ounterpoints

In this design, a metafunction is "normal Cpp2 code".
In the Circle model of meta-programming, "normal Cpp1 code" can be executed at compile-time.
This has raised concerns, quoted below, that are relevant to the present design.
In our case, the code runs during metafunction evaluation rather than at compile time.

> However, we do not believe \[the Circle] metaprogramming model is the right direction for C++’s future.
> We raise the following concerns:
> - …
> - The ability (and potential need) to call into shared libraries from the compiler raises the
>   kinds of security concerns that led SG7 to discard `std::embed` (P1040).
> - …
> -- [P2062 The Circle Meta-model](https://wg21.link/P2062)

> Circle is a fork of C++ that enables arbitrary compile-time execution (e.g. a compile-time `std::cout`), coupled with reflection to allow powerful meta-programming. SG7 was interested in it and [considered copying parts of it](https://wg21.link/P2043). However, concerns were [raised](https://wg21.link/P2062) about security and usability problems, so the ability to execute arbitrary code at compile-time was rejected.
> -- [2020-02 Prague ISO C++ Committee Trip Report — 🎉 C++20 is Done! 🎉 : cpp](https://www.reddit.com/r/cpp/comments/f47x4o/202002_prague_iso_c_committee_trip_report_c20_is/)

> Also, the committee already reviewed a paper describing the Circle evaluation model and expressed some concerns with issues related to trust and implementability, but was generally interested in being able to do more at compile-time, within reason. I didn't mention that because that's already the trajectory for constant expression evaluation.
>
> > For example, I don't see the point of adding compile-time specific I/O APIs that won't be compatible with any library; the whole idea of Circle is that you just take your existing C++ code and use it at compile time.
>
> The ability to open a file at compile-time and the ability to execute existing code have largely orthogonal concerns. I think we should be able to execute more code at compile without having to explicitly label it `constexpr`, but I draw the line at allowing the compiler opening arbitrary files on the whim of some 3rd party library on my behalf.
> -- Part of a reply from the thread starting at <https://www.reddit.com/r/cpp/comments/jf4wsw/comment/g9mxpqc/?utm_source=share&utm_medium=web2x&context=3>

<a id="alternatives"/>

## [A](#alternatives)lternatives

Any alternative that requires recompiling `cppfront` or hard-coding metafunctions isn't viable at scale.

I also considered whether we could use Cpp1's `constexpr` and `consteval`.
These don't serve us if we are to use an existing `cppfront` program.
Consider the [counterpoints](#counterpoints).
Given Cpp1's `if consteval`, a `constexpr` function can't be guaranteed to not use IO.

That said, it could be possible to require a metafunction to be `constexpr`
and to actually evaluate it during constant evaluation to produce the updated type.
The technique to implement that would be similar to the one presented in
[Interactive C++ in a Jupyter Notebook Using Modules for Incremental Compilation - Steven R. Brandt](https://www.youtube.com/watch?v=9XWCm9iV-wk).
But that is not this design (and I haven't explored such a design).

<a id="counter-counterpoints"/>

## [C](#counter-counterpoints)ounter-counterpoints

Maybe a metafunction can be required to be `@pure` (<https://github.com/hsutter/cppfront/discussions/797#discussioncomment-7860363>).
Then, even though a metafunction is still normal Cpp2 code, it isn't as problematic.
That said, `@pure` still seems too restrictive.

<a id="design"/>

## [D](#design)esign

[Boost.DLL]: https://www.boost.org/doc/libs/release/doc/html/boost_dll.html

This is based on what I learned from studying the documentation of [Boost.DLL][].

We need to emit a metafunction as an `extern "C"` symbol.
Loading a symbol by its mangled Cpp1 name is experimental and not as portable (<https://www.boost.org/doc/libs/master/doc/html/boost_dll/mangled_import.html>).
When loading the symbol of a metafunction, we need to use the same emitted name.
This means that we need a protocol for the symbol name and to "C namespace" it.
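
As a minimal sketch of such a protocol, the C++ below derives the emitted symbol name from the Cpp2 name by prepending a fixed prefix, and shows what the lowered Cpp1 for a metafunction named `greeter` might look like.
The prefix `cpp2_metafunction_`, the helper `symbol_name_for`, the stand-in `type_declaration`, and the `void(type_declaration&)` signature are all assumptions made for illustration; the protocol actually used by the implementation may differ.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for cppfront's reflection type.
// The real type is much richer; this one only records that it was visited.
struct type_declaration {
    std::string name;
    bool checked = false;
};

// Assumed protocol: a fixed prefix plays the role of a "C namespace".
inline constexpr std::string_view symbol_prefix = "cpp2_metafunction_";

// Map the Cpp2 name of a metafunction (as `@`-used) to its emitted symbol name.
// Both the emitting side and the loading side must agree on this mapping.
std::string symbol_name_for(std::string_view cpp2_name) {
    assert(!cpp2_name.empty());
    std::string symbol(symbol_prefix);
    symbol += cpp2_name;
    return symbol;
}

// What the lowered Cpp1 for a metafunction named `greeter` might look like.
// `extern "C"` turns off C++ name mangling, so the symbol is found by this exact name.
extern "C" void cpp2_metafunction_greeter(type_declaration& t) {
    t.checked = true;  // placeholder for the metafunction's real work
}
```

Because the prefix is the only "namespacing" available, two libraries that define a metafunction with the same Cpp2 name would collide under this scheme.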

In its simplest form, we just need a function that,
given the Cpp2 name of a metafunction (as `@`-used),
returns a function object that evaluates the metafunction.
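
The lookup function described above can be sketched in C++ as follows.
To keep the sketch independent of any particular loading API, the platform call is hidden behind a hypothetical `symbol_resolver`; in practice it would wrap something like POSIX `dlsym`, Windows `GetProcAddress`, or Boost.DLL.
The names `lookup_metafunction`, `symbol_resolver`, `type_declaration`, and the `cpp2_metafunction_` prefix are assumptions made for illustration, not the names used by the implementation.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical stand-in for cppfront's reflection type.
struct type_declaration {
    std::string name;
    int applied = 0;
};

using metafunction_ptr = void (*)(type_declaration&);
using metafunction = std::function<void(type_declaration&)>;

// Resolves a symbol name to its address, or returns nullptr if it is absent.
// Assumption: a real loader would implement this on top of an already loaded library.
using symbol_resolver = std::function<void*(std::string const&)>;

// Given the Cpp2 name of a metafunction (as `@`-used),
// return a function object that evaluates it, or an empty one if it is unknown.
metafunction lookup_metafunction(symbol_resolver const& resolve, std::string_view cpp2_name) {
    assert(resolve);
    std::string symbol = "cpp2_metafunction_";  // assumed naming protocol
    symbol += cpp2_name;

    void* address = resolve(symbol);
    if (address == nullptr) {
        return {};  // empty function object: the caller reports an unknown metafunction
    }
    // Converting an object pointer to a function pointer is only conditionally
    // supported by C++, but POSIX requires it to work for addresses from dlsym.
    return reinterpret_cast<metafunction_ptr>(address);
}
```

Returning an empty function object for an unknown name lets the caller decide how to diagnose it, for example by listing the libraries that were searched.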

There is an implementation of this design at #907.
Details on how this design was applied, as well as other implementation details, can be found there.

<a id="evolution"/>

## [E](#evolution)volution

<a id="lookup"/>

### [N](#lookup)ame lookup

Up until now, cppfront has been able to rely on the name lookup of lowered Cpp1 code.
But this design introduces an evaluation point that happens outside the C++ abstract machine.
It wants to look up a name that has already been compiled in Cpp1
and use it as named in Cpp2 code before the Cpp2 code has been lowered to Cpp1.

The current design doesn't consider name lookup.
It expects a metafunction name to be `@`-used unqualified and to follow C "namespacing" conventions.

<a id="deps"/>

### [D](#deps)ependency scanning

The current design only requires specifying a protocol for lowering and loading a metafunction.
To author and consume a metafunction at scale, we also need dependency scanning, pretty much like Cpp1 modules.

Many of us use a build system to manage the complexity of building Cpp1 code.
We would like to avoid rerunning `cppfront` on a Cpp2 source
when neither the source nor any of the libraries that provide the metafunctions it uses have changed.
Conversely, we want `cppfront` to rerun if one of those libraries has changed.

We can't know which metafunctions a Cpp2 source uses
without manually duplicating this information in the build system description.
`cppfront` can't just emit the dependency information after the fact (like Cpp1 compilers on `#include`d headers)
because the libraries need to have been built before it starts evaluating the metafunction.
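
To make the scanning step concrete, here is a deliberately naive C++ sketch of a pre-pass that collects the metafunction names `@`-used in a Cpp2 source, so a build system could order the library builds before running `cppfront`.
The function name `scan_metafunction_uses` is hypothetical, and the approach is an assumption for illustration: a real scanner would need to tokenize the source, since this one also matches `@name` inside comments and string literals and doesn't distinguish built-in metafunctions from program-defined ones.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>

// Collect the names of the metafunctions `@`-used in Cpp2 source text.
// Naive on purpose: every `@identifier` counts, wherever it appears.
std::set<std::string> scan_metafunction_uses(std::string const& source) {
    static std::regex const use{R"(@([A-Za-z_][A-Za-z0-9_]*))"};

    std::set<std::string> names;
    for (std::sregex_iterator it{source.begin(), source.end(), use}, end; it != end; ++it) {
        assert(it->size() == 2);       // whole match plus the captured identifier
        names.insert((*it)[1].str());  // keep only the identifier after `@`
    }
    return names;
}
```

The resulting set still has to be mapped to the libraries that provide each name, which is exactly the information the current design doesn't specify.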

It has been suggested that `cppfront` could have a command line argument for compiling a metafunction library.
That would obviate the need for a dependency scanner, but this inversion of the build logic has drawbacks.

There was an article that I can't find, I think linked from the LLVM Discourse,
about how some other language's compiler (Go or Scala?) forked itself to build a module's sources in parallel.
That ended up resulting in file system races in very rare cases.
They rewrote their module compilation system to not fork itself and instead rely on their build system.
That fixed the issues, and even (significantly? in some cases?) reduced compile times.

I think the general issue is attempting to do at a lower level what should be done at a higher one,
namely that of the build system.
The CMake support for Cpp1 modules already went in the direction of a dependency scanner
(along with a long trail of papers for proper modules support).
I think it'd be unwise to go in the other direction,
which doesn't even seem to have build system support.
