The C++ dialect used in this project constitutes a subset of the standard C++ programming language and library features. We want our dialect to be compatible with the LLVM C++ language subset that will be in use at the time that we integrate with that project. We also want to maximize portability, future-proofing, compile-time error checking, and use of best practices.
To that end, we have a C++ style guide (q.v.) that lays out the details of how our C++ code should look and gives guidance about feature usage.
We have chosen to use some features of the recent C++17 language standard in f18. The most important of these are:
- sum types (discriminated unions) in the form of
std::variant
using
template parameter packs- generic lambdas with
auto
argument types - product types in the form of
std::tuple
std::optional
(std::tuple
is actually a C++11 feature, but I include it
in this list because it's not particularly well known.)
First, some background information to explain the need for sum types in f18.
Fortran is notoriously problematic to lex and parse, as tokenization depends on the state of the partial parse; the language has no reserved words in the sense that C++ does. Fortran parsers implemented with distinct lexing and parsing phases (generated by hand or with tools) need to implement them as coroutines with complicated state, and experience has shown that it's hard to get them right and harder to extend them as the language evolves.
Alternatively, with the use of backtracking, one can parse Fortran with a unified lexer/parser. We have chosen to do so because it is simpler and should reduce both initial bugs and long-term maintenance.
Specifically, f18's parser uses the technique of recursive descent with
backtracking.
It is constructed as the incremental composition of pure parsing functions
that each, when given a context (location in the input stream plus some state),
either succeeds or fails to recognize some piece of Fortran.
On success, they return a new state and some semantic value, and this is
usually an instance of a C++ struct
type that encodes the semantic
content of a production in the Fortran grammar.
This technique allows us to specify both the Fortran grammar and the representation of successfully parsed programs with C++ code whose functions and data structures correspond closely to the productions of Fortran.
The specification of Fortran uses a form of BNF with alternatives, optional elements, sequences, and lists. Each of these constructs in the Fortran grammar maps directly in the f18 parser to both the means of combining other parsers as alternatives, &c., and to the declarations of the parse tree data structures that represent the results of successful parses. Move semantics are used in the parsing functions to acquire and combine the results of sub-parses into the result of a larger parse.
To represent nodes in the Fortran parse tree, we need a means of
handling sum types for productions that have multiple alternatives.
The bounded polymorphism supplied by the C++17 std::variant
fits
those needs exactly.
For example, production R502 in Fortran defines the top-level
program unit of Fortran as being a function, subroutine, module, &c.
The struct ProgramUnit
in the f18 parse tree header file
represents each program unit with a member that is a std::variant
over the six possibilities.
Similarly, the parser for that type in the f18 grammar has six alternatives,
each of which constructs an instance of ProgramUnit
upon the result of
parsing a Module
, FunctionSubprogram
, and so on.
Code that performs semantic analysis on the result of a successful
parse is typically implemented with overloaded functions.
A function instantiated on ProgramUnit
will use std::visit
to
identify the right alternative and perform the right actions.
The call to std::visit
must pass a visitor that can handle all
of the possibilities, and f18 will fail to build if one is missing.
Were we unable to use std::variant
directly, we would likely
have chosen to implement a local SumType
replacement; in the
absence of C++17's abilities of using
a template parameter pack
and allowing auto
arguments in anonymous lambda functions,
it would be less convenient to use.
The other options for polymorphism in C++ at the level of C++11 would be to:
- loosen up compile-time type safety and use a unified parse tree node representation with an enumeration type for an operator and generic subtree pointers, or
- define the sum types for the parse tree as abstract base classes from
which each particular alternative would derive, and then use virtual
functions (or the forbidden
dynamic_cast
) to identify alternatives during analysis
Many productions in the Fortran grammar describe a sequence of various
sub-parses.
For example, R504 defines the things that may appear in the "specification
part" of a subprogram in the order in which they are allowed: USE
statements, then IMPORT
statements, and so on.
The parse tree node that represents such a thing needs to incorporate
the representations of those parses, of course.
It turns out to be convenient to allow these data members to be anonymous
components of a std::tuple
product type.
This type facilitates the automation of code that walks over all of the
members in a type-safe fashion and avoids the need to invent and remember
needless member names -- the components of a std::tuple
instance can
be identified and accessed in terms of their types, and those tend to be
distinct.
So we use std::tuple
for such things.
It has also been handy for template metaprogramming that needs to work
with lists of types.
This simple little type is used wherever a value might or might not be present. It is especially useful for function results and rvalue reference arguments. It corresponds directly to the optional elements in the productions of the Fortran grammar. It is also used as a wrapper around a parse tree node type to define the results of the various parsing functions, where presence of a value signifies a successful recognition and absence denotes a failed parse. It is used in data structures in place of nullable pointers to avoid indirection as well as the possible confusion over whether a pointer is allowed to be null.