- Type-safe. Unlike manually written tagged unions, Datatype99 is type-safe: normally you cannot access invalid data or construct an invalid variant. Pattern matching is exhaustive too.
- Pure C99/C++11. No external tools are required; Datatype99 is implemented using only preprocessor macros.
- Can be used everywhere. Literally everywhere, provided that you have a standard-conforming C99/C++11 preprocessor. Even in freestanding environments.
- Transparent. Datatype99 comes with formal code generation semantics, meaning that if you look at `datatype`'s output, you will normally not see anything unexpected.
- FFI-tolerant. Because of transparency, writing an FFI is not a challenge.
- Download Datatype99 and Metalang99 (minimum supported version: 0.4.2).
- Add `datatype99` and `metalang99/include` to your include paths.
- `#include <datatype99.h>` beforehand.
PLEASE use Datatype99 only with `-ftrack-macro-expansion=0` (GCC) or something similar; otherwise, it will throw your compiler to the moon, with compile times and memory usage skyrocketing. Precompiled headers are also very helpful.
If you do not want the shortened versions to appear (e.g., `datatype` and `match` instead of `datatype99` and `match99`), define `DATATYPE99_NO_ALIASES` before `#include <datatype99.h>`.
(The full example: `examples/binary_tree.c`.)
A sum type is created using the `datatype` macro. I guess you have already caught the syntax, but there is actually one more kind of variant: an empty variant, which is expressed simply as `(Foo)`. It holds no data.
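To make the empty variant concrete, the sketch below writes out by hand roughly what a declaration such as `datatype(MaybeInt, (Just, int), (Nothing))` produces, following the code generation semantics described later in this README. `MaybeInt`, `Just`, `Nothing`, and the helper `maybe_int_or` are hypothetical names, and the code is an approximation rather than the macro's literal output. The point to notice is that `Nothing` gets a tag and a constructor but no field in the union.

```c
#include <assert.h>

/* Hand-written approximation of the assumed declaration
 *   datatype(MaybeInt, (Just, int), (Nothing));
 * All names are hypothetical; this is not the macro's literal output. */
typedef struct MaybeInt MaybeInt;

/* Only the non-empty variant gets a payload structure. */
typedef struct MaybeIntJust {
    int _0;
} MaybeIntJust;

typedef enum MaybeIntTag { JustTag, NothingTag } MaybeIntTag;

/* `Nothing` holds no data, so the union has no member for it. */
typedef union MaybeIntVariants {
    char dummy;
    MaybeIntJust Just;
} MaybeIntVariants;

struct MaybeInt {
    MaybeIntTag tag;
    MaybeIntVariants data;
};

/* Value constructors: the empty variant takes no arguments. */
static inline MaybeInt Just(int value) {
    MaybeInt result;
    result.tag = JustTag;
    result.data.Just._0 = value;
    return result;
}

static inline MaybeInt Nothing(void) {
    MaybeInt result;
    result.tag = NothingTag;
    result.data.dummy = '\0';
    return result;
}

/* Returns the payload of `Just`, or `fallback` for `Nothing`. */
static int maybe_int_or(MaybeInt m, int fallback) {
    switch (m.tag) {
    case JustTag:
        return m.data.Just._0;
    case NothingTag:
        return fallback;
    }
    assert(0 && "invalid tag");
    return fallback;
}
```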
Pattern matching is likewise intuitive. Just three brief notes:
- To match an empty variant, write `of(Foo) { ... }`.
- To match the default case, i.e., when all other cases have failed, write `otherwise { ... }`.
- To ignore one or more variables inside `of`, write `of(Foo, a, b, _, d)`.
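Because pattern matching desugars to a `switch` statement, the three notes above can be approximated in plain C as follows. The `Shape` type and the `shape_width` function are hypothetical, and the expansion is hand-written rather than copied from the library: an empty variant becomes a `case` with nothing bound, `_` simply produces no local variable, and `otherwise` plays the role of `default`.

```c
#include <assert.h>

/* Hand-written layout for an assumed declaration
 *   datatype(Shape, (Circle, double), (Rect, double, double), (Point));
 * Names are hypothetical. */
typedef enum ShapeTag { CircleTag, RectTag, PointTag } ShapeTag;

typedef struct ShapeCircle { double _0; } ShapeCircle;
typedef struct ShapeRect { double _0; double _1; } ShapeRect;

typedef struct Shape {
    ShapeTag tag;
    union {
        char dummy;
        ShapeCircle Circle;
        ShapeRect Rect;
    } data;
} Shape;

/* Approximately what the following match does:
 *   match(*s) {
 *       of(Point) return 0.0;
 *       of(Rect, width, _) return *width;
 *       otherwise return -1.0;
 *   }
 */
static double shape_width(Shape *s) {
    switch (s->tag) {
    case PointTag:
        /* of(Point): an empty variant, so nothing is bound. */
        return 0.0;
    case RectTag: {
        /* of(Rect, width, _): `width` points at the first field;
         * the second field is ignored, so no variable is declared for it. */
        double *width = &s->data.Rect._0;
        return *width;
    }
    default:
        /* otherwise: every variant not listed above (here, Circle). */
        return -1.0;
    }
}
```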
Happy hacking!
Because the macros have well-defined semantics, you can write an FFI for the generated code; FFIs are quite common in C.
<datatype> ::= "datatype99(" <datatype-name> { "," <variant> }+ ")" ;
<variant> ::= "(" <variant-name> [ { "," <type> }+ ] ")" ;
<datatype-name> ::= <ident> ;
<variant-name> ::= <ident> ;
<match> ::= "match99(" <lvalue> ")" { <arm> }+ ;
<matches> ::= "matches99(" <expr> "," <ident> ")" ;
<if-let> ::= "ifLet99(" <lvalue> "," <variant-name> "," <ident> [ { "," <ident> }+ ] ")" <stmt> ;
<of> ::= "of99(" <variant-name> [ { "," <ident> }+ ] ")" <stmt> ;
<otherwise> ::= "otherwise99" <stmt> ;
(It might be helpful to look at the generated code of `BinaryTree` from `examples/binary_tree.c`.)
- Before anything else, the following type definition is generated:
typedef struct <datatype-name> <datatype-name>;
- For each non-empty variant, the following type definition is generated (the metavariable `<type>` ranges over the corresponding variant's types):
typedef struct <datatype-name><variant-name> {
<type>0 _0;
...
<type>N _N;
} <datatype-name><variant-name>;
- For each non-empty variant, the following type definitions for the types of each field of `<datatype-name><variant-name>` are generated:
typedef <type>0 <variant-name>_0;
...
typedef <type>N <variant-name>_N;
- For each variant, the following type definition for the corresponding sum type is generated:
typedef struct <datatype-name> <variant-name>SumT;
- For each sum type, the following tagged union is generated (inside the union, fields are generated only for the structures of non-empty variants):
typedef enum <datatype-name>Tag {
<variant-name>0Tag, ..., <variant-name>NTag
} <datatype-name>Tag;
typedef union <datatype-name>Variants {
char dummy;
<datatype-name><variant-name>0 <variant-name>0;
...
<datatype-name><variant-name>N <variant-name>N;
} <datatype-name>Variants;
struct <datatype-name> {
<datatype-name>Tag tag;
<datatype-name>Variants data;
};
- For each variant, the following function, called a value constructor, is generated:
inline static <datatype-name> <variant-name>(...) { /* ... */ }
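As a worked instance of the rules above, here is the output they describe for an assumed declaration `datatype(BinaryTree, (Leaf, int), (Node, BinaryTree *, int, BinaryTree *))`, written out by hand. The constructor bodies are only a plausible guess, since the semantics above elide them (`/* ... */`), and `sum_tree` is a hypothetical helper showing the kind of `switch` that a `match` on this type desugars to. `static inline` is used in place of `inline static`; the two are equivalent.

```c
#include <assert.h>

/* The rules above applied by hand to an assumed declaration
 *   datatype(BinaryTree, (Leaf, int), (Node, BinaryTree *, int, BinaryTree *));
 * Constructor bodies are a plausible guess; the README elides them. */

/* Forward declaration of the sum type. */
typedef struct BinaryTree BinaryTree;

/* One structure per non-empty variant. */
typedef struct BinaryTreeLeaf {
    int _0;
} BinaryTreeLeaf;

typedef struct BinaryTreeNode {
    BinaryTree *_0;
    int _1;
    BinaryTree *_2;
} BinaryTreeNode;

/* Type definitions for each field. */
typedef int Leaf_0;
typedef BinaryTree *Node_0;
typedef int Node_1;
typedef BinaryTree *Node_2;

/* Per-variant aliases of the sum type. */
typedef struct BinaryTree LeafSumT;
typedef struct BinaryTree NodeSumT;

/* The tagged union. */
typedef enum BinaryTreeTag { LeafTag, NodeTag } BinaryTreeTag;

typedef union BinaryTreeVariants {
    char dummy;
    BinaryTreeLeaf Leaf;
    BinaryTreeNode Node;
} BinaryTreeVariants;

struct BinaryTree {
    BinaryTreeTag tag;
    BinaryTreeVariants data;
};

/* Value constructors. */
static inline BinaryTree Leaf(Leaf_0 x) {
    BinaryTree result;
    result.tag = LeafTag;
    result.data.Leaf._0 = x;
    return result;
}

static inline BinaryTree Node(Node_0 lhs, Node_1 x, Node_2 rhs) {
    BinaryTree result;
    result.tag = NodeTag;
    result.data.Node._0 = lhs;
    result.data.Node._1 = x;
    result.data.Node._2 = rhs;
    return result;
}

/* Sums every integer in the tree by switching on the tag. */
static int sum_tree(const BinaryTree *tree) {
    switch (tree->tag) {
    case LeafTag:
        return tree->data.Leaf._0;
    case NodeTag:
        return sum_tree(tree->data.Node._0) + tree->data.Node._1 +
               sum_tree(tree->data.Node._2);
    }
    assert(0 && "invalid tag");
    return 0;
}
```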
`match99` has the expected semantics: it sequentially tries to match the given instance of a sum type against the given variants, and, if a match has succeeded, it executes the corresponding statement and moves down to the next instruction (`match(val) { ... } next-instruction;`). If all the matches have failed, it executes the statement after `otherwise99` and moves down to the next instruction.
`of99` accepts a matched variant name as the first argument; the rest of the arguments comprise a comma-separated list of bindings.
- A binding equal to `_` is ignored.
- A binding not equal to `_` stands for a pointer to the corresponding data of the variant (e.g., given `(Foo, T1, T2)` and `of99(Foo, x, y)`, `x` has the type `T1 *` and `y` has the type `T2 *`).
There can be more than one `_` binding; however, non-`_` bindings must be distinct.
To match an empty variant, write `of99(Bar)`.
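Since bindings are pointers into the matched value, a match arm can modify that value in place. The sketch below writes out by hand roughly what `of99(Foo, x, _)` binds for an assumed declaration `datatype(Data, (Foo, int, double), (Bar))`; `Data`, `Foo`, `Bar`, and `increment_foo` are hypothetical names, and the expansion is an approximation rather than the library's literal output.

```c
#include <assert.h>

/* Hand-written layout for an assumed declaration
 *   datatype(Data, (Foo, int, double), (Bar));
 * Names are hypothetical. */
typedef enum DataTag { FooTag, BarTag } DataTag;

typedef struct DataFoo {
    int _0;
    double _1;
} DataFoo;

typedef struct Data {
    DataTag tag;
    union {
        char dummy;
        DataFoo Foo;
    } data;
} Data;

/* Approximately what the following match does:
 *   match(*d) {
 *       of(Foo, x, _) *x += 1;
 *       of(Bar) {}
 *   }
 * `x` has type `int *` and points into `*d`, so the increment changes
 * the matched value itself; `_` introduces no variable. */
static void increment_foo(Data *d) {
    switch (d->tag) {
    case FooTag: {
        int *x = &d->data.Foo._0;
        *x += 1;
        break;
    }
    case BarTag:
        /* of(Bar): empty variant, nothing to bind or do. */
        break;
    }
}
```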
`matches99` just tests an instance of a sum type for a given variant. If the given instance corresponds to the given variant, it expands to an expression that evaluates to true; otherwise, the expression evaluates to false.
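Given the `<variant-name>Tag` enumerators from the code generation semantics, the behaviour of `matches99` can be approximated by a one-line tag comparison. `MATCHES` below is a hypothetical stand-in written for illustration, not `matches99`'s actual definition, and `Color` is an assumed hand-written type.

```c
#include <assert.h>

/* Hand-written layout for an assumed declaration
 *   datatype(Color, (Red), (Green), (Custom, unsigned char, unsigned char, unsigned char));
 * Names are hypothetical. */
typedef enum ColorTag { RedTag, GreenTag, CustomTag } ColorTag;

typedef struct ColorCustom {
    unsigned char _0, _1, _2;
} ColorCustom;

typedef struct Color {
    ColorTag tag;
    union {
        char dummy;
        ColorCustom Custom;
    } data;
} Color;

/* Hypothetical stand-in for matches99: nonzero iff `expr` holds `variant`.
 * `variant##Tag` pastes the variant name onto the `Tag` suffix. */
#define MATCHES(expr, variant) ((expr).tag == variant##Tag)
```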
`ifLet99` tests for only one variant. It works conceptually the same as
match99(<expr>) {
of(<variant-name>, vars...) { /* ... */ }
otherwise {}
}
but it has a shorter syntax:
ifLet99(<expr>, <variant-name>, vars...) { /* ... */ }
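Written out by hand, an `ifLet99` roughly amounts to a single tag test followed by a pointer binding. The `MaybeInt` type and the `just_or_zero` function below are hypothetical, and the code approximates rather than reproduces the library's expansion.

```c
#include <assert.h>

/* Hand-written layout for an assumed declaration
 *   datatype(MaybeInt, (Just, int), (Nothing));
 * Names are hypothetical. */
typedef enum MaybeIntTag { JustTag, NothingTag } MaybeIntTag;

typedef struct MaybeIntJust {
    int _0;
} MaybeIntJust;

typedef struct MaybeInt {
    MaybeIntTag tag;
    union {
        char dummy;
        MaybeIntJust Just;
    } data;
} MaybeInt;

/* Approximately what
 *   ifLet(*m, Just, x) { return *x; }
 *   return 0;
 * does: test one tag and, on success, bind a pointer to the payload. */
static int just_or_zero(MaybeInt *m) {
    if (m->tag == JustTag) {
        int *x = &m->data.Just._0;
        return *x;
    }
    return 0;
}
```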
The unit type `Unit99` represents a type with a single value, `unit99` (a `Unit99` object should never hold any other value). `Unit99` and `unit99` are defined as follows:
typedef char Unit99;
static const Unit99 unit99 = '\0';
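As an illustration only (the use case is an assumption, not something this README prescribes), `Unit99` can fill a slot where a type is syntactically required but carries no information, such as the success payload of a hand-written result type. `CheckResult` and `check_non_negative` are hypothetical names.

```c
#include <assert.h>

/* The definitions given above. */
typedef char Unit99;
static const Unit99 unit99 = '\0';

/* Hand-written layout for an assumed declaration
 *   datatype(CheckResult, (Ok, Unit99), (Err, int));
 * Names are hypothetical. */
typedef enum CheckResultTag { OkTag, ErrTag } CheckResultTag;
typedef struct CheckResultOk { Unit99 _0; } CheckResultOk;
typedef struct CheckResultErr { int _0; } CheckResultErr;

typedef struct CheckResult {
    CheckResultTag tag;
    union {
        char dummy;
        CheckResultOk Ok;
        CheckResultErr Err;
    } data;
} CheckResult;

/* Succeeds with the unit value for non-negative input; otherwise
 * reports the offending value in `Err`. */
static CheckResult check_non_negative(int n) {
    CheckResult r;
    if (n >= 0) {
        r.tag = OkTag;
        r.data.Ok._0 = unit99;
    } else {
        r.tag = ErrTag;
        r.data.Err._0 = n;
    }
    return r;
}
```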
Thanks to Rust and ML for their implementations of sum types.
- Unleashing Sum Types in Pure C99 by Hirrolot
A:
- Datatype99 can be integrated into existing code bases written in pure C.
- Sometimes C is the only choice.
A: See Metalang99's README.
A: The `datatype99` macro generates a tagged union accompanied by type hints and value constructors. Pattern matching desugars merely to a `switch` statement. To generate all this, Metalang99, a preprocessor metaprogramming library, is used.
A: With `-ftrack-macro-expansion=0` (GCC), compile-time errors should be no longer than usual. Some kinds of syntactic errors are detected by the library itself, for example (`-E` flag):
// !"Metalang99 error" (datatype99): "Bar(int) is unparenthesised"
datatype(A, (Foo, int), Bar(int));
The others are understandable as well:
datatype(Foo, (FooA, NonExistingType));
playground.c:3:1: error: unknown type name ‘NonExistingType’
3 | datatype(
| ^~~~~~~~
playground.c:3:1: error: unknown type name ‘NonExistingType’
playground.c:3:1: error: unknown type name ‘NonExistingType’
If an error is not comprehensible at all, try looking at the generated code (`-E`). Since the code generation semantics are formally defined, you will normally not see anything unexpected.