Warning
This project is in early stages of development. Do not use.
The most important existing tools for the generation of Haskell bindings from C
headers, hsc2hs and c2hs, require a lot of user input (see Alternative
generators for a full review): they assist in writing bindings by filling in
details about the C code when requested, but the process is still driven by the
programmer. The goal of hs-bindgen, inspired by the Rust bindgen tool, is to
have the entire process be driven by the C header(s) themselves.
It should be possible to run the tool as a preprocessor, or in Template Haskell mode, offering a convenient workflow:
{-# LANGUAGE TemplateHaskell #-}
module MyModule where
generateBindingsFor "path/to/foo.h"
We should support cross compilation, ideally in both execution modes, but definitely in preprocessor mode.
We need to do this reliably, which means that we need to use existing
infrastructure (for example, to find out the offsets of all fields inside a
struct). We will therefore bind to libclang.
One of the downsides of working with c2hs is that users need to learn a new
(frankly rather arcane) syntax. We want to limit any new syntax that users might
have to learn, working primarily with just regular Haskell. The low-level /
high-level split we propose (see below) is partly motivated by this requirement:
even if we do not use the tool to generate high-level Haskell bindings, users
can write their own, by writing regular Haskell code that happens to work with
the (generated) low-level bindings. This should also improve integration with
tooling such as HLS.
For the high-level bindings this is more challenging, as users will need to be
provided with ways to customize decisions made by the tool. A good option for
power-users here might be to offer hs-bindgen as a library, so that
customization can again be done with regular Haskell code.
The project is split up into three major milestones, each of which is useful in its own right and can be released as versions 0.1, 0.2, and 0.3 respectively.
The objective here is to be able to generate Haskell types with Storable
instances for "all" struct, enum and union definitions found in the C header.
We should support most field types, including bitfields, fixed-size arrays, flexible array members, etc.
This will require a mechanism to select which instances are of interest, perhaps
similar to those supported by Rust bindgen, or through some kind of "program
slicing", starting with a set of functions the user is interested in. This is
especially important because headers can #include other headers.
The explicit goal of this milestone and the next one is to generate low-level
bindings that mirror the C definitions exactly. So, for example, if a struct
contains a field of type char*, the corresponding field in the Haskell
type will have type Ptr CChar. Constructing higher-level bindings
(where we might use String, for example) will not be considered until
Milestone 3: High-level API.
As such, it should be possible to generate these bindings with minimal user
input or customization, ideally none (apart from selection).
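As an illustration of what "mirroring the C definition exactly" could look like, here is a hand-written sketch. It is not output of hs-bindgen: the struct, the Haskell names, and the hard-coded layout (8-byte pointers, as on a typical x86-64 LP64 platform) are all assumptions made for the example. A real generator would obtain sizes and offsets from libclang rather than hard-coding them.

```haskell
import Foreign (Ptr, Storable (..), alloca, nullPtr)
import Foreign.C.Types (CChar, CInt)

-- Hypothetical C definition being mirrored:
--   struct person { char *name; int age; };
data Person = Person
  { personName :: Ptr CChar -- char *name (kept as a raw pointer, not a String)
  , personAge  :: CInt      -- int age
  }
  deriving (Eq, Show)

-- ASSUMPTION: layout for a 64-bit LP64 target (pointer at offset 0,
-- int at offset 8, total size padded to 16 bytes).
instance Storable Person where
  sizeOf _    = 16
  alignment _ = 8
  peek p = Person <$> peekByteOff p 0 <*> peekByteOff p 8
  poke p (Person name age) = do
    pokeByteOff p 0 name
    pokeByteOff p 8 age

-- Write a value to temporary memory and read it back.
roundTripPerson :: Person -> IO Person
roundTripPerson x = alloca $ \p -> poke p x >> peek p
```

For instance, `roundTripPerson (Person nullPtr 36)` should return the value it was given.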
We should also generate a test-suite to check that the Storable instances we
generate are correct.
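One simple check such a test-suite might contain is a poke/peek round trip. The sketch below is only an assumption about what a generated test could look like; the name `roundTrips` is hypothetical. Note that it only checks that an instance is self-consistent; verifying agreement with the C layout would additionally require comparing against `sizeof` and `offsetof` on the C side.

```haskell
import Foreign (Storable (..), alloca)
import Foreign.C.Types (CDouble, CInt)

-- Write a value into freshly allocated memory, read it back,
-- and report whether the result equals the original.
roundTrips :: (Eq a, Storable a) => a -> IO Bool
roundTrips x = alloca $ \ptr -> do
  poke ptr x
  y <- peek ptr
  pure (x == y)

-- Run the check for a couple of primitive C types.
checkAll :: IO Bool
checkAll =
  and <$> sequence
    [ roundTrips (42 :: CInt)
    , roundTrips (1.5 :: CDouble)
    ]
```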
The goal of this milestone is to generate low-level foreign import declarations
for all functions declared in the header file. Like in milestone 1, the goal
here is to avoid needing user input as much as possible, though some decisions
do need to be made (for example, should calls be safe or unsafe?).
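To make the safe/unsafe decision concrete, the following sketch shows two hand-written foreign import declarations for standard C library functions. The Haskell names `c_abs`, `c_strlen` and `lengthViaC` are chosen for this example only, and the choice of which call is marked `unsafe` is purely illustrative; it is not a statement about what hs-bindgen generates.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..), CSize (..))

-- C: int abs(int);
-- Marked unsafe: cheaper to call, but the C code must not call back into
-- Haskell, and the runtime cannot run other Haskell threads on this
-- capability (or garbage collect) until the call returns.
foreign import ccall unsafe "stdlib.h abs"
  c_abs :: CInt -> CInt

-- C: size_t strlen(const char *);
-- Marked safe: slightly more call overhead, but the runtime is free to
-- continue running other Haskell threads during the call.
foreign import ccall safe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- Small wrapper that marshals a Haskell String to a temporary C string.
lengthViaC :: String -> IO Int
lengthViaC s = withCString s (fmap fromIntegral . c_strlen)
```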
Whenever possible, if the C header contains documentation, we should also include that documentation as Haddocks in the generated bindings.
We should support functions that accept or return structs by value, by
generating appropriate wrappers for them.
We should also generate bindings for constants and global variables.
We should also support some additional C types in this milestone (types which
don't involve Storable instances), such as typedefs and incomplete structs.
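One common way to represent these in hand-written bindings is a newtype for a typedef and an empty data type for an incomplete struct, which can then only be handled through pointers. The sketch below uses hypothetical C declarations and Haskell names, and is an assumption about a possible encoding rather than the tool's actual output.

```haskell
import Foreign.C.Types (CUInt)
import Foreign.Ptr (Ptr, nullPtr)

-- Hypothetical C: typedef unsigned int handle_id;
-- A newtype keeps the typedef distinct from plain CUInt at the type level.
newtype HandleId = HandleId CUInt
  deriving (Eq, Ord, Show)

-- Hypothetical C: struct connection;  (declared, but never defined)
-- No constructors: values cannot be built or inspected in Haskell,
-- so the type is only useful as the target of a pointer.
data Connection

-- Pointers to the opaque type can still be passed around and compared.
isNullConnection :: Ptr Connection -> Bool
isNullConnection = (== nullPtr)
```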
While for some users these low-level bindings might be usable as-is, the primary objective here is to make it easier for users to manually write high-level bindings; this is now regular Haskell coding, and should be well supported by tooling such as HLS.
We might want to release this together with milestone 2.5, see below.
This milestone sits in between milestones 2 and 3 because it is useful for both.
When hand-writing high-level bindings, there are undoubtedly a lot of patterns
that emerge. We should capture these as Haskell functions or type classes and
release this as a separate library hs-bindgen-patterns.
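As an example of the kind of pattern that could be captured, many C functions return a status code and write their result through an out-parameter. The helper below is purely hypothetical (it is not part of any existing hs-bindgen-patterns API); it assumes the convention that a status of 0 means success. `fakeGetAnswer` is a Haskell stand-in for a C function, used only so that the example is self-contained.

```haskell
import Foreign (Ptr, Storable (..), alloca)
import Foreign.C.Types (CInt)

-- Hypothetical pattern: run an action that reports a status code and
-- writes its result through an out-parameter.
-- ASSUMPTION: status 0 means success; anything else is an error code.
withOutParam :: Storable a => (Ptr a -> IO CInt) -> IO (Either CInt a)
withOutParam action = alloca $ \out -> do
  status <- action out
  if status == 0
    then Right <$> peek out
    else pure (Left status)

-- Stand-in for a C function of shape: int get_answer(int *out);
fakeGetAnswer :: Ptr CInt -> IO CInt
fakeGetAnswer out = poke out 42 >> pure 0
```

With these definitions, `withOutParam fakeGetAnswer` yields `Right 42`, while an action that returns a non-zero status yields `Left` with that status.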
Even in the ideal case that all patterns that are used in the construction
of the high-level bindings can be expressed using the patterns provided by the
hs-bindgen-patterns library from milestone 2.5, it might still be cumbersome
to have to write them all out, and so some generation might still be useful.
This is all the more important for data type declarations (as opposed to
function definitions); we'll want to try and generate high-level equivalents
for structs, enums, and (tagged) unions.
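To illustrate the difference between the two levels for an enum, the sketch below pairs a low-level newtype over `CInt` with a high-level algebraic data type and conversions between them. The C enum and all Haskell names are hypothetical, and the mapping relies on the assumption that the enumerators are numbered consecutively from 0, so that the derived `Enum` instance matches the C values.

```haskell
import Foreign.C.Types (CInt)

-- Hypothetical C: enum color { RED = 0, GREEN = 1, BLUE = 2 };

-- Low-level view: mirrors C exactly, any int value is representable.
newtype CColor = CColor CInt
  deriving (Eq, Show)

-- High-level view: only the declared enumerators exist.
-- ASSUMPTION: constructor order matches the C values 0, 1, 2.
data Color = Red | Green | Blue
  deriving (Eq, Show, Enum, Bounded)

-- Converting up can fail, because C does not prevent out-of-range values.
toColor :: CColor -> Maybe Color
toColor (CColor n) =
  lookup n [(fromIntegral (fromEnum c), c) | c <- [minBound .. maxBound]]

-- Converting down always succeeds.
fromColor :: Color -> CColor
fromColor = CColor . fromIntegral . fromEnum
```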
However, there is a trade-off here. There are a lot of decisions that need to
be made for the high-level bindings: the C header file does not provide
sufficient information by itself. This means that the tool must be
customizable, for example through a DSL, through annotations in the C header
files themselves, or through using hs-bindgen as a library with customizations
as regular Haskell code. It is conceivable that in cases that would require
extensive customization, perhaps the most direct way to do that customization is
not to use generation at all, but simply write bindings manually, provided that
the hs-bindgen-patterns library provides sufficient support.
Nonetheless, there will probably be scenarios where a set of defaults and heuristics can do a good job at generating high-level bindings, without much -- or any -- input from the user.
To make tweaking of the output easier, the tool should include comments in the generated code that explain tool decisions. In other words, the generated code should provide sufficient information to the user to allow them to change the way that the code is generated.
This milestone is currently just a collection of additional features that we might consider, such as
- generating bindings for C preprocessor macros
- support for function pointers: generating bindings for function addresses or generating function pointers from Haskell functions, and conversely for resolving function pointers
- support varargs functions
- deal with under-defined functions
- support thread-local variables
- support multidimensional arrays
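For the function pointer item, the standard Haskell FFI already offers the two directions mentioned: a "wrapper" import turns a Haskell function into a C function pointer, and a "dynamic" import calls through a function pointer. The sketch below shows both by hand; the type and function names are hypothetical, and how hs-bindgen would expose this is not decided by this document.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Control.Exception (bracket)
import Foreign.C.Types (CInt (..))
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- Haskell view of a C callback of shape: int (*)(int)
type IntCallback = CInt -> IO CInt

-- Haskell function -> C function pointer.
foreign import ccall "wrapper"
  mkIntCallback :: IntCallback -> IO (FunPtr IntCallback)

-- C function pointer -> callable Haskell function.
foreign import ccall "dynamic"
  callIntCallback :: FunPtr IntCallback -> IntCallback

-- Wrap a Haskell function, call it through the pointer, and release the
-- wrapper afterwards (pointers from "wrapper" must be freed explicitly).
callThroughFunPtr :: IntCallback -> CInt -> IO CInt
callThroughFunPtr f x =
  bracket (mkIntCallback f) freeHaskellFunPtr $ \fp ->
    callIntCallback fp x
```

For example, `callThroughFunPtr (pure . (* 2)) 21` passes the doubling function through a C function pointer and back.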