Skip to content

Conversation

@jmarshall
Copy link
Member

The full preprocessor search path for #include "..." is typically:

1) directory of the file containing the #include directive
2) environment variables $CPATH, $C_INCLUDE_PATH, etc
3) command line options -I etc
4) system include directories

It's easy to oversimplify (1) as “the current directory” but that's misleading when the compiler's PWD is always the top-level htslib source directory but the “current directory of the #include directive” might be one of the subdirectories.

When a top-level *.[ch] file does #include "htslib/foo.h", the nearby header file within the source tree is found due to (1). When a file in a subdirectory (htslib/*.h, cram/*.[ch], or test/*.c) does the same #include, the nearby header file is intended to be found due to (3) via the Makefile's hard-coded -I.. here being the compiler process's $PWD, i.e., the top-level HTSlib source directory.

However the latter can be subverted if there is an "htslib" directory containing foo.h in $CPATH, $C_INCLUDE_PATH, or on the command line prior to the hard-coded -I.. This will be the case if users have made a previous HTSlib installation accessible via $CPATH/etc or by adding a -I option to $(CFLAGS) (rather than $(CPPFLAGS)). One reason we have found this hard to reproduce in the past is that we have naturally added dummy -I options to CPPFLAGS, but that comes later in the search path than -I.. Naive users OTOH are likely to add their dummy -I options to CFLAGS.

Avoid this and fix #347 by adding ../ to #includes in subdirectories (and always using "" rather than <>), so that includes within HTSlib (even from the source subdirectories) are found due to (1).

This PR is split into three commits: for htslib/*.h, for cram/*.h, and for all other *.h files. The question is whether to apply 1, 2, or all 3 commits.

  • The first commit covers the public headers, which is the likely scenario of “oops I added my previous HTSlib installation to $C_INCLUDE_PATH”.

  • Adding the second commit extends this to the private cram/*.h headers. See the commit messages for details; note that for a (long) time Debian packaged and shipped these headers alongside htslib/*.h.

  • Adding the third commit as well extends this to all intra-HTSlib inclusions, so they're all similarly somewhat uglified. With this, the only #includes in HTSlib still depending on (3)'s hard-coded -I. are each *.c file's #include <config.h>.

Probably the answer is either to apply 1 (for the minimal change) or all 3 commits (for consistency).

jmarshall added 3 commits May 21, 2020 07:50
The full preprocessor search path for #include "..." is typically:

    1) directory of the file containing the #include directive
    2) environment variables $CPATH, $C_INCLUDE_PATH, etc
    3) command line options -I etc
    4) system include directories

When a top-level *.[ch] file #includes "htslib/foo.h", the nearby
header file within the source tree is found due to (1). When a file
in a subdirectory (htslib/*.h, cram/*.[ch], or test/*.c) does the
same #include, the nearby header file is intended to be found due
to (3) via the hard-coded `-I.` in the Makefile. (`.` here being the
compiler process's $PWD, i.e., the top-level HTSlib source directory.)

However the latter can be subverted if there is an "htslib" directory
containing foo.h in $CPATH, $C_INCLUDE_PATH, or on the command line
prior to the hard-coded `-I.`. This will be the case if users have
made a previous HTSlib installation accessible via $CPATH/etc or by
adding a -I option to $(CFLAGS) (rather than $(CPPFLAGS)).

Avoid this by adding ../ to #includes in subdirectories (and always
using "" rather than <>), so that all includes of htslib/*.h within
HTSlib are found due to (1). Fixes samtools#347.
Similarly to the previous commit, includes of cram/*.h from files in
subdirectories can be subverted if $CPATH/etc or $(CFLAGS) activates
a directory containing cram/*.h files. This is less likely than the
previous case of previously-installed public htslib/*.h headers, but
note that e.g. Debian for a time shipped cram/*.h headers, and the
particularly misguided might perhaps add a previous HTSlib source
directory to $C_INCLUDE_PATH.

Avoid this by removing cram/ from cram/*.h #includes in cram/*.[ch]
(and adding ../ elsewhere) and converting any instances of <> to "".
Similarly to the previous commits, includes of these headers from
files in subdirectories can be subverted if $CPATH/etc or $(CFLAGS)
activates a directory containing them. This is unlikely,  but the
particularly misguided might perhaps add a previous HTSlib source
directory to $C_INCLUDE_PATH.

To avoid that scenario, and for consistency with all the other
intra-HTSlib #includes, add ../ to #includes of top-level headers
in subdirectories.

With this, the only remaining #includes in HTSlib requiring `-I.` (as
is hard-coded in Makefile) are each *.c file's <config.h> inclusion.
(Using <> for that is as recommended by the autoconf documentation.)
@jmarshall
Copy link
Member Author

Tested by preparing independent dummy and real header directories:

cd [HTSlib_toplevel_source_directory]

mkdir -p $HOME/dummy-hts/htslib
for f in htslib/*.h; do echo "#error found installed $f" > $HOME/dummy-hts/$f; done

mkdir -p $HOME/dummy-cram/cram
for f in cram/*.h; do echo "#error found installed $f" > $HOME/dummy-cram/$f; done

mkdir -p real-cram/cram
for f in cram/*.h; do echo '#include "../../'$f'"' > real-cram/$f; done

mkdir -p real-top/os
for f in *.h; do echo '#include "../'$f'"' > real-top/$f; done
echo '#include "../../os/lzma_stub.h"' > real-top/os/lzma_stub.h

mkdir real-conf
cp config.h real-conf

and replacing -I. in Makefile with -Ireal-conf, and then building various different combinations of -I$(HOME)/dummy-hts, -I$(HOME)/dummy-cram, -Ireal-cram, and -Ireal-top in CFLAGS and/or $CPATH.

@daviesrob daviesrob merged commit 08488f7 into samtools:develop May 26, 2020
@daviesrob
Copy link
Member

Full solution installed, thanks. We'll have to remember to do future includes this way.

@jmarshall jmarshall deleted the relincludes branch May 26, 2020 09:42
@jmarshall
Copy link
Member Author

Thanks. The upcoming test/maintainer/check_dependencies.pl will need rejigging to recognise these relative paths and it can also be tweaked to validate (for htslib) that future includes are done this way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTSlib should prefer its own headers when being compiled

2 participants