Skip to content

Commit

Permalink
merging master
Browse files Browse the repository at this point in the history
  • Loading branch information
notestaff committed Mar 17, 2016
2 parents 9487e02 + 7ae2d52 commit aee5b87
Show file tree
Hide file tree
Showing 102 changed files with 282 additions and 130 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -11,3 +11,4 @@ autom4te.cache
libtool
config.log
config.status
README.html
142 changes: 128 additions & 14 deletions README.org
Original file line number Diff line number Diff line change
Expand Up @@ -4,25 +4,139 @@
cosi2 is an efficient coalescent simulator with support for selection, population structure, variable recombination rates,
and gene conversion. It supports exact and approximate simulation modes.

Two working examples are included in the examples/ subdirectory, and
documentation is in the documentation/ directory.
This is the user's manual for cosi2 coalescent simulator.

* Downloading and installing cosi2

Conditional defines that affect cosi compilation:
cosi2 can be downloaded from http://broadinstitute.org/~ilya/

Defines that affect behavior:
* Running cosi2

COSI_RECOMB_CONSTRATE - if defined, optimizes cosi to support only constant
recombination rate. By default, cosi supports variation of recombination rate
across the region.
The basic invocation format is

COSI_GC_DISABLE - if defined, disables support for gene conversion.

Defines that affect only CPU and memory requirements, but not the output:
coalescent -p paramfile -m

COSI_IMPL_STORE_GLOC_IN_LOC - if defined, genetic map locations are stored with each physical location.
This allows constant-time mapping between the two, at the cost of extra memory usage; is useful when
the region is large and the recombination rate changes at many points in the region.
The -p paramfile option specifies the parameter file, which is a text file that describes the demographic model to be simulated. The -m option requests output to be written to stdout in the format of Richard Hudson's ms simulator (described here ; see
section "The output").

COSI_SLOWRECOMB - use the slower method of choosing recomb rates
*** Format of the parameter file

The parameter file defines the population structure and other controlling parameters for the run, using keywords. Comments are indicated by "#" at the beginning of a line.

***** Specifying the simulated region

The length of the simulated region, in basepairs, is specified as:

: length <length in bp>

The mutation rate is specified as:

: mutation_rate <mutation rate per bp per generation>

The recombination rate is specified as follows:

: recomb_file <file-name>

specifies the file describing the genetic map. The file has two columns, separated by whitespace. The first column gives a basepair position; the second column sets the crossover recombination rate, per generation, from that point until either the end
of the region or the position specified by the next line. The basepair positions in the first column must be in strictly increasing order. The first line of the genetic map file also specifies the recombination rate from the beginning of the region to
the basepair position of that line.

Gene conversion is specified by the following parameters:

: gene_conversion_relative_rate <rate of gene conversion initiation relative to crossover recombination rate>
: gene_conversion_mean_tract_length <mean gene conversion tract length in bp>
: gene_conversion_min_tract_length <minimum gene conversion tract length in bp>

***** Specifying the populations

Any population that appears in the simulation, either as a source of samples or in the history of those samples, must be defined in the parameter file; at least one sampled population is required. The syntax for defining a population is

: pop_define <pop id> <label>
: pop_size <pop id> <size>
: sample_size <pop id> <n samples>

<pop id> is an integer ID of the population, used to refer to the population when specifying demographic events. <label> is human-readable name for the population (a string, with no spaces, and not put in quotes).

For example, the following entries

: pop_define 1 European
: pop_size 1 10000
: sample_size 1 50

define population 1 (with the label "European") and set the effective present-day population size to be 10,000 and the number of sampled chromosomes to be 50.

***** Specifying the demographic history

Parameters that define the demographic history of the populations are specified as follows. They can be supplied in any order.

: pop_event migration_rate <label> <source pop id> <target pop id> <T> <probability of migration/chrom/gen>
: pop_event split <label> <source pop id> <new pop id> <T>
: pop_event change_size <label> <pop id> <T> <size for time > T>
: pop_event exp_change_size <label> <pop id> <Tend> <Tstart> <final size> <start size>
: pop_event bottleneck <label> <pop id> <T> <inbreeding coefficient>
: pop_event admix <label> <admixed pop id> <source pop id> <T> <fraction of admixed chroms from source>

In these entries, the time =<T>= is measured in generations (which can be fractional) and increases going into the past
(present = 0). Labels are used only to provide human readable output. =change_size= sets the size for all times prior to T. A bottleneck
is a point-like reduction in population size. =exp_change_size= is an exponential change, e.g.

: pop_event exp_change_size "expansion" 1 50 500 10000 1000

represents an exponential population increase in population 1 that started 500 generations ago and ended 50 generations ago, increasing from 1000 to 10000. Prior to 500 generations, the size remains at 1000 (unless changed by another pop[event]
parameter); more recently than 50 generations ago, the population size is whatever was set by the =pop_size= command.

Specifying the selected sweep:

: pop_event sweep <pop> <Tend> <selection coefficient> <position of causal allele (0.0-1.0)> <present-day frequency of causal allele>

pop gives the population in which the advantageous allele is born. =<Tend>= gives the generation at which sweep ends (in the forward-time sense). The position of the advantageous allele is specified as a floating-point number in the range 0.0-1.0,
giving its relative position within the simulated region (for example, 0.5 puts the advantageous allele in the middle
of the region). The of the advantageous allele at time =<Tend>= is specified as a number in the range 0.0-1.0 (not as a
chromosome count).

*** Command-line options

The following describes the main command-line options of cosi2.

#+BEGIN_EXAMPLE
Specifying the model:
-p [ --paramfile ] arg parameter file
-R [ --recombfile ] arg genetic map file (if specified, overrides the
one in paramfile)
-n [ --nsims ] arg (=1) number of simulations to output
-r [ --seed ] arg (=0) random seed (0 to use current time)

-J [ --trajfile ] arg file from which to read sweep trajectory.
It has two columns: first column gives the generation, second gives the fraction of
chromosomes in the sweep population carrying the derived (advantageous) allele.

-u [ --max-coal-dist ] arg (=1) for Markovian approximation mode, the level
of approximation: the maximum distance
between node hulls for coalescence to be
allowed. Distance is specified as a fraction
of the total length of the simulated region,
in the range [0.0-1.0]; 1.0 (default) means
no approximation.

Specifying the output format:
-o [ --outfilebase ] arg base name for output files in cosi format
-m [ --outms ] write output to stdout in ms format

Specifying output details:
-P [ --output-precision ] arg number of decimal places used for floats in the
outputs
-M [ --write-mut-ages ] output mutation ages
-L [ --write-recomb-locs ] output recombination locations

Misc options:
-h [ --help ] produce help message
-V [ --version ] print version info and compile-time
options
-v [ --verbose ] verbose output
-g [ --show-progress ] [=arg(=10)] print a progress message every N
simulations

#+END_EXAMPLE

Two working examples are included in the examples/ subdirectory.

68 changes: 37 additions & 31 deletions cosi/historical.cc
Original file line number Diff line number Diff line change
Expand Up @@ -525,61 +525,67 @@ genid Event_PopSizeExp2::execute() {
//std::cerr << "exp2 exec: gen=" << gen << " procName=" << procName << " pop=" << pop << " nodes=" << popPtr->pop_get_num_nodes() << "\n";
if ( procName == -1 ) {
procName = ++procCount;

std::ostringstream procNameStrm;
procNameStrm << "exp" << ++procCount;
procNameStr = procNameStrm.str();

using namespace math;
getDemography()->dg_set_pop_size_by_name( gen, pop, nchroms_t( ToDouble( popsize + static_cast< popsize_float_t >( .5 ) ) ) );
if ( cosi_fabs( popsize - popSizeBeg ) > popsize_float_t( .9 ) ) {
using namespace math;
//std::cerr << "SETTING EXP PROCESS\n";

BOOST_AUTO( exponent,
( Function< genid, gensInv_t, Const<> >( expansionRate ) *
(
Function< genid, gens_t, Const<> >( genBeg - static_cast<genid>( 0.0 ) )
BOOST_AUTO( exponent,
( Function< genid, gensInv_t, Const<> >( expansionRate ) *
(
Function< genid, gens_t, Const<> >( genBeg - static_cast<genid>( 0.0 ) )
-
Function< genid, gens_t, X_To<1> >()
) ) );
BOOST_AUTO( with_exp, exp_( exponent ) );
BOOST_AUTO( f, ( Function<genid,popsizeInv_float_t,Const<> >( 1. / popSizeBeg ) * with_exp ) );
) ) );
BOOST_AUTO( with_exp, exp_( exponent ) );
BOOST_AUTO( f, ( Function<genid,popsizeInv_float_t,Const<> >( 1. / popSizeBeg ) * with_exp ) );

PRINT4( popsize, popSizeBeg, 1.0 / eval( f, gen ), 1.0 / eval( f, genBeg ) );
PRINT4( popsize, popSizeBeg, 1.0 / eval( f, gen ), 1.0 / eval( f, genBeg ) );

BOOST_AUTO( coalRateFn, ( Function< genid, double, Const<> >( .5 ) * ( f ) ) );
BOOST_AUTO( coalRateFn, ( Function< genid, double, Const<> >( .5 ) * ( f ) ) );

#if 0
{
std::cerr.precision( 15 );
const genid g_beg( 10.0 ), g_end( 150.0 );
const gens_t g_step( 1.0 );
genid g( g_beg );
while( g < g_end ) {
PRINT2( g, eval( coalRateFn, g ) );
g += g_step;
}

for( int n = 1; n < 25; ++n ) {
PRINT2( n, ( integrateNumerically( coalRateFn, g_beg, g_end, n ) ) );
{
std::cerr.precision( 15 );
const genid g_beg( 10.0 ), g_end( 150.0 );
const gens_t g_step( 1.0 );
genid g( g_beg );
while( g < g_end ) {
PRINT2( g, eval( coalRateFn, g ) );
g += g_step;
}

for( int n = 1; n < 25; ++n ) {
PRINT2( n, ( integrateNumerically( coalRateFn, g_beg, g_end, n ) ) );
}
PRINT( definiteIntegral( coalRateFn, g_beg, g_end ) );
}
PRINT( definiteIntegral( coalRateFn, g_beg, g_end ) );
}
#endif

popPtr->
setCoalArrivalProcess(
ArrivalProcess< genid, Any< RandGen > >(
makeNonHomogeneousPoissonProcess
( coalRateFn, gen, procNameStr ) ) );
popPtr->
setCoalArrivalProcess(
ArrivalProcess< genid, Any< RandGen > >(
makeNonHomogeneousPoissonProcess
( coalRateFn, gen, procNameStr ) ) );

PRINT( popPtr->getCoalArrivalProcess() );
PRINT( popPtr->getCoalArrivalProcess() );
}

gen = genBeg;
addEvent( shared_from_this() );
} else {
if ( popPtr->getCoalArrivalProcess() && popPtr->getCoalArrivalProcess().getLabel() == procNameStr ) {
popPtr->clearCoalArrivalProcess();
//std::cerr << "gen " << gen << ": CLEARED ARRIVAL PROCESS\n";

PRINT( "clearedArrivalProcess" );
getDemography()->dg_set_pop_size_by_name( gen, pop, nchroms_t( ToDouble( popSizeBeg + static_cast< popsize_float_t >( .5 ) ) ) );
}
getDemography()->dg_set_pop_size_by_name( gen, pop, nchroms_t( ToDouble( popSizeBeg + static_cast< popsize_float_t >( .5 ) ) ) );
}
return oldgen;
}
Expand Down
19 changes: 19 additions & 0 deletions doc/developers/DevelopersGuide.txt
Original file line number Diff line number Diff line change
Expand Up @@ -82,4 +82,23 @@ The distribution of the output should, of course, remain the same; but this comm
invalidate past test cases that checked for specific (random-seed,output) pairs.


Conditional defines that affect cosi compilation:

Defines that affect behavior:

COSI_RECOMB_CONSTRATE - if defined, optimizes cosi to support only constant
recombination rate. By default, cosi supports variation of recombination rate
across the region.

COSI_GC_DISABLE - if defined, disables support for gene conversion.

Defines that affect only CPU and memory requirements, but not the output:

COSI_IMPL_STORE_GLOC_IN_LOC - if defined, genetic map locations are stored with each physical location.
This allows constant-time mapping between the two, at the cost of extra memory usage; is useful when
the region is large and the recombination rate changes at many points in the region.

COSI_SLOWRECOMB - use the slower method of choosing recomb rates



Empty file modified tests/model001/t/updating.lck
100755 → 100644
Empty file.
Empty file modified tests/model001/t_u_0001/updating.lck
100755 → 100644
Empty file.
Binary file modified tests/model001/t_u_001/exact/x86_64-cygwin_nt-6.1_g++/coalescent
Binary file not shown.
Empty file modified tests/model001/t_u_001/updating.lck
100755 → 100644
Empty file.
Binary file modified tests/model002/t/exact/x86_64-cygwin_nt-6.1_g++/coalescent
Binary file not shown.
Empty file modified tests/model002/t/updating.lck
100755 → 100644
Empty file.
Binary file not shown.
Empty file modified tests/model002/t_u_0001/updating.lck
100755 → 100644
Empty file.
Binary file modified tests/model002/t_u_001/exact/x86_64-cygwin_nt-6.1_g++/coalescent
Binary file not shown.
Empty file modified tests/model002/t_u_001/updating.lck
100755 → 100644
Empty file.
Binary file modified tests/model003/t/exact/x86_64-cygwin_nt-6.1_g++/coalescent
Binary file not shown.
Empty file modified tests/model003/t/updating.lck
100755 → 100644
Empty file.
Binary file modified tests/model004/t/exact/x86_64-cygwin_nt-6.1_g++/coalescent
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
coalescent -I../.. -isystem ../../ext/boost -DNDEBUG -DCOSI_TRACK_LAST_COAL -DCOSI_SUPPORT_COALAPX -std=gnu++11 -O3 --param max-inline-insns-single=1000 -Werror -Wall -Wextra -Wno-unused-function -Wuninitialized -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wformat -Woverloaded-virtual -Wstrict-null-sentinel -Wlogical-op -Wmissing-declarations -Wdouble-promotion -Winit-self -g -O2
libcosi -I../.. -isystem ../../ext/boost -DNDEBUG -DCOSI_TRACK_LAST_COAL -DCOSI_SUPPORT_COALAPX -std=gnu++11 -O3 --param max-inline-insns-single=1000 -Werror -Wall -Wextra -Wno-unused-function -Wuninitialized -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wformat -Woverloaded-virtual -Wstrict-null-sentinel -Wlogical-op -Wmissing-declarations -Wdouble-promotion -Winit-self -g -O2
g++ (GCC) 4.9.3
coalescent -I../.. -isystem ../../ext/boost -DNDEBUG -DCOSI_TRACK_LAST_COAL -DCOSI_SUPPORT_COALAPX -std=gnu++11 -O3 -Werror -Wall -Wextra -Wdouble-promotion --param max-inline-insns-single=1000 -Wno-unused-function -Wuninitialized -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wformat -Woverloaded-virtual -Wstrict-null-sentinel -Wlogical-op -Wmissing-declarations -Winit-self -g -O2
libcosi -I../.. -isystem ../../ext/boost -DNDEBUG -DCOSI_TRACK_LAST_COAL -DCOSI_SUPPORT_COALAPX -std=gnu++11 -O3 -Werror -Wall -Wextra -Wdouble-promotion --param max-inline-insns-single=1000 -Wno-unused-function -Wuninitialized -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wformat -Woverloaded-virtual -Wstrict-null-sentinel -Wlogical-op -Wmissing-declarations -Winit-self -g -O2
g++ (GCC) 5.3.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Expand Down
Original file line number Diff line number Diff line change
@@ -1 +1 @@
$cosiBinary -p $paramFN -R $genMapFN -m -n 3 --seed 5217917146652479426
$cosiBinary -p $paramFN -R $genMapFN -m -n 3 --seed 8255675661226451025
Original file line number Diff line number Diff line change
@@ -1 +1 @@
e3d3501669a3f1a3fe862a5e7d0cf3dc653b6f85230cb5e71b13bc2c09e8ad5d737589d9c093892ef0993368f07fea74025bbe3dec23e39e234fde6727362414 *-
72bcec3e58d35eefb3c8f7022e5157fdb116797862ae64279159b8d4addb8d710d73bc76970c33d19a81cfdb723940da7cbb4dd09667677ee2a611b7281abbba *-
Binary file modified tests/model005/t/exact/x86_64-cygwin_nt-6.1_g++/coalescent
Binary file not shown.
Empty file modified tests/model005/t/updating.lck
100755 → 100644
Empty file.
Binary file modified tests/model006/t/exact/x86_64-cygwin_nt-6.1_g++/coalescent
Binary file not shown.
Empty file modified tests/model006/t/updating.lck
100755 → 100644
Empty file.
Binary file modified tests/model007/t/exact/x86_64-cygwin_nt-6.1_g++/coalescent
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
coalescent -I../.. -isystem ../../ext/boost -DNDEBUG -DCOSI_TRACK_LAST_COAL -DCOSI_SUPPORT_COALAPX -std=gnu++11 -O3 --param max-inline-insns-single=1000 -Werror -Wall -Wextra -Wno-unused-function -Wuninitialized -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wformat -Woverloaded-virtual -Wstrict-null-sentinel -Wlogical-op -Wmissing-declarations -Wdouble-promotion -Winit-self -g -O2
libcosi -I../.. -isystem ../../ext/boost -DNDEBUG -DCOSI_TRACK_LAST_COAL -DCOSI_SUPPORT_COALAPX -std=gnu++11 -O3 --param max-inline-insns-single=1000 -Werror -Wall -Wextra -Wno-unused-function -Wuninitialized -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wformat -Woverloaded-virtual -Wstrict-null-sentinel -Wlogical-op -Wmissing-declarations -Wdouble-promotion -Winit-self -g -O2
g++ (GCC) 4.9.3
coalescent -I../.. -isystem ../../ext/boost -DNDEBUG -DCOSI_TRACK_LAST_COAL -DCOSI_SUPPORT_COALAPX -std=gnu++11 -O3 -Werror -Wall -Wextra -Wdouble-promotion --param max-inline-insns-single=1000 -Wno-unused-function -Wuninitialized -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wformat -Woverloaded-virtual -Wstrict-null-sentinel -Wlogical-op -Wmissing-declarations -Winit-self -g -O2
libcosi -I../.. -isystem ../../ext/boost -DNDEBUG -DCOSI_TRACK_LAST_COAL -DCOSI_SUPPORT_COALAPX -std=gnu++11 -O3 -Werror -Wall -Wextra -Wdouble-promotion --param max-inline-insns-single=1000 -Wno-unused-function -Wuninitialized -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wformat -Woverloaded-virtual -Wstrict-null-sentinel -Wlogical-op -Wmissing-declarations -Winit-self -g -O2
g++ (GCC) 5.3.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Expand Down
Original file line number Diff line number Diff line change
@@ -1 +1 @@
$cosiBinary -p $paramFN -R $genMapFN -m -n 3 --seed 7757509784790892430
$cosiBinary -p $paramFN -R $genMapFN -m -n 3 --seed 2208906502262590667
Original file line number Diff line number Diff line change
@@ -1 +1 @@
24e5d59157dc5cb649c35e10263705b7ebefb4b5e99a972c210bf0c54e5f478ded9c49b79e24a67633655cab46cd8880618df7ba52d758266b2f4b6fffc4158a *-
d48e1bc04c97d3222bcf848323e761699625c3b99a6a2f3424a3ff9e31d5a8147c160d9d0ea3d16183e97713c5a9bd1fd0808c054a3dfaf657ff07a101a8d4ac *-
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
coalescent -I../.. -isystem ../../ext/boost -DNDEBUG -DCOSI_TRACK_LAST_COAL -DCOSI_SUPPORT_COALAPX -std=gnu++11 -O3 --param max-inline-insns-single=1000 -Werror -Wall -Wextra -Wno-unused-function -Wuninitialized -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wformat -Woverloaded-virtual -Wstrict-null-sentinel -Wlogical-op -Wmissing-declarations -Wdouble-promotion -Winit-self -g -O2
libcosi -I../.. -isystem ../../ext/boost -DNDEBUG -DCOSI_TRACK_LAST_COAL -DCOSI_SUPPORT_COALAPX -std=gnu++11 -O3 --param max-inline-insns-single=1000 -Werror -Wall -Wextra -Wno-unused-function -Wuninitialized -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wformat -Woverloaded-virtual -Wstrict-null-sentinel -Wlogical-op -Wmissing-declarations -Wdouble-promotion -Winit-self -g -O2
g++ (GCC) 4.9.3
coalescent -I../.. -isystem ../../ext/boost -DNDEBUG -DCOSI_TRACK_LAST_COAL -DCOSI_SUPPORT_COALAPX -std=gnu++11 -O3 -Werror -Wall -Wextra -Wdouble-promotion --param max-inline-insns-single=1000 -Wno-unused-function -Wuninitialized -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wformat -Woverloaded-virtual -Wstrict-null-sentinel -Wlogical-op -Wmissing-declarations -Winit-self -g -O2
libcosi -I../.. -isystem ../../ext/boost -DNDEBUG -DCOSI_TRACK_LAST_COAL -DCOSI_SUPPORT_COALAPX -std=gnu++11 -O3 -Werror -Wall -Wextra -Wdouble-promotion --param max-inline-insns-single=1000 -Wno-unused-function -Wuninitialized -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align -Wformat -Woverloaded-virtual -Wstrict-null-sentinel -Wlogical-op -Wmissing-declarations -Winit-self -g -O2
g++ (GCC) 5.3.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Expand Down
Original file line number Diff line number Diff line change
@@ -1 +1 @@
$cosiBinary -p $paramFN -R $genMapFN -m -u .0001 -n 3 --seed 5706127987047471009
$cosiBinary -p $paramFN -R $genMapFN -m -u .0001 -n 3 --seed 3727857599688806079
Original file line number Diff line number Diff line change
@@ -1 +1 @@
1110c93eebfabf1c9f89a9d23b40037c4c00804ff266bf3ddd7aebd8265d38ef7b51764ab96c1c562478a77d4b5cbffc749e4c7fd98641c7adada1da322bfdea *-
0698840e369187c340936d48144f43efae540e66f6ff5c8a2c2785ce0bfba3fce4ed75046160c0c3714c56a95dcf823468aa488aded51ed501fb676219f9aa25 *-
Binary file not shown.
Loading

0 comments on commit aee5b87

Please sign in to comment.