Add Foregrounds To Simulated Data (#15)
* Add analytic foreground from Hills et al. 2018 to both models (with and without signal)
* Add foreground parameter to the configuration file
* Reduce default noise to 0.015 K as the signal is now harder to detect

* Allow polychord verification to be run in batches
* Increase live points for more challenging evidence calculation
* Remove nlike tracking as not used

* Add option for preprocessing simulated data in Evidence Network
* Add whitening transform preprocessing functions (Cholesky default) 
* Change the default network to a deeper and broader network 
* Clip gradient norms (clipnorm) for stability
* Fix overflow bug in blind coverage test
* Update default network hyperparameters for the new problem 

* Update verification data to version with foregrounds
* Update figure and results outputs to version with foregrounds
* Stored old non-foreground figures for reference

* Extend additive combiner to support an arbitrary number of simulators

* Wording corrections and updated README

* Add scikit-learn to requirements
* Add anesthetic as a requirement
* Update to latest polychordlite version
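
For context, the first bullet above refers to the physically motivated foreground parameterisation of Hills et al. (2018). A minimal sketch of that model is given below using the parameter names introduced in the updated configuration file (d0, d1, d2, tau_e, t_e); the function name, reference frequency, and frequency band are illustrative assumptions rather than the repository's actual implementation.

```python
import numpy as np


def hills_2018_foreground(freqs_mhz, d0, d1, d2, tau_e, t_e, nu_c=75.0):
    """Sketch of the Hills et al. (2018) physically motivated foreground.

    freqs_mhz : frequencies in MHz
    d0        : foreground magnitude at the reference frequency [K]
    d1, d2    : corrections to the baseline -2.5 spectral index
    tau_e     : ionospheric optical depth at the reference frequency
    t_e       : electron temperature [K]
    nu_c      : reference frequency in MHz (assumed value, not from the commit)
    """
    x = freqs_mhz / nu_c
    optical_depth = tau_e * x ** -2.0            # ionospheric absorption
    spectral_index = -2.5 + d1 + d2 * np.log(x)  # running power-law index
    return (d0 * x ** spectral_index * np.exp(-optical_depth)
            + t_e * (1.0 - np.exp(-optical_depth)))  # electron emission


# Example evaluation over an illustrative REACH-like band
freqs = np.linspace(50.0, 200.0, 151)
t_fg = hills_2018_foreground(freqs, d0=1750.0, d1=0.0, d2=0.0,
                             tau_e=0.05, t_e=800.0)
```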
ThomasGesseyJones authored Apr 19, 2024
1 parent 67b4f6f commit c24fec3
Showing 41 changed files with 983 additions and 386 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2023 Thomas Gessey-Jones
Copyright (c) 2024 Thomas Gessey-Jones

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
35 changes: 20 additions & 15 deletions README.rst
@@ -7,7 +7,7 @@ Overview

:Name: Fully Bayesian Forecast Example
:Author: Thomas Gessey-Jones
:Version: 0.1.3
:Version: 0.2.0
:Homepage: https://github.com/ThomasGesseyJones/FullyBayesianForecastsExample
:Letter: https://ui.adsabs.harvard.edu/abs/2023arXiv230906942G

@@ -31,7 +31,7 @@ reproducible analysis pipeline for the letter.

The overall goal of the code is to produce a fully Bayesian forecast of
the chance of a `REACH <https://ui.adsabs.harvard.edu/abs/2022NatAs...6..984D/abstract>`__-like experiment
making a significant detection of the 21-cm global signal, given a noise level. It also produces
making a significant detection of the 21-cm global signal from within foregrounds and noise. It also produces
figures showing how this conclusion changes with different astrophysical parameter values
and validates the forecast through blind coverage
tests and comparison to `PolyChord <https://ui.adsabs.harvard.edu/abs/2015MNRAS.453.4384H/abstract>`__.
@@ -74,17 +74,19 @@ There are three modules included in the repository:
that take a number of data simulations to run and return that number of mock data
simulations alongside the values of any parameters that were used in the
simulations. Submodules of this module define functions to generate specific
simulators for models with noise only and models with a noisy 21-cm global signal.
simulators for noise, foregrounds, and the 21-cm signal.

These three modules are used in the three analysis scripts:

- verification_with_polychord.py: This script generates a range of mock data
sets from both the noise-only model and the noisy-signal model, and then
sets from both the no-signal model and the with-signal model, and then
performs a Bayesian analysis on each of them.
Evaluating the Bayes ratio between the two models of the data
using Polychord. These results are then stored in the verification_data directory
for later comparison with the results from the evidence network to
verify its accuracy. It should be run first, ideally in parallel.
verify its accuracy. It should be run first, ideally with a large number of
versions in parallel as it is very computationally expensive but
splits simply into one task per data set.
- train_evidence_network.py: This script builds the evidence network object and
the data simulator functions, then trains the evidence network. Once trained
it stores the evidence network in the models directory, then runs a blind
@@ -106,40 +108,41 @@ scripts can be run from the terminal using the following commands:

.. code:: bash
python verification_with_polychord.py
python verification_with_polychord.py 0
python train_evidence_network.py
python visualize_forecasts.py
to run with the default noise level of 79 mK and replicate the
to run with the default noise level of 15 mK and replicate the
analysis from `Gessey-Jones et al. (2023) <https://ui.adsabs.harvard.edu/abs/2023arXiv230906942G>`__.
Alternatively you can pass
the scripts a command line argument to specify the experiments noise level in K. For example
to run with a noise level of 100 mK you would run the following commands:

.. code:: bash
python verification_with_polychord.py 0.1
python verification_with_polychord.py 0 0.1
python train_evidence_network.py 0.1
python visualize_forecasts.py 0.1
Two other files of interest are:

- fbf_utilities.py: which defines IO functions
needed by the three scripts and a utility function to assemble the data
simulators for the noise-only and noisy-signal model.
needed by the three scripts, utility functions to assemble the data
simulators for the noise-only and noisy-signal model, and standard
whitening transforms.
- configuration.yaml: which defines several parameters used in the code
including the experimental frequency resolution, the priors on the
astrophysical parameters of the global 21-cm signal model, and parameters
that control which astrophysical parameters are plotted in the forecast
figures. If you change the priors or resolution the entire pipeline
needs to be rerun to get accurate results.
astrophysical and foreground parameters, and the astrophysical parameters
which are plotted in the forecast figures. If you change the priors or
resolution the entire pipeline needs to be rerun to get accurate results.
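
The whitening transforms referred to above are a standard preprocessing step applied to the data before it is passed to the evidence network. As a rough illustration only (the function names, array shapes, and API below are assumptions, not the actual fbf_utilities interface), a Cholesky whitening fitted to simulated spectra might look like:

```python
import numpy as np


def fit_cholesky_whitening(samples):
    """Estimate a Cholesky whitening transform from simulated data.

    samples : (n_samples, n_channels) array of simulated spectra.
    Returns the sample mean and the lower-triangular Cholesky factor L of
    the sample covariance, so that W = L^{-1} gives W Cov W^T = I.
    """
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    chol = np.linalg.cholesky(cov)
    return mean, chol


def apply_whitening(data, mean, chol):
    """Whiten data by solving L x_w = (x - mean) for x_w."""
    return np.linalg.solve(chol, (data - mean).T).T


# Example usage with mock simulations (shapes are illustrative)
sims = np.random.default_rng(0).normal(size=(10_000, 151))
mu, L = fit_cholesky_whitening(sims)
whitened = apply_whitening(sims[:10], mu, L)
```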

The various figures produced in the analysis are stored in the
figures_and_results directory alongside the timing_data to assess the
performance of the methodology and some summary statistics of the evidence
networks performance. The figures and data generated in the
analysis for `Gessey-Jones et al. (2023) <https://ui.adsabs.harvard.edu/abs/2023arXiv230906942G>`__ are provided in this
repository for reference.
repository for reference, alongside the figures generated for an earlier
version of the letter which did not model foregrounds.

Licence and Citation
--------------------
@@ -194,6 +197,8 @@ To run the code you will need to following additional packages:
- `pypolychord <https://github.com/PolyChord/PolyChordLite>`__
- `scipy <https://pypi.org/project/scipy/>`__
- `mpi4py <https://pypi.org/project/mpi4py/>`__
- `scikit-learn <https://pypi.org/project/scikit-learn/>`__
- `anesthetic <https://pypi.org/project/anesthetic/>`__

The code was developed using python 3.8. It has not been tested on other versions
of python. Exact versions of the packages used in our analysis
100 changes: 66 additions & 34 deletions configuration.yaml
@@ -21,49 +21,81 @@ frequency_resolution: 1.0
# parameters of the prior. low and high are used in place of min and
# max to avoid clashing with python keywords.
priors:
f_star:
type: log_uniform
low: emu_min # If given emu_min or emu_max, the value is taken from the
high: emu_max # minimum or maximum value of GlobalEmu was trained on.
v_c:
type: log_uniform
low: emu_min
high: 30.0
f_x:
type: log_uniform
low: 0.001
high: emu_max
tau:
type: truncated_gaussian
mean: 0.054
std: 0.007
low: emu_min
high: emu_max
alpha:
type: uniform
low: emu_min
high: emu_max
nu_min:
type: log_uniform
low: emu_min
high: emu_max
R_mfp:
type: uniform
low: emu_min
high: emu_max
global_signal:
f_star:
type: log_uniform
low: emu_min # If given emu_min or emu_max, the value is taken from the
high: emu_max # minimum or maximum value of GlobalEmu was trained on.
v_c:
type: log_uniform
low: emu_min
high: 30.0
f_x:
type: log_uniform
low: 0.001
high: emu_max
tau:
type: truncated_gaussian
mean: 0.054
std: 0.007
low: emu_min
high: emu_max
alpha:
type: uniform
low: emu_min
high: emu_max
nu_min:
type: log_uniform
low: emu_min
high: emu_max
R_mfp:
type: uniform
low: emu_min
high: emu_max
foregrounds:
d0:
type: uniform
low: 1500.0 # K
high: 2000.0 # K
d1:
type: uniform
low: -1.0
high: 1.0
d2:
type: uniform
low: -0.05
high: 0.05
tau_e:
type: uniform
low: 0.005
high: 0.200
t_e:
type: uniform
low: 200.0 # K
high: 2000.0 # K
#
#
# PREPROCESSING
# =============
# Settings to control the preprocessing of the data before being fed into the
# neural network.
whitening_transform: 'Cholesky' # None, ZCA, PCA, Cholesky, ZCA-cor or PCA-cor
covariance_samples: 100_000 # Number of samples to use when calculating the
# covariance matrix for the whitening transform.
#
#
# VERIFICATION
# ============
# Number of data sets generate from each model to use when verifying the
# network against PolyChord. Each method is used to evaluate log K and then
# the results are compared.
# Number of data sets generated from each model to use when verifying the
# network against PolyChord. Evaluated in batches of fixed size due to
# HPC scheduling limitations.
verification_data_sets_per_model: 1000
verification_data_set_batch_size: 5
#
#
# PLOTTING
# ========
# Parameters that control details of the plots used to visualise the results.
# Parameters that control details of the plots used to visualize the results.
br_evaluations_for_forecast: 1000000
detection_thresholds: ["2 sigma", "3 sigma", "5 sigma"]
parameters_to_plot: ["f_star", "f_x", "tau"]
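
As a rough illustration of how prior entries of this form might be mapped onto sampling distributions, the sketch below uses scipy.stats; the helper function is hypothetical, and the values substituted for the emu_min/emu_max sentinels are placeholders rather than the bounds GlobalEmu was actually trained on.

```python
import yaml
from scipy import stats


def build_prior(spec, emu_min=None, emu_max=None):
    """Map one prior entry from configuration.yaml to a scipy distribution.

    spec : dict with a 'type' key plus the parameters of that prior.
    emu_min, emu_max : values substituted for the emu_min/emu_max sentinels.
    """
    def resolve(value):
        return {"emu_min": emu_min, "emu_max": emu_max}.get(value, value)

    low, high = resolve(spec.get("low")), resolve(spec.get("high"))
    if spec["type"] == "uniform":
        return stats.uniform(loc=low, scale=high - low)
    if spec["type"] == "log_uniform":
        return stats.loguniform(low, high)
    if spec["type"] == "truncated_gaussian":
        mean, std = spec["mean"], spec["std"]
        a, b = (low - mean) / std, (high - mean) / std  # standardised bounds
        return stats.truncnorm(a, b, loc=mean, scale=std)
    raise ValueError(f"Unknown prior type: {spec['type']}")


# Example: draw tau samples (emulator bounds below are placeholders)
with open("configuration.yaml") as f:
    config = yaml.safe_load(f)
tau_prior = build_prior(config["priors"]["global_signal"]["tau"],
                        emu_min=0.040, emu_max=0.100)
samples = tau_prior.rvs(size=5)
```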