|
| 1 | +################### |
| 2 | +Configuration Files |
| 3 | +################### |
| 4 | + |
| 5 | +This page outlines how to construct configuration files to run your own routines |
| 6 | +with `~skypy.pipeline.Pipeline`. |
| 7 | + |
| 8 | +`SkyPy` is an astrophysical simulation pipeline tool that allows you to define an |
| 9 | +arbitrary workflow and store data in table format. You may use the `SkyPy` `~skypy.pipeline.Pipeline` |
| 10 | +to call any function: your own implementation, one from any compatible external software, or one from the `SkyPy library`. |
| 11 | +`SkyPy` then handles the data dependencies and provides a library of functions to be used with it. |
| 12 | + |
| 13 | +These guidelines start with an example that uses one of the `SkyPy` functions, followed by |
| 14 | +the concrete YAML syntax you need to write your own configuration files, beyond using `SkyPy` |
| 15 | +functions. |
| 16 | + |
| 17 | +SkyPy example |
| 18 | +------------- |
| 19 | + |
| 20 | +In this section, we exemplify how you can write a configuration file and use some of the `SkyPy` functions. |
| 21 | +In this example, we sample redshifts and magnitudes from the SkyPy luminosity function, `~skypy.galaxies.schechter_lf`. |
| 22 | + |
| 23 | +- `Define variables`: |
| 24 | + |
| 25 | +From the documentation, the parameters for the `~skypy.galaxies.schechter_lf` function are: ``redshift``, the characteristic absolute magnitude ``M_star``, the amplitude ``phi_star``, the faint-end slope parameter ``alpha``, |
| 26 | +the magnitude limit ``m_lim``, the sky area ``sky_area``, ``cosmology`` and ``noise``. |
| 27 | +If you are planning to reuse some of these parameters, you can define them at the top-level of your configuration file. |
| 28 | +In our example, we are using ``Astropy`` linear and exponential models for the characteristic absolute magnitude and the amplitude, respectively. |
| 29 | +Also, ``noise`` is an optional parameter and you could use its default value by omitting its definition. |
| 30 | + |
| 31 | + .. code:: yaml |
| 32 | +
|
| 33 | + cosmology: !astropy.cosmology.default_cosmology.get [] |
| 34 | + z_range: !numpy.linspace [0, 2, 21] |
| 35 | + M_star: !astropy.modeling.models.Linear1D [-0.9, -20.4] |
| 36 | + phi_star: !astropy.modeling.models.Exponential1D [3e-3, -9.7] |
| 37 | + magnitude_limit: 23 |
| 38 | + sky_area: 0.1 deg2 |
| 39 | +
|
| 40 | +- `Tables and columns`: |
| 41 | + |
| 42 | +You can create a table ``blue_galaxies`` and, for now, add the columns for redshift and magnitude (note that ``schechter_lf`` returns a 2D object): |
| 43 | + |
| 44 | + .. code:: yaml |
| 45 | +
|
| 46 | +    tables: |
| 47 | +      blue_galaxies: |
| 48 | +        redshift, magnitude: !skypy.galaxies.schechter_lf |
| 49 | +          redshift: $z_range |
| 50 | +          M_star: $M_star |
| 51 | +          phi_star: $phi_star |
| 52 | +          alpha: -1.3 |
| 53 | +          m_lim: $magnitude_limit |
| 54 | +          sky_area: $sky_area |
| 55 | +
|
| 56 | +`Important:` if cosmology is detected as a parameter but is not set, it automatically uses the cosmology variable defined at the top-level of the file. |
| 57 | + |
| 58 | +This is what the entire configuration file looks like: |
| 59 | + |
| 60 | +.. literalinclude:: luminosity.yml |
| 61 | + :language: yaml |
| 62 | + |
| 63 | +You may now save it as ``luminosity.yml`` and run it using the `SkyPy` `~skypy.pipeline.Pipeline`: |
| 64 | + |
| 65 | +.. plot:: |
| 66 | + :include-source: true |
| 67 | + :context: close-figs |
| 68 | + |
| 69 | + import matplotlib.pyplot as plt |
| 70 | + from skypy.pipeline import Pipeline |
| 71 | + |
| 72 | + # Execute SkyPy luminosity pipeline |
| 73 | + pipeline = Pipeline.read("luminosity.yml") |
| 74 | + pipeline.execute() |
| 75 | + |
| 76 | + # Blue population |
| 77 | + skypy_galaxies = pipeline['blue_galaxies'] |
| 78 | + |
| 79 | + # Plot histograms |
| 80 | + fig, axs = plt.subplots(1, 2, figsize=(9, 3)) |
| 81 | + |
| 82 | + axs[0].hist(skypy_galaxies['redshift'], bins=50, histtype='step', color='purple') |
| 83 | + axs[0].set_xlabel(r'$Redshift$') |
| 84 | + axs[0].set_ylabel(r'$\mathrm{N}$') |
| 85 | + axs[0].set_yscale('log') |
| 86 | + |
| 87 | + axs[1].hist(skypy_galaxies['magnitude'], bins=50, histtype='step', color='green') |
| 88 | + axs[1].set_xlabel(r'$Magnitude$') |
| 89 | + axs[1].set_yscale('log') |
| 90 | + |
| 91 | + plt.tight_layout() |
| 92 | + plt.show() |
| 93 | + |
| 94 | +You can also run the pipeline directly from the command line and write the outputs to a FITS file: |
| 95 | + |
| 96 | +.. code-block:: bash |
| 97 | +
|
| 98 | + $ skypy luminosity.yml luminosity.fits |
| 99 | +
|
| 100 | +
|
| 101 | +
|
| 102 | +Don’t forget to check out the examples_ for more complete use cases! |
| 103 | + |
| 104 | +.. _examples: https://skypy.readthedocs.io/en/stable/examples/index.html |
| 105 | + |
| 106 | + |
| 107 | +YAML syntax |
| 108 | +----------- |
| 109 | +YAML_ is a file format designed to be readable by both computers and humans. |
| 110 | +Fundamentally, a file written in YAML consists of a set of key-value pairs. |
| 111 | +Each pair is written as ``key: value``, where whitespace after the ``:`` is required. |
| 112 | +The hash character ``#`` denotes the start of a comment and all further text on that |
| 113 | +line is ignored by the parser. |
| 114 | + |
| 115 | + |
| 116 | +This guide introduces the main syntax of YAML relevant when writing |
| 117 | +a configuration file to use with ``SkyPy``. Essentially, it begins with |
| 118 | +definitions of individual variables at the top level, followed by the tables, |
| 119 | +and, within the table entries, the features of objects to simulate are included. |
| 120 | +Main keywords: ``parameters``, ``cosmology``, ``tables``. |
| 121 | + |
| 122 | + |
| 123 | +Variables |
| 124 | +^^^^^^^^^ |
| 125 | +* `Variable definition`: a variable is defined as a key-value pair at the top of the file. |
| 126 | + YAML interprets basic data with the appropriate type: integer, float, boolean. |
| 127 | + The same applies to lists and dictionaries. |
| 128 | + In addition, SkyPy adds extra functionality to interpret and store Astropy Quantities_. |
| 129 | + Everything else is stored as a string (with or without explicitly using quotes). |
| 130 | + |
| 131 | + .. code:: yaml |
| 132 | +
|
| 133 | + # YAML interprets |
| 134 | + counter: 100 # An integer |
| 135 | + miles: 1000.0 # A floating point |
| 136 | + name: "Joy" # A string |
| 137 | + planet: Earth # Another string |
| 138 | + mylist: [ 'abc', 789, 2.0e3 ] # A list |
| 139 | + mydict: { 'fruit': 'orange', 'year': 2020 } # A dictionary |
| 140 | +
|
| 141 | + # SkyPy extra functionality |
| 142 | + angle: 10 deg |
| 143 | + distance: 300 kpc |
| 144 | +
|
| 145 | +
|
| 146 | +* `Import objects`: |
| 147 | + the SkyPy configuration syntax allows objects to be imported directly from external |
| 148 | + (sub)modules using the ``!`` tag and providing neither a list of arguments nor a |
| 149 | + dictionary of keywords. For example, this enables the import and usage of any Astropy cosmology: |
| 150 | + |
| 151 | + .. code:: yaml |
| 152 | +
|
| 153 | + cosmology: !astropy.cosmology.Planck13 # import the Planck13 object and bind it to the variable named "cosmology" |
| 154 | +
|
| 155 | +
|
| 156 | +Parameters |
| 157 | +^^^^^^^^^^ |
| 158 | + |
| 159 | +* `Parameters definition`: parameters are variables that can be modified at execution. |
| 160 | + |
| 161 | + For example, |
| 162 | + |
| 163 | + .. code:: yaml |
| 164 | +
|
| 165 | +    parameters: |
| 166 | +      hubble_constant: 70 |
| 167 | +      omega_matter: 0.3 |
| 168 | +
|
| 169 | +
|
| 170 | +Functions |
| 171 | +^^^^^^^^^ |
| 172 | +* `Function call`: functions are defined as tuples where the first entry is the fully qualified function name tagged with an exclamation mark ``!`` and the second entry is either a list of positional arguments or a dictionary of keyword arguments. |
| 173 | + |
| 174 | + For example, if you need to call the ``log10()`` and ``linspace()`` NumPy_ functions, then you define the following key-value pairs: |
| 175 | + |
| 176 | + .. code:: yaml |
| 177 | +
|
| 178 | + log_of_2: !numpy.log10 [2] |
| 179 | + myarray: !numpy.linspace [0, 2.5, 10] |
| 180 | +
|
| 181 | + You can also define parameters of functions with a dictionary of keyword arguments. |
| 182 | + Imagine you want to compute the comoving distance for a range of redshifts and an `Astropy` Planck 2015 cosmology. |
| 183 | + To run it with the `SkyPy` pipeline, call the function and define the parameters as an indented dictionary. |
| 184 | + |
| 185 | + .. code:: yaml |
| 186 | +
|
| 187 | +    comoving_distance: !astropy.cosmology.Planck15.comoving_distance |
| 188 | +      z: !numpy.linspace [ 0, 1.3, 10 ] |
| 189 | +
|
| 190 | + Similarly, you can specify the function's arguments as a dictionary: |
| 191 | + |
| 192 | + .. code:: yaml |
| 193 | +
|
| 194 | +    comoving_distance: !astropy.cosmology.Planck15.comoving_distance |
| 195 | +      z: !numpy.linspace {start: 0, stop: 1.3, num: 10} |
| 196 | +
|
| 197 | + `N.B.` To call a function with no arguments, you should pass an empty list of |
| 198 | + ``args`` or an empty dictionary of ``kwargs``. For example: |
| 199 | + |
| 200 | + .. code:: yaml |
| 201 | +
|
| 202 | + cosmo: !astropy.cosmology.default_cosmology.get [] |
| 203 | +
|
| 204 | +
|
| 205 | +* `Variable reference`: variables can be referenced by their full name tagged with a dollar sign ``$``. |
| 206 | + In the previous example you could also define the variables at the top-level of the file and then reference them: |
| 207 | + |
| 208 | + .. code:: yaml |
| 209 | +
|
| 210 | +    redshift: !numpy.linspace [ 0, 1.3, 10 ] |
| 211 | +    comoving_distance: !astropy.cosmology.Planck15.comoving_distance |
| 212 | +      z: $redshift |
| 213 | +
|
| 214 | +* The `cosmology` to be used by functions within the pipeline only needs to be set up once, at the top level. If a function needs ``cosmology`` as an input, you need not define it again; it is detected automatically. |
| 215 | + |
| 216 | + For example, calculate the angular size of a galaxy with a given physical size, at a fixed redshift and for a given cosmology: |
| 217 | + |
| 218 | + .. code:: yaml |
| 219 | +
|
| 220 | +    cosmology: !astropy.cosmology.FlatLambdaCDM |
| 221 | +      H0: 70 |
| 222 | +      Om0: 0.3 |
| 223 | +    size: !skypy.galaxies.morphology.angular_size |
| 224 | +      physical_size: 10 kpc |
| 225 | +      redshift: 0.2 |
| 226 | +
|
| 227 | +* `Job completion`: ``.depends`` can be used to force any function call to wait for completion |
| 228 | + of any other job. |
| 229 | + |
| 230 | + A simple example where, for some reason, the comoving distance needs to be called after |
| 231 | + completion of the angular size function: |
| 232 | + |
| 233 | + .. code:: yaml |
| 234 | +
|
| 235 | +    cosmology: !astropy.cosmology.Planck15 |
| 236 | +    size: !skypy.galaxies.morphology.angular_size |
| 237 | +      physical_size: 10 kpc |
| 238 | +      redshift: 0.2 |
| 239 | +    comoving_distance: !astropy.cosmology.Planck15.comoving_distance |
| 240 | +      z: !numpy.linspace [ 0, 1.3, 10 ] |
| 241 | +      .depends: size |
| 242 | +
|
| 243 | + By doing so, you force the function call ``size`` to complete before the comoving distance is computed. |
| 244 | + |
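The function-call syntax above can be pictured with a minimal Python sketch. It is only an illustration built on the standard library, not the `SkyPy` implementation: the helper ``call_by_name`` is hypothetical, and it only handles the simple ``module.function`` case, so a tag such as ``!astropy.cosmology.Planck15.comoving_distance`` (a method of an object) would need extra handling.

```python
import importlib


def call_by_name(qualified_name, arguments):
    """Import ``qualified_name`` and call it with a list or dict of arguments.

    Hypothetical helper for illustration only; it assumes the name has the
    form ``package.module.callable``.
    """
    module_name, _, attribute_name = qualified_name.rpartition(".")
    function = getattr(importlib.import_module(module_name), attribute_name)
    if isinstance(arguments, dict):
        return function(**arguments)  # dictionary -> keyword arguments
    return function(*arguments)  # list -> positional arguments


# Rough equivalent of ``log_of_100: !math.log10 [100]``
log_of_100 = call_by_name("math.log10", [100])

# Rough equivalent of a call with an indented dictionary of keywords
three_quarters = call_by_name(
    "fractions.Fraction", {"numerator": 3, "denominator": 4}
)

print(log_of_100, three_quarters)
```

A list is unpacked as positional arguments and a dictionary as keyword arguments, which mirrors the two forms that can follow a ``!`` tag.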
| 245 | + |
| 246 | +Tables |
| 247 | +^^^^^^ |
| 248 | + |
| 249 | +* `Table creation`: a dictionary of table names, each resolving to a dictionary of column names for that table. |
| 250 | + |
| 251 | + Let us create a table called ``telescope`` with a column to store the width of spectral lines that follow a normal distribution: |
| 252 | + |
| 253 | + .. code:: yaml |
| 254 | +
|
| 255 | +    tables: |
| 256 | +      telescope: |
| 257 | +        spectral_lines: !scipy.stats.norm.rvs |
| 258 | +          loc: 550 |
| 259 | +          scale: 1.6 |
| 260 | +          size: 100 |
| 261 | +
|
| 262 | +* `Column addition`: you can add as many columns to a table as you need. |
| 263 | + Imagine you want to add a column for the telescope collecting surface: |
| 264 | + |
| 265 | + .. code:: yaml |
| 266 | +
|
| 267 | +    tables: |
| 268 | +      telescope: |
| 269 | +        spectral_lines: !scipy.stats.norm.rvs |
| 270 | +          loc: 550 |
| 271 | +          scale: 1.6 |
| 272 | +          size: 100 |
| 273 | +        collecting_surface: !numpy.random.uniform |
| 274 | +          low: 6.9 |
| 275 | +          high: 7.1 |
| 276 | +          size: 100 |
| 277 | +
|
| 278 | +* `Column reference`: columns in the pipeline can be referenced by their full name tagged with a dollar sign ``$``. |
| 279 | + Example: the galaxy mass follows a lognormal distribution. You can create a table ``galaxies`` |
| 280 | + with a column ``mass`` where you sample 10000 objects and a second column, ``radius``, which also follows a lognormal distribution |
| 281 | + but whose mean depends on how massive the galaxies are: |
| 282 | + |
| 283 | + .. code:: yaml |
| 284 | +
|
| 285 | +    tables: |
| 286 | +      galaxies: |
| 287 | +        mass: !numpy.random.lognormal |
| 288 | +          mean: 5. |
| 289 | +          size: 10000 |
| 290 | +        radius: !numpy.random.lognormal |
| 291 | +          mean: $galaxies.mass |
| 292 | +
|
| 293 | +
|
| 294 | +* `Multi-column assignment`: multi-column assignment is performed with any 2D array, where one dimension is interpreted |
| 295 | + as the rows of the table and the other as separate columns. It also works with a function that returns a tuple. |
| 296 | + |
| 297 | + The following example uses multi-column assignment: we sample a two-dimensional array of values from a lognormal distribution and then store them as three columns in a table: |
| 298 | + |
| 299 | + .. code:: yaml |
| 300 | +
|
| 301 | +    tables: |
| 302 | +      halos: |
| 303 | +        mass, radius, concentration: !numpy.random.lognormal |
| 304 | +          size: [10000, 3] |
| 305 | +
|
| 306 | +
|
| 307 | +* `Table initialisation`: by default, tables are initialised using ``astropy.table.Table()``; however, this can be overridden using the ``.init`` keyword to initialise the table with any function call. |
| 308 | + |
| 309 | + For example, you can stack galaxy properties such as radii and mass: |
| 310 | + |
| 311 | + .. code:: yaml |
| 312 | +
|
| 313 | +    radii: !numpy.logspace [ 1, 2, 100 ] |
| 314 | +    mass: !numpy.logspace [ 9, 12, 100 ] |
| 315 | +    tables: |
| 316 | +      galaxies: |
| 317 | +        .init: !astropy.table.vstack [[ $radii, $mass ]] |
| 318 | +
|
| 319 | +
|
| 320 | +* `Table reference`: when a function call depends on tables, you need to ensure the referenced table has the necessary content and is not empty. |
| 321 | + You can do that with ``.complete``. |
| 322 | + |
| 323 | + Example: you want to perform a very simple abundance matching, i.e. painting galaxies within your halos. |
| 324 | + You can create two tables ``halos`` and ``galaxies`` storing the halo mass and galaxy luminosities. |
| 325 | + Then you can stack these two tables and store the result in a third table called ``matching``. |
| 326 | + |
| 327 | + .. code:: yaml |
| 328 | +
|
| 329 | +    tables: |
| 330 | +      halos: |
| 331 | +        halo_mass: !numpy.random.uniform |
| 332 | +          low: 1.0e8 |
| 333 | +          high: 1.0e14 |
| 334 | +          size: 20 |
| 335 | +      galaxies: |
| 336 | +        luminosity: !numpy.random.uniform |
| 337 | +          low: 0.05 |
| 338 | +          high: 10.0 |
| 339 | +          size: 20 |
| 340 | +      matching: |
| 341 | +        .init: !astropy.table.hstack |
| 342 | +          tables: [ $halos, $galaxies ]  |
| 343 | +          .depends: [ halos.complete, galaxies.complete ] |
| 344 | +
|
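To see why ``.depends`` and ``.complete`` matter, it can help to think of the configuration as a graph of jobs that must run in a valid order. The sketch below is a simplified illustration using Python's standard ``graphlib`` module (Python 3.9 or later). It is not how `SkyPy` schedules jobs internally, and the job names other than ``halos.complete`` and ``galaxies.complete`` are assumptions made for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph for the abundance-matching example:
# each job maps to the set of jobs it has to wait for.
dependencies = {
    "halos.halo_mass": set(),
    "galaxies.luminosity": set(),
    "halos.complete": {"halos.halo_mass"},
    "galaxies.complete": {"galaxies.luminosity"},
    "matching": {"halos.complete", "galaxies.complete"},
}

# A topological sort yields an order in which every job runs
# only after all the jobs it depends on have finished.
execution_order = list(TopologicalSorter(dependencies).static_order())

print(execution_order)
```

Whatever order is printed, ``matching`` always comes after both ``halos.complete`` and ``galaxies.complete``, which is the guarantee that the ``.depends`` entry asks for.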
| 345 | +
|
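The multi-column assignment described in this section amounts to splitting a two-dimensional array into named columns. The following sketch shows the idea with plain Python lists instead of NumPy arrays and Astropy tables. It is an assumption-laden illustration rather than the `SkyPy` implementation, and the small sample size is chosen only to keep the output short.

```python
import random

random.seed(42)  # fixed seed so that repeated runs give the same samples

n_rows, n_columns = 5, 3

# A 2D "array" of lognormal samples: one inner list per table row
samples = [
    [random.lognormvariate(0.0, 1.0) for _ in range(n_columns)]
    for _ in range(n_rows)
]

# Transpose rows into columns and pair each column with its name,
# in the spirit of ``mass, radius, concentration: ...`` in the YAML example
column_names = ["mass", "radius", "concentration"]
table = {
    name: list(column)
    for name, column in zip(column_names, zip(*samples))
}

for name, column in table.items():
    print(name, len(column))
```

Here the first dimension is read as the rows and the second as the columns, matching ``size: [10000, 3]`` in the ``halos`` example.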
| 346 | +.. _YAML: https://yaml.org |
| 347 | +.. _NumPy: https://numpy.org |
| 348 | +.. _Quantities: https://docs.astropy.org/en/stable/units/ |
| 349 | +.. _clone(): https://docs.astropy.org/en/stable/api/astropy.cosmology.FLRW.html?highlight=clone#astropy.cosmology.FLRW.clone |