Commit 101ad83

DOC: How to construct config files (#454)
1 parent 2fb6364 commit 101ad83

3 files changed

+365
-0
lines changed

docs/configuration_files.rst

Lines changed: 349 additions & 0 deletions
@@ -0,0 +1,349 @@
###################
Configuration Files
###################

This page outlines how to construct configuration files to run your own routines
with `~skypy.pipeline.Pipeline`.

`SkyPy` is an astrophysical simulation pipeline tool that lets you define an
arbitrary workflow and store data in table format. You may use the `SkyPy` `~skypy.pipeline.Pipeline`
to call any function: your own implementation, one from any compatible external software, or one from the `SkyPy` library.
`SkyPy` then handles the data dependencies and provides a library of functions to use with the pipeline.

These guidelines start with an example that uses one of the `SkyPy` functions, followed by
the concrete YAML syntax you need to write your own configuration files, beyond using `SkyPy`
functions.


SkyPy example
-------------

In this section, we show how to write a configuration file that uses some of the `SkyPy` functions.
In this example, we sample redshifts and magnitudes from the SkyPy luminosity function, `~skypy.galaxies.schechter_lf`.

- `Define variables`:

From the documentation, the parameters for the `~skypy.galaxies.schechter_lf` function are: ``redshift``, the characteristic absolute magnitude ``M_star``, the amplitude ``phi_star``, the faint-end slope parameter ``alpha``,
the magnitude limit ``m_lim``, the sky area ``sky_area``, ``cosmology`` and ``noise``.
If you are planning to reuse some of these parameters, you can define them at the top level of your configuration file.
In our example, we are using ``Astropy`` linear and exponential models for the characteristic absolute magnitude and the amplitude, respectively.
Also, ``noise`` is an optional parameter, so you can use its default value by omitting it.

.. code:: yaml

    cosmology: !astropy.cosmology.default_cosmology.get []
    z_range: !numpy.linspace [0, 2, 21]
    M_star: !astropy.modeling.models.Linear1D [-0.9, -20.4]
    phi_star: !astropy.modeling.models.Exponential1D [3e-3, -9.7]
    magnitude_limit: 23
    sky_area: 0.1 deg2

- `Tables and columns`:

You can create a table ``blue_galaxies`` and, for now, add the columns for redshift and magnitude (note that ``schechter_lf`` returns both quantities, so the two columns are assigned together):

.. code:: yaml

    tables:
      blue_galaxies:
        redshift, magnitude: !skypy.galaxies.schechter_lf
          redshift: $z_range
          M_star: $M_star
          phi_star: $phi_star
          alpha: -1.3
          m_lim: $magnitude_limit
          sky_area: $sky_area

`Important:` if ``cosmology`` is detected as a parameter of a function but is not set, the pipeline automatically uses the ``cosmology`` variable defined at the top level of the file.

This is what the entire configuration file looks like:

.. literalinclude:: luminosity.yml
    :language: yaml

You may now save it as ``luminosity.yml`` and run it using the `SkyPy` `~skypy.pipeline.Pipeline`:

.. plot::
    :include-source: true
    :context: close-figs

    import matplotlib.pyplot as plt
    from skypy.pipeline import Pipeline

    # Execute SkyPy luminosity pipeline
    pipeline = Pipeline.read("luminosity.yml")
    pipeline.execute()

    # Blue population
    skypy_galaxies = pipeline['blue_galaxies']

    # Plot histograms
    fig, axs = plt.subplots(1, 2, figsize=(9, 3))

    axs[0].hist(skypy_galaxies['redshift'], bins=50, histtype='step', color='purple')
    axs[0].set_xlabel(r'$Redshift$')
    axs[0].set_ylabel(r'$\mathrm{N}$')
    axs[0].set_yscale('log')

    axs[1].hist(skypy_galaxies['magnitude'], bins=50, histtype='step', color='green')
    axs[1].set_xlabel(r'$Magnitude$')
    axs[1].set_yscale('log')

    plt.tight_layout()
    plt.show()

You can also run the pipeline directly from the command line and write the outputs to a FITS file:

.. code-block:: bash

    $ skypy luminosity.yml luminosity.fits



Don’t forget to check out the examples_ page for more complete examples!

.. _examples: https://skypy.readthedocs.io/en/stable/examples/index.html


YAML syntax
-----------
YAML_ is a file format designed to be readable by both computers and humans.
Fundamentally, a file written in YAML consists of a set of key-value pairs.
Each pair is written as ``key: value``, where whitespace after the ``:`` is required.
The hash character ``#`` denotes the start of a comment and all further text on that
line is ignored by the parser.


This guide introduces the main YAML syntax relevant to writing
a configuration file for ``SkyPy``. Essentially, a configuration file begins with
definitions of individual variables at the top level, followed by the tables;
the table entries, in turn, contain the features of the objects to simulate.
Main keywords: ``parameters``, ``cosmology``, ``tables``.


Variables
^^^^^^^^^
* `Variable definition`: a variable is defined as a key-value pair at the top of the file.
  YAML interprets scalar values with the appropriate type: integer, float or boolean.
  The same applies to lists and dictionaries.
  In addition, SkyPy adds extra functionality to interpret and store Astropy Quantities_.
  Everything else is stored as a string (with or without explicit quotes).

.. code:: yaml

    # YAML interprets
    counter: 100       # An integer
    miles: 1000.0      # A floating point
    name: "Joy"        # A string
    planet: Earth      # Another string
    mylist: [ 'abc', 789, 2.0e3 ]                  # A list
    mydict: { 'fruit': 'orange', 'year': 2020 }    # A dictionary

    # SkyPy extra functionality
    angle: 10 deg
    distance: 300 kpc


* `Import objects`:
  the SkyPy configuration syntax allows objects to be imported directly from external
  (sub)modules using the ``!`` tag and providing neither a list of arguments nor a
  dictionary of keywords. For example, this enables the import and usage of any Astropy cosmology:

.. code:: yaml

    cosmology: !astropy.cosmology.Planck13 # import the Planck13 object and bind it to the variable named "cosmology"


Parameters
^^^^^^^^^^

* `Parameters definition`: parameters are variables that can be modified at execution.

For example,

.. code:: yaml

    parameters:
      hubble_constant: 70
      omega_matter: 0.3


Functions
^^^^^^^^^
* `Function call`: functions are defined as tuples where the first entry is the fully qualified function name tagged with an exclamation mark ``!`` and the second entry is either a list of positional arguments or a dictionary of keyword arguments.

For example, if you need to call the ``log10()`` and ``linspace()`` NumPy_ functions, then you define the following key-value pairs:

.. code:: yaml

    log_of_2: !numpy.log10 [2]
    myarray: !numpy.linspace [0, 2.5, 10]

You can also define the parameters of a function with a dictionary of keyword arguments.
Imagine you want to compute the comoving distance for a range of redshifts and an `Astropy` Planck 2015 cosmology.
To run it with the `SkyPy` pipeline, call the function and define the parameters as an indented dictionary.

.. code:: yaml

    comoving_distance: !astropy.cosmology.Planck15.comoving_distance
      z: !numpy.linspace [ 0, 1.3, 10 ]

Similarly, you can specify the function's arguments as an inline dictionary:

.. code:: yaml

    comoving_distance: !astropy.cosmology.Planck15.comoving_distance
      z: !numpy.linspace {start: 0, stop: 1.3, num: 10}

`N.B.` To call a function with no arguments, you should pass an empty list of
``args`` or an empty dictionary of ``kwargs``. For example:

.. code:: yaml

    cosmo: !astropy.cosmology.default_cosmology.get []
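
The function-call rule above (a fully qualified name plus either a list or a dictionary) can be illustrated with a minimal Python sketch. This is not SkyPy's implementation: the helper names ``import_by_name`` and ``call`` are hypothetical, and standard-library functions stand in for NumPy and Astropy so the sketch runs anywhere.

```python
from importlib import import_module


def import_by_name(qualified_name):
    """Import an object from a dotted path such as 'math.log10' (hypothetical helper)."""
    parts = qualified_name.split(".")
    # Try the longest importable module prefix first, then walk the remaining attributes.
    for i in range(len(parts), 0, -1):
        try:
            obj = import_module(".".join(parts[:i]))
        except ModuleNotFoundError:
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"cannot import {qualified_name!r}")


def call(qualified_name, arguments):
    """Call the named function with a list (args) or a dictionary (kwargs)."""
    function = import_by_name(qualified_name)
    if isinstance(arguments, dict):
        return function(**arguments)
    return function(*arguments)


# List of positional arguments, like "!numpy.log10 [2]" in the YAML above.
log_of_100 = call("math.log10", [100])

# Dictionary of keyword arguments, like the "{start: 0, stop: 1.3, num: 10}" form.
three_quarters = call("fractions.Fraction", {"numerator": 3, "denominator": 4})

# Empty list: call with no arguments at all.
empty = call("builtins.dict", [])
```

Passing an empty list (or an empty dictionary) is what distinguishes calling a function with no arguments from importing the object itself.
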


* `Variable reference`: variables can be referenced by their full name tagged with a dollar sign ``$``.
  In the previous example you could also define the variables at the top level of the file and then reference them:

.. code:: yaml

    redshift: !numpy.linspace [ 0, 1.3, 10 ]
    comoving_distance: !astropy.cosmology.Planck15.comoving_distance
      z: $redshift
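
As a rough illustration of what a ``$`` reference means, the sketch below substitutes ``$name`` strings with previously defined values. It is an assumption-laden toy, not SkyPy's resolver: the ``resolve`` function is hypothetical and works on plain Python dictionaries and lists.

```python
def resolve(value, variables):
    """Replace '$name' strings with the value bound to 'name' (hypothetical helper)."""
    if isinstance(value, str) and value.startswith("$"):
        return variables[value[1:]]
    if isinstance(value, list):
        return [resolve(item, variables) for item in value]
    if isinstance(value, dict):
        return {key: resolve(item, variables) for key, item in value.items()}
    return value


# Top-level variable, standing in for "redshift: !numpy.linspace [...]".
variables = {"redshift": [0.0, 0.65, 1.3]}

# Keyword arguments of a function call that reference the variable.
kwargs = {"z": "$redshift"}
resolved = resolve(kwargs, variables)
```

Values that are not references pass through unchanged, so literals and references can be mixed freely in the same argument list.
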

* The `cosmology` to be used by functions within the pipeline only needs to be set once, at the top level. If a function takes ``cosmology`` as an input, you need not define it again: it is detected and passed automatically.

For example, calculate the angular size of a galaxy with a given physical size, at a fixed redshift and for a given cosmology:

.. code:: yaml

    cosmology: !astropy.cosmology.FlatLambdaCDM
      H0: 70
      Om0: 0.3
    size: !skypy.galaxies.morphology.angular_size
      physical_size: 10 kpc
      redshift: 0.2
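
One way such automatic detection could work is by inspecting the function signature. The sketch below is only an illustration under that assumption; ``call_with_cosmology`` and the stand-in ``angular_size`` function are hypothetical and do not reproduce SkyPy's internals.

```python
import inspect


def call_with_cosmology(function, kwargs, cosmology):
    """Add `cosmology` only if the function accepts it and the caller did not set it."""
    parameters = inspect.signature(function).parameters
    if "cosmology" in parameters and "cosmology" not in kwargs:
        kwargs = {**kwargs, "cosmology": cosmology}
    return function(**kwargs)


def angular_size(physical_size, redshift, cosmology):
    # Hypothetical stand-in: it only reports which cosmology it received.
    return f"{physical_size} at z={redshift} using {cosmology}"


def double(x):
    # A function without a `cosmology` parameter is called unchanged.
    return 2 * x


arguments = {"physical_size": "10 kpc", "redshift": 0.2}
with_default = call_with_cosmology(angular_size, arguments, "FlatLambdaCDM")
explicit = call_with_cosmology(
    angular_size, {**arguments, "cosmology": "Planck15"}, "FlatLambdaCDM"
)
untouched = call_with_cosmology(double, {"x": 21}, "FlatLambdaCDM")
```

An explicitly supplied ``cosmology`` argument takes precedence over the top-level one in this sketch.
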

* `Job completion`: ``.depends`` can be used to force any function call to wait for the completion
  of any other job.

A simple example where, for some reason, the comoving distance must be computed after
the angular size function has completed:

.. code:: yaml

    cosmology: !astropy.cosmology.Planck15
    size: !skypy.galaxies.morphology.angular_size
      physical_size: 10 kpc
      redshift: 0.2
    comoving_distance: !astropy.cosmology.Planck15.comoving_distance
      z: !numpy.linspace [ 0, 1.3, 10 ]
      .depends: size

By doing so, you force the function call ``size`` to complete before the comoving distance is computed.
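
The effect of ``.depends`` on execution order can be pictured as a topological sort of a dependency graph. The sketch below uses the standard-library ``graphlib`` module (Python 3.9 or later) purely as an illustration; how SkyPy schedules jobs internally is not described here, and the dependency of ``size`` on ``cosmology`` is an assumption based on the automatic cosmology detection described above.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it starts.
dependencies = {
    "cosmology": set(),
    "size": {"cosmology"},          # assumed: angular_size receives the cosmology
    "comoving_distance": {"size"},  # forced by ".depends: size"
}

# A valid execution order that respects every dependency.
order = list(TopologicalSorter(dependencies).static_order())
```

Without the ``.depends`` entry, ``comoving_distance`` would have no link to ``size`` and the two jobs could run in either order.
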


Tables
^^^^^^

* `Table creation`: tables are defined as a dictionary of table names, each resolving to a dictionary of column names for that table.

Let us create a table called ``telescope`` with a column to store the width of spectral lines that follow a normal distribution:

.. code:: yaml

    tables:
      telescope:
        spectral_lines: !scipy.stats.norm.rvs
          loc: 550
          scale: 1.6
          size: 100

* `Column addition`: you can add as many columns to a table as you need.
  Imagine you want to add a column for the telescope collecting surface:

.. code:: yaml

    tables:
      telescope:
        spectral_lines: !scipy.stats.norm.rvs
          loc: 550
          scale: 1.6
          size: 100
        collecting_surface: !numpy.random.uniform
          low: 6.9
          high: 7.1
          size: 100

* `Column reference`: columns in the pipeline can be referenced by their full name (``table.column``) tagged with a dollar sign ``$``.
  Example: galaxy masses that follow a lognormal distribution. You can create a table ``galaxies``
  with a column ``mass``, where you sample 10000 objects, and a second column ``radius``, which also follows a lognormal distribution
  but whose mean depends on how massive the galaxies are:

.. code:: yaml

    tables:
      galaxies:
        mass: !numpy.random.lognormal
          mean: 5.
          size: 10000
        radius: !numpy.random.lognormal
          mean: $galaxies.mass


* `Multi-column assignment`: multi-column assignment works with any 2D array, where one dimension is interpreted
  as the rows of the table and the other as separate columns. It also works with a function that returns a tuple.

We use multi-column assignment in the following example, where we sample a two-dimensional array of values from a lognormal distribution and then store them as three columns in a table:

.. code:: yaml

    tables:
      halos:
        mass, radius, concentration: !numpy.random.lognormal
          size: [10000, 3]
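
The splitting of a two-dimensional result into named columns can be sketched in plain Python. This is an illustration only, not SkyPy's code: nested lists and ``random.lognormvariate`` stand in for the NumPy array and ``numpy.random.lognormal``, with the mean and sigma set to 0 and 1 to mirror NumPy's defaults.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

names = ["mass", "radius", "concentration"]
n_rows = 5

# A 2D result with one row per object and one value per column name.
rows = [[random.lognormvariate(0.0, 1.0) for _ in names] for _ in range(n_rows)]

# Transpose the rows and bind each resulting column to its name.
columns = dict(zip(names, (list(column) for column in zip(*rows))))
```

Each named column ends up with one entry per row, which is the layout a table needs.
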


* `Table initialisation`: by default, tables are initialised using ``astropy.table.Table()``. However, this can be overridden using the ``.init`` keyword to initialise the table with any function call.

For example, you can stack galaxy properties such as radii and mass:

.. code:: yaml

    radii: !numpy.logspace [ 1, 2, 100 ]
    mass: !numpy.logspace [ 9, 12, 100 ]
    tables:
      galaxies:
        .init: !astropy.table.vstack [[ $radii, $mass ]]


* `Table reference`: when a function call depends on tables, you need to ensure the referenced table has the necessary content and is not empty.
  You can do that with ``.complete``.

Example: you want to perform a very simple abundance matching, i.e. painting galaxies within your halos.
You can create two tables, ``halos`` and ``galaxies``, storing the halo masses and galaxy luminosities.
Then you can stack these two tables and store the result in a third table called ``matching``.

.. code:: yaml

    tables:
      halos:
        halo_mass: !numpy.random.uniform
          low: 1.0e8
          high: 1.0e14
          size: 20
      galaxies:
        luminosity: !numpy.random.uniform
          low: 0.05
          high: 10.0
          size: 20
      matching:
        .init: !astropy.table.hstack
          tables: [ $halos, $galaxies ]
          .depends: [ halos.complete, galaxies.complete ]


.. _YAML: https://yaml.org
.. _NumPy: https://numpy.org
.. _Quantities: https://docs.astropy.org/en/stable/units/
.. _clone(): https://docs.astropy.org/en/stable/api/astropy.cosmology.FLRW.html?highlight=clone#astropy.cosmology.FLRW.clone

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ Getting Started

 install
 feature_list
+configuration_files
 examples/index

 .. _user-docs:

docs/luminosity.yml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
cosmology: !astropy.cosmology.default_cosmology.get []
z_range: !numpy.linspace [0, 2, 21]
M_star: !astropy.modeling.models.Linear1D [-0.9, -20.4]
phi_star: !astropy.modeling.models.Exponential1D [3e-3, -9.7]
magnitude_limit: 23
sky_area: 0.1 deg2
tables:
  blue_galaxies:
    redshift, magnitude: !skypy.galaxies.schechter_lf
      redshift: $z_range
      M_star: $M_star
      phi_star: $phi_star
      alpha: -1.3
      m_lim: $magnitude_limit
      sky_area: $sky_area
