\documentclass[iop]{emulateapj}
\usepackage[utf8]{inputenc}
\newcommand{\sick}{\texttt{sick}}
\newcommand{\article}{\textit{Article}}
\usepackage{amsmath}
\usepackage{bm}
\usepackage{natbib}
\usepackage{graphicx}
\begin{document}
\title{\sick, the spectroscopic inference crank}
\author{Andrew R. Casey\altaffilmark{1}}
\altaffiltext{1}{Institute of Astronomy, University of Cambridge, Madingley Road,
Cambridge, CB3 0HA, United Kingdom; \email{arc@ast.cam.ac.uk}}
\begin{abstract}
In this \article{} I introduce \sick{}, the spectroscopic inference crank, an
open source probabilistic code for inferring astrophysical parameters from
spectra. \sick{} enables any user to easily construct an approximate
\textit{generative model} for spectral data, allowing for precise inference for
a host of astrophysical quantities. Model fluxes (or intensities) are approximated
by efficient multi-dimensional interpolation of pre-computed spectra. This allows
any user to capitalise on the plethora of published synthetic and observed spectral
libraries with minimal effort. Additional phenomena that transform the data
(e.g., redshift, continuum, smoothing) are incorporated as free parameters.
Outlier pixels (e.g., cosmic rays or poorly modelled regimes) can be treated with
a mixture model, and a simplistic noise model is included to account for
systematically underestimated variance. Combining these phenomena into a
scalar-justified, quantitative model allows for precise inference with credible
uncertainties. Commonly employed features are introduced, and the implementation
details are described. I demonstrate the utility of a generative model approach
with accurate and precise stellar photospheric measurements from noisy (e.g.,
S/N $\sim{} 7$) high- and low-resolution spectra. \sick{} is very easy to use,
well-tested, parallelised, and freely available online through GitHub under the
MIT license.
\end{abstract}
\section{Introduction}
Most of our understanding of astrophysics has been interpreted from spectra.
Given how informative spectroscopic data are to our understanding of astrophysics,
it is not surprising that the last decade has seen a substantial increase in
publicly accessible spectral data. Large-scale surveys have driven this trend,
each releasing hundreds of thousands of spectra \citep[e.g.,][]{wigglez,boss,
segue,rave,gaia-eso}. Millions more spectra are expected in the
coming years \citep[e.g.,][]{lamost,galah}.
The spectra are obtained from different astrophysical sources to meet specific
scientific objectives, and the data vary in wavelength coverage, resolution, and
noise distributions. For these reasons many collaborations expend significant
resources to produce bespoke analysis software. Unfortunately this impedes
scientific reproducibility, as many codes still remain closed-source more than
a decade after the original article was published. Any comprehensive literature
comparison subsequently becomes impossible, as systematics can be difficult to
accurately characterise without in-depth knowledge of the methods or access to
the software.
Broadly speaking, there are three types of methods employed for spectral analysis:
measuring the strengths of spectral features, pure data-generating models, and
template-matching methods. Approaches that measure spectral features are
inexpensive, but regularly encounter problems with blended (often hidden) lines
or continuum placement. As such, some subjective interaction, tuning, or ad-hoc
`calibration' is almost always required. Data-generating methods compute model
spectra at run-time, and whilst accurate, can be prohibitively expensive for
large samples. For template-matching methods, synthetic spectra are only
generated once to produce a grid of fluxes or intensities for a subset of
permutations of astrophysical quantities. Although there are differences between
these methods, the preparatory steps are usually the same. Spectra are placed at
rest-frame by calculating line-of-sight velocities, typically by
cross-correlation, before being continuum-normalised to flux intensities.
Credible uncertainties can be difficult to discern from these approaches. The
implied assumptions of a $\chi^2$ distribution are often inapplicable, and
uncertainties in the Doppler shift, smoothing, and normalisation steps are
almost always ignored. These effects result in ill-characterised uncertainties
in astrophysical parameters. Alternatively the uncertainties are frequently
assumed to be approximately the same for all objects. This is an incorrect
approach: there are few, if any, examples of homoscedastic datasets in
astrophysics. The noise properties of each spectrum \textit{are} different, and
the parameter uncertainties (random and systematic) will differ for every object.
Consequently the uncertainties in astrophysical parameters by template-matching
methods are generally found to either be incorrectly assumed, under-estimated,
or at least ill-characterised.
In addition to affecting the uncertainties, the effects of redshift, continuum
normalisation and smoothing \textit{will} systematically bias the reported
maximum-likelihood parameters. Experienced spectroscopists will frequently
differ in their decision of continuum placement, even in the most
straightforward cases (e.g., metal-poor stars). For example, there are a number
of controversial examples within the literature where the subjective (human)
decision of continuum placement has significantly altered the scientific
conclusions \citep[e.g., see][where this issue is discussed in great
detail]{kerzendorf}. The implications of these phenomena \textit{must} be
considered if we are to understand subtle astrophysical processes. Spectroscopy
requires objectivity: one should endeavour to incorporate these phenomena as
free parameters into a generative model and infer them simultaneously with the
astrophysical parameters.
In this \article{} I present \sick{}: a well-tested, MIT-licensed probabilistic
software package for inference from spectroscopic data. \sick{} employs an
approximation to data-generating models. Instead of attempting to model
expensive astrophysical processes (e.g., stars, supernovae, galaxies) at
run-time, it performs
efficient multi-dimensional interpolation of model spectra from pre-computed
grids and includes contributory phenomena (e.g., continuum, redshift) as free
parameters within an objective scalar-justified model. This approach is suitable
for a plethora of different astrophysical
processes, allowing any user to easily specify a model using existing published
spectral grids, and infer astrophysical properties from their data. Aspects of
the probabilistic model are described in Section \ref{sec:model}. A number of
examples are included in Sections \ref{sec:inference-test} and \ref{sec:examples},
where I present accurate and precise objective photospheric measurements using
noisy spectra. I conclude in Section \ref{sec:conclusions} with references to
the online documentation and applicability of the software.
\section{The Generative Model}
\label{sec:model}
The primary advantage of \sick{} is the simultaneous inference of both
astrophysical and other pertinent parameters that can alter the spectra. The
generative model described here is agnostic as to \textit{what} the
astrophysical parameters actually describe. Typical examples might be properties
of supernovae (e.g., explosion energies and luminosities), galaxy characteristics
from integrated light, or mean plasma properties of a stellar photosphere. There
are a plethora of spectral libraries (observed and synthetic) published for these
types of applications \citep[e.g.,][]{snid,pegase,phoenix,pollux}. All of these
models are fully-\sick{} compatible. However given the research background of
the author, I will introduce the probabilistic model by focussing on stellar
photospheric inferences.
The first step of the (approximate\footnote{The term approximate is used because
model spectra are interpolated.}) generative model is to interpolate a model
spectrum. \sick{} employs the Quickhull algorithm \citep{quickhull} to linearly
interpolate in multiple dimensions. Irregularly spaced grids are perfectly
acceptable; Quickhull does not require the grid points to be rectangular or
regularly spaced. Large grids of model spectra are cached for
computational efficiency, allowing for the total size of the model grid to far
exceed the available random access memory.\footnote{In practice this is performed
by efficient memory-mapping.} I introduce an interpolation function $S(\bm{\psi})$
that produces a model spectrum with $J$ points $\{\lambda_j,I_j\}_{j=1}^{J}$ at
any evaluable point $\bm{\psi} \equiv (T_{\rm eff},\log{g},{\rm [M/H]},[\alpha/{\rm Fe}])$
within the grid boundaries.
\begin{equation}
S(\bm{\psi}) \rightarrow \{\lambda_j,I_j\}
\end{equation}
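This interpolation step can be sketched with SciPy's Qhull-backed \texttt{LinearNDInterpolator}; the two-dimensional toy grid, flux values, and query points below are illustrative stand-ins, not the spectral libraries used later.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Hypothetical irregular grid in (Teff, logg), each node holding a 5-pixel
# "spectrum". LinearNDInterpolator triangulates the nodes with Qhull and
# interpolates linearly inside the convex hull -- no rectangular or
# regularly spaced grid is required.
rng = np.random.default_rng(0)
points = np.array([[5000.0, 4.0], [6000.0, 4.0], [5000.0, 4.5],
                   [6000.0, 4.5], [5500.0, 4.25]])   # irregular grid nodes
fluxes = rng.random((len(points), 5))                # one spectrum per node

S = LinearNDInterpolator(points, fluxes)

interpolated = S(np.array([[5250.0, 4.2]]))[0]   # spectrum at a new psi
outside = S(np.array([[4000.0, 4.0]]))[0]        # outside the hull -> NaN
```

Points outside the convex hull return NaN, which is how non-interpolable $\bm{\psi}$ values map onto $p(\bm{\psi}) = 0$ later in the model.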
The model spectrum $S(\bm{\psi})$ should always be of higher spectral resolution
than the data. Therefore it is necessary to smooth (or broaden) $\{\lambda_j,I_j\}$
with a Gaussian filter of standard deviation $\sigma_{s}$ such that it matches
the spectral resolution of the data. For some applications it is tempting to
perform this step \textit{a priori} from the instrument spectral resolution.
However it is not necessarily true that the quoted instrumental resolution will
perfectly match the data. In truth the data have a \textit{distribution} of
spectral resolutions. Even if the spectral resolution is accurately known with
low variance, fixing this value will result in under-estimated uncertainties
$\sigma(\bm{\psi})$ and likely bias the posterior distributions for
$\bm{\psi}$. Employing the free parameter $\sigma_{s}$ allows for the
identification of additional instrumental or atmospheric broadening. For stellar
applications, the free parameter $\sigma_{s}$ also aids in the identification of
unresolved binary companions.
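A minimal sketch of the smoothing step, assuming a uniformly sampled toy spectrum and expressing $\sigma_s$ in pixels (in practice it is a free parameter in wavelength units):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Toy high-resolution spectrum: flat continuum with one absorption line.
wavelength = np.linspace(475.0, 685.0, 2101)   # nm, uniform 0.1 nm sampling
intensity = 1.0 - 0.8 * np.exp(-0.5 * ((wavelength - 580.0) / 0.1) ** 2)

# Convolve with a Gaussian kernel of standard deviation sigma_s (pixels)
# to degrade the model to the data's spectral resolution.
sigma_s_pixels = 5.0
smoothed = gaussian_filter1d(intensity, sigma_s_pixels)
```

The broadened line is shallower and wider, exactly the effect the free parameter $\sigma_s$ controls during sampling.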
The Doppler shift of the source is treated by transforming the model spectrum
(in uniformly sampled $\log\lambda$ space). \sick{} solves for redshift $z$
(Equation \ref{eq:redshift}), and returns posterior distributions in the user's
preferred units (e.g., km s$^{-1}$).
\begin{equation}
\label{eq:redshift}
\lambda_{s} = \lambda_{j}(1 + z)
\end{equation}
After the model spectrum is smoothed and Doppler-shifted, model intensities are
required at the wavelengths $\lambda_i$ of the observed pixels. The complete
function to yield model intensities $M_i$ at wavelengths $\lambda_i$ is therefore
a function of the parameters of interest $\bm{\psi}$, redshift $z$, smoothing
kernel $\sigma_{s}$, and the wavelengths of the observed pixels $\lambda_i$:
\begin{equation}
M_{i} \equiv S\left(\bm{\psi},z,\sigma_{s},\lambda_i\right)
\end{equation}
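The shift-and-resample portion of this composite function can be sketched in a few lines; the toy line profile, redshift, and wavelength grids below are illustrative:

```python
import numpy as np

# Finely sampled (already smoothed) model spectrum with one absorption line.
model_wavelength = np.linspace(500.0, 600.0, 10001)   # nm
model_intensity = 1.0 - 0.5 * np.exp(
    -0.5 * ((model_wavelength - 550.0) / 0.05) ** 2)

# Doppler-shift the model wavelengths by (1 + z) ...
z = 1e-4                                              # ~30 km/s
shifted_wavelength = model_wavelength * (1.0 + z)

# ... then resample onto the observed pixels lambda_i to obtain M_i.
obs_wavelength = np.linspace(510.0, 590.0, 1000)
M = np.interp(obs_wavelength, shifted_wavelength, model_intensity)
```

The absorption line lands at $550\,{\rm nm}\times(1+z)$ on the observed grid, as expected for a redshifted source.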
In principle there is no reason to believe that the Doppler-shifted, smoothed
model intensities $M_i$ should match the data at all. The counts and shape of
observed fluxes are a function of source magnitude, exposure time, instrument
sensitivities, atmospheric conditions, and a host of unaddressed effects. In
contrast the model spectra are calculated either as intensities (e.g., $I_j$)
or fluxes calibrated to an empirical system. Even for the true values of
$\bm{\psi}$, a function is required to normalise\footnote{The normalisation
process is frequently abused by stellar spectroscopists in the literature.
Wherever possible, data should not be transformed. One should seek to fit a
model to the data, not the other way around.} the model to the data. As such I
transform the \textit{model} intensities by some function $C$ to fit the data.
Although the function $C$ incorporates a number of effects (e.g., source
blackbody temperature, dust, instrument sensitivities), they are phenomena that
typically cannot be separated without additional information, and here I only
care about their combined effect. I only wish to ensure that the overall shape
of the data are accounted for, which is usually achievable with a low-order
polynomial
\begin{equation}
C_i = c_{0}\lambda_{i}^{m-1} + c_{1}\lambda_{i}^{m-2} + \dots + c_{m}
\end{equation}
\noindent{}with free coefficients $\{c_k\}_{k=0}^{m}$, where the order $m$ is specified by
the user. Other continuum functions are available in \sick{}, and the user is
encouraged to experiment for a specific problem. This allows us to express the
\textit{expected flux} $E_i$ at a given observed pixel with wavelength $\lambda_i$ as:
\begin{equation}
E_i = M_{i}\times{}C_i
\end{equation}
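The continuum transformation amounts to evaluating a low-order polynomial over the observed wavelengths and multiplying it into the model intensities; the coefficients below are arbitrary illustrations (note \texttt{numpy.polyval} uses the same convention as the text, with $c_0$ multiplying the highest power):

```python
import numpy as np

obs_wavelength = np.linspace(510.0, 590.0, 1000)   # nm, the data's lambda_i
M = np.ones_like(obs_wavelength)                   # placeholder intensities

# A linear (first-order) continuum: C_i = c_0 * lambda_i + c_1.
coefficients = [2.0e-2, -5.0]
C = np.polyval(coefficients, obs_wavelength)

# Expected flux at each observed pixel.
E = M * C
```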
In many astrophysical cases the flux uncertainties $\sigma_{i}$ for a given pixel
are not well-characterised. This is often due to unpropagated uncertainties during
data reduction. Since the observed flux counts are expected to be Poisson-distributed,
in many astrophysical scenarios the pixel flux uncertainty can be estimated
as $\sigma_i \sim 1/\sqrt{F_{i}}$. Knowing that this approximation may underestimate
the noise, I include an additional parameter to account for the possibility that
the variance in all pixels is underestimated by some fractional amount $f$
(Equation \ref{eq:noise-model}). While this crude noise model is probably
unrepresentative of the data in a large number of cases, including \textit{any}
noise model is preferable to none.
\begin{equation}
s_{i}^2 = \sigma_{i}^2 + f^{2}E_{i}^2
\label{eq:noise-model}
\end{equation}
We now have a \textit{generative model} for the data. The frequency (or
probability) distribution ${p\left(F_i|\lambda_i,\sigma_i,\bm{\psi},z,\sigma_s,\{c\}_{0}^{m}\right)}$
for the observed data $F_i$ is given by
\begin{equation}
p\left(F_i|\lambda_i,\sigma_{i},\bm{\psi},z,\sigma_{s},\{c\}_{0}^{m},f\right) =
\frac{1}{\sqrt{2\pi{}s_{i}^2}}\exp{\left(-\frac{\left[F_i - E_i\right]^2}{2s_{i}^2}\right)}
\label{eq:p_model}
\end{equation}
\noindent{}and with the implied assumption that the data are independently drawn,
the likelihood $\mathcal{L}$ is calculable by the product of individual probabilities:
\begin{equation}
\mathcal{L} = \prod_{i=1}^{N}\,p\left(F_i|\lambda_i,\sigma_{i},\bm{\psi},z,\sigma_{s},\{c\}_{0}^{m},f\right)
\end{equation}
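Equations \ref{eq:noise-model} and \ref{eq:p_model} combine into a log-likelihood of a few lines; this sketch evaluates it on synthetic pixels with known noise:

```python
import numpy as np

def log_likelihood(F, sigma, E, f):
    """Gaussian log-likelihood with inflated variance s^2 = sigma^2 + f^2 E^2."""
    s2 = sigma ** 2 + (f * E) ** 2
    return -0.5 * np.sum((F - E) ** 2 / s2 + np.log(2.0 * np.pi * s2))

# Synthetic data: fluxes drawn around the expected fluxes E with the quoted
# uncertainty, so the reported variances are *not* underestimated here.
rng = np.random.default_rng(1)
E = np.ones(500)
sigma = np.full(500, 0.1)
F = E + rng.normal(0.0, 0.1, 500)

ll_f0 = log_likelihood(F, sigma, E, 0.0)   # no extra variance
ll_f5 = log_likelihood(F, sigma, E, 0.5)   # variance grossly inflated
```

Because the quoted variances are correct in this toy case, $f \approx 0$ is preferred; in real data a grid (or MCMC samples) over $f$ recovers any systematic underestimate.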
From the original astrophysical parameters $\bm{\psi}$ that I care about, I now
have an additional $3 + m$ parameters to consider. The situation seemingly becomes
more complex when separately observed channels are considered. Many spectrographs
provide small portions of spectra (a channel, also colloquially expressed as beams
or apertures) separated by some large gap. The data for each channel are usually
captured by different CCDs and are therefore reduced separately.
The instrumental parameters $\sigma_{s}$, $\{c_k\}_{k=0}^{m}$, and $f$ are likely
to be different for each channel. Although redshift $z$ is an astrophysical effect
and should not differ between channels, this may not be true if each channel has
been wavelength-calibrated differently. It is therefore prudent to introduce
separate parameters $z$, $\sigma_{s}$, $\{c_k\}_{k=0}^{m}$, $f$ for \textit{each}
of the $N_{chan}$ observed channels. This scales the total dimensionality of the
model as $3\times{}N_{chan} + \sum_{k=1}^{N_{chan}}m_{k}$ in addition to $\bm{\psi}$.
The inclusion of all of these parameters is not mandatory: each \sick{} model can
be adjusted to include or ignore any combination of phenomena.
Finally, I consider the handling of outliers in the data. These may be in the form
of cosmic ray spikes, improper calibration of the data, telluric features, or simply
poorly modelled spectral regions. Treatment of these artefacts is achieved using a
Gaussian mixture model: a combination of two models. In a mixture model, the data
are fit by the sum of amplitudes ($1 - P_o$ and $P_o$, respectively) of two
distributions: the expected fluxes $E_i$, and a normal distribution centered along
the continuum function $C_i$ with variance $s_{i}^2 + V_{o}$. This requires the
inclusion of two additional parameters: $P_o$ and $V_o$. The prior
distribution function $p\left(V_o\right)$ requires $V_{o}$ to always be positive
(Equation \ref{eq:default_priors}), and as such the outlier distribution will
\textit{always} have a larger variance. Distributions of smaller variance are more
informative, so conceptually a fit to the expected fluxes $E_{i}$ is generally
preferred wherever possible. For brevity I define $\bm{\kappa} \equiv (\bm{\psi},\{z,\sigma_s,\{c_k\}_{k=0}^{m},f\}_{0}^{N_{c}})$, and the likelihood for the mixture model is given by
\begin{multline}
\mathcal{L} = \prod_{i=1}^{N}\,\left[\left(1 - P_{o}\right)\times{}p_{model}\left(F_i|\lambda_i,\sigma_{i},\bm{\kappa}\right)\right. \\
+ \left. P_{o}\times{}p_{outlier}\left(F_i|\lambda_i,\sigma_i,\bm{\kappa},V_{o},P_o\right)\right]
\end{multline}
\noindent{}where $p_{model}$ refers to $p$ in Equation \ref{eq:p_model} and
\begin{multline}
p_{outlier}\left(F_i|\lambda_i,\sigma_i,\bm{\kappa},V_{o},P_o\right) = \dots \\
\frac{1}{\sqrt{2\pi\left(s_{i}^2 + V_{o}\right)}} \exp\left(-\frac{[F_i - C_i]^2}{2\left[s_{i}^2 + V_{o}\right]}\right)
\end{multline}
\noindent{}such that the likelihood $\mathcal{L}$ becomes:
\begin{multline}
\mathcal{L} = \prod_{i=1}^{N} \left[ \frac{1-P_o}{\sqrt{2\pi{}s_{i}^2}}\exp\left(-\frac{[F_i - E_i]^2}{2s_{i}^{2}}\right) \right.\\
\left. + \frac{P_o}{\sqrt{2\pi\left[s_{i}^2 + V_o\right]}}\exp\left(-\frac{[F_i - C_i]^2}{2\left[s_{i}^{2} + V_o\right]}\right)\right]
\label{eq:full_likelihood}
\end{multline}
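The mixture likelihood of Equation \ref{eq:full_likelihood} is most stably evaluated in log space with \texttt{logaddexp}; the sketch below uses simulated cosmic-ray spikes to show why the outlier component matters:

```python
import numpy as np

def mixture_log_likelihood(F, s2, E, C, P_o, V_o):
    """Log of the two-component mixture: model E_i with weight (1 - P_o),
    outlier distribution centred on the continuum C_i with variance
    s_i^2 + V_o and weight P_o."""
    ln_model = (np.log(1.0 - P_o)
                - 0.5 * ((F - E) ** 2 / s2 + np.log(2.0 * np.pi * s2)))
    ln_outlier = (np.log(P_o)
                  - 0.5 * ((F - C) ** 2 / (s2 + V_o)
                           + np.log(2.0 * np.pi * (s2 + V_o))))
    return np.sum(np.logaddexp(ln_model, ln_outlier))

rng = np.random.default_rng(2)
E = 1.0 - 0.3 * rng.random(300)        # toy expected fluxes
C = np.ones(300)                       # toy continuum
s2 = np.full(300, 0.01)
F = E + rng.normal(0.0, 0.1, 300)
F[::50] += 5.0                         # cosmic-ray-like spikes

with_outliers = mixture_log_likelihood(F, s2, E, C, 0.05, 25.0)
without = mixture_log_likelihood(F, s2, E, C, 1e-12, 25.0)
```

With a negligible outlier amplitude the spiked pixels are catastrophically penalised; a modest $P_o$ absorbs them at small cost to the well-behaved pixels.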
I define the full parameter space with ${\bm{\theta} \equiv \left(\bm{\psi},\{z,\sigma_s,\{c_{b_k}\}_{k=0}^{m},f\}_{b=0}^{N_{c}},V_o,P_o\right)}$. From Bayes theorem the posterior probability
distribution for $\bm{\theta}$ (up to a constant) is given by
\begin{eqnarray}
\mathcal{P} & \propto & likelihood \times prior \nonumber \\
p(\bm{\theta}|\{F_i\}_{i=1}^{N}) & \propto & p(\{F_i\}_{i=1}^{N}|\bm{\theta})\,\times\,p(\bm{\theta})
\label{eq:probability}
\end{eqnarray}
\noindent{}where $p(\bm{\theta}|\{F_i\}_{i=1}^{N})$ is the probability $\mathcal{P}$
of $\bm{\theta}$ given the data (and given the model: e.g., see Section \ref{sec:sun}),
$p(\{F_i\}_{i=1}^{N}|\bm{\theta})$ is our previously defined likelihood function
$\mathcal{L}$ (Equation \ref{eq:full_likelihood}), and $p(\bm{\theta})$ is the prior
probability distribution. Priors are discussed in more detail in Section \ref{sec:mcmc}.
% Log probability, expand function
%\begin{eqnarray}
%\log({\mathcal{P}}) & \propto & \log{likelihood} + \log{prior} \nonumber \\
%\log(\mathcal{P}) & \propto & \prod_{i=1}^{N} \left[ \frac{1-P_b}{\sqrt{2\pi\sigma_{i}^2}}\,\exp\,\left(-\frac{[F_i - E_i]^2}{2\sigma_{i}^{2}}\right) + \frac{P_b}{\sqrt{2\pi\left[\sigma_i^2 + V_o\right]}}\,\exp\,\left(-\frac{[F_i - C_i]^2}{2\left[\sigma_i^{2} + V_o\right]}\right)\right] \nonumber \\
%& & \dots + \ln(p(\bm{\theta}))
%\end{eqnarray}
With all of the combined effects there are 16 free parameters including $\bm{\psi}$,
the astrophysical quantities of interest. For the examples presented in Section
\ref{sec:examples}, this requires interpolation between $9.2 \times 10^{10}$ pixels in four
dimensions. While the description might appear daunting, the problem is tractable,
numerically efficient, and easy to configure. By default \sick{} numerically solves
this problem in three sequential steps, which are described in the following
sections: scattering, optimisation, and Markov Chain Monte Carlo (MCMC) sampling.
\subsection{Initial Scattering}
\label{sec:scattering}
We seek to maximise $\mathcal{P}$ (or in practice, $\log\mathcal{P}$) and calculate
the posterior probability distributions for $\bm{\theta}$ given the data. In the
first step I calculate the log-probability $\log{(\mathcal{P})}$ for $N_{sample}$
randomly drawn points from all over the astrophysical parameter space $\bm{\psi}$.
Since \sick{} allows for arbitrarily large parameter spaces in $N_{D}$ dimensions,
initially sampling the parameter space $N_{sample}$ times provides a coarse
landscape of probability. If the probability distribution function were smoothly
distributed across all $\bm{\psi}$, or the optimisation step was sufficiently
robust for all potential applications, then in principle \textit{any} single
point would be an adequate starting guess. However there is no way of knowing
\textit{a priori} how smooth the posterior distribution will be for a given problem.
The $N_{sample}$ points are uniformly drawn in $\bm{\psi}$ by default, unless
priors are specified (see Section \ref{sec:mcmc}). A model spectrum $S(\bm{\psi})$
is interpolated for each point, which is used to estimate the redshift and continuum
parameters. After smoothing the model fluxes $I_j$ by $\sigma_s$ (as drawn from an
explicit prior, or estimated by $\left|\mathcal{N}\left(0, 1\right)\right|$) the
smoothed spectrum is cross-correlated with the data to yield a redshift estimate.
Continuum parameters $\{c_k\}_{k=0}^{m}$ are similarly estimated by fitting a
function (a polynomial in this case) to the data divided by the smoothed flux at
each $\lambda_i$. Estimating continuum and redshift provides us with a reasonable
guess of the probability for any randomly scattered point $\bm{\psi_i}$. If outlier
modelling is included in the model then $P_o$ is distributed as
$\mathcal{U}\left(0, 1\right)$ by default, and $V_o$ is estimated by
$\mathcal{N}\left(\widetilde{F_i}, \frac{1}{2}\widetilde{F_i}\right)$. This procedure
is \textit{only} employed for the initial random scattering stage: it is
\textit{not} applicable for the MCMC phase.
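The cross-correlation redshift estimate exploits the fact that a Doppler shift is a pure translation in uniformly sampled $\log\lambda$; a self-contained sketch with a toy template:

```python
import numpy as np

# Template spectrum on a uniform log-lambda grid.
n = 4096
log_lam = np.linspace(np.log(500.0), np.log(600.0), n)
template = 1.0 - 0.5 * np.exp(
    -0.5 * ((np.exp(log_lam) - 550.0) / 0.05) ** 2)

# Redshifting by z translates the spectrum by log(1 + z) in log-lambda.
z_true = 3e-4
data = np.interp(log_lam - np.log(1.0 + z_true), log_lam, template)

# The lag of the cross-correlation peak recovers that translation.
step = log_lam[1] - log_lam[0]
ccf = np.correlate(data - data.mean(), template - template.mean(), "full")
shift = (np.argmax(ccf) - (n - 1)) * step
z_est = np.exp(shift) - 1.0
```

The estimate is quantised at the pixel level, which is adequate for a starting guess; $z$ remains a free parameter during sampling.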
\subsection{Optimisation}
\label{sec:optimise}
After the random scattering step, the most probable $\bm{\theta}$ point is used
as an initial guess for numerical optimisation. A number of suitable minimisation
algorithms are available in \sick{} through the SciPy \citep{scipy} optimization
module. The \citet{nelder-mead} algorithm is employed by default.
Unlike other optimisation techniques, the Nelder-Mead algorithm
does not approximate first- or second-order derivatives, and can consequently be
less efficient than other approaches. However it is robust in high-dimensional
space, even in the presence of substantial noise, and has been successfully
employed in a wide range of engineering and scientific problems. The reader is
encouraged to experiment with other optimisation approaches if the Nelder-Mead
algorithm proves unsuitable or takes an untenable amount of time. Since these
optimisation algorithms are minimisation techniques and I seek to maximise the
log-probability $\log{\left(\mathcal{P}\right)}$, I numerically optimise the
parameters $\bm{\theta}$ by minimising the negative log-probability
$-\log{\left(\mathcal{P}\right)}$.
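In practice this amounts to a call to SciPy's \texttt{minimize} with the Nelder-Mead method; in the sketch below a toy quadratic stands in for $-\log\mathcal{P}$, and the starting point plays the role of the best scattered sample:

```python
import numpy as np
from scipy.optimize import minimize

def negative_log_probability(theta):
    """Toy stand-in for -log P: a quadratic bowl around a known optimum."""
    truth = np.array([5777.0, 4.44])     # hypothetical (Teff, logg) optimum
    scale = np.array([100.0, 0.5])
    return 0.5 * np.sum(((theta - truth) / scale) ** 2)

theta_scatter = np.array([5400.0, 4.0])  # best point from the scattering step
result = minimize(negative_log_probability, theta_scatter,
                  method="Nelder-Mead")
theta_opt = result.x
```

Nelder-Mead needs no derivatives, so the same call works unchanged when the objective involves interpolation and convolution rather than a smooth analytic function.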
\subsection{Markov Chain Monte Carlo Sampling}
\label{sec:mcmc}
The random scattering and numerical optimisation steps efficiently provide an
accurate estimate of the optimal parameters $\bm{\theta_{opt}}$. Once the optimisation
step is complete, \sick{} employs the affine-invariant ensemble sampler proposed
by \citet{goodman;weare}, and implemented by \citet{emcee}; this ensemble
sampler is employed by default. \sick{} allows for the model settings to
be specified in a human-readable \textsc{yaml}- or \textsc{json}-formatted
configuration file\footnote{The reader is referred to the online documentation at http://astrowizici.st/sick/
for an example.}, where the number of Goodman \& Weare walkers can be specified,
as well as the number of samples to perform. When the optimisation step is used,
the initial points are taken from a small multi-dimensional ball around the
optimised parameters $\bm{\theta_{opt}}$. If the optimisation step is not
performed, initial values for the MCMC walkers are drawn in the same way as
described for the scattering process.
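Initialising the walkers in a small multi-dimensional ball around $\bm{\theta_{opt}}$ takes only a few lines; the optimum and ball radius below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
theta_opt = np.array([5777.0, 4.44, 0.0, 0.1])   # hypothetical optimum
n_walkers = 200

# Each walker starts at theta_opt plus a small isotropic perturbation,
# giving the ensemble an initial spread without leaving the optimum's basin.
p0 = theta_opt + 1e-4 * rng.standard_normal((n_walkers, theta_opt.size))
```

The array \texttt{p0} is the shape an ensemble sampler expects for its initial state: one row per walker, one column per parameter.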
Priors represent our initial knowledge about a particular parameter before
looking at the data, and are necessary for any Bayesian analysis. A number of
different prior distributions can be specified by the user in the \sick{} model
configuration file. When no prior is explicitly specified, the following
uninformative prior distributions are assumed (for all channels, where appropriate):
\begin{eqnarray}
p\left(\bm{\psi}_{dim}\right) &\,=\,& \mathcal{U}\left(\min\left[\bm{\psi}_{dim}\right], \max\left[\bm{\psi}_{dim}\right]\right) \\
p\left(P_o\right) &\,=\,& \mathcal{U}\left(0, 1\right) \\
p\left(\log{f}\right) &\,=\,& \mathcal{U}\left(-10, 1\right) \\
p\left(V_o,\sigma_s\right) &\,=\,& \left\{
\begin{array}{c l}
1\,, &\mbox{for values greater than zero}\\
0\,, &\mbox{otherwise}
\end{array}\right. \\
p\left(z,\{c_k\}_{k=0}^{m}\right) &\,=\,& 1 \\
\label{eq:default_priors}
\end{eqnarray}
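These defaults translate directly into a log-prior function; the grid bounds below are hypothetical:

```python
import numpy as np

def log_prior(psi, psi_min, psi_max, P_o, ln_f, V_o, sigma_s):
    """Default uninformative log-prior: uniform within the grid bounds for
    psi, U(0,1) for P_o, U(-10,1) for ln f, positive-only for V_o and
    sigma_s, flat everywhere else (up to a constant)."""
    if np.any(psi < psi_min) or np.any(psi > psi_max):
        return -np.inf
    if not (0.0 < P_o < 1.0):
        return -np.inf
    if not (-10.0 < ln_f < 1.0):
        return -np.inf
    if V_o <= 0.0 or sigma_s <= 0.0:
        return -np.inf
    return 0.0

# Hypothetical grid bounds for (Teff, logg, [M/H], [alpha/Fe]).
psi_min = np.array([4000.0, 0.0, -3.0, -0.2])
psi_max = np.array([7000.0, 5.0, 0.5, 0.6])

inside = log_prior(np.array([5500.0, 4.1, -0.5, 0.1]),
                   psi_min, psi_max, 0.05, -2.3, 1.0, 0.5)
outside = log_prior(np.array([3000.0, 4.1, -0.5, 0.1]),
                    psi_min, psi_max, 0.05, -2.3, 1.0, 0.5)
```

Returning $-\infty$ outside the allowed region is what makes non-interpolable or forbidden parameter values impossible for the sampler to accept.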
A consequence of allowing irregular model grids is that occasionally a model
spectrum cannot be interpolated for some values of $\bm{\psi}$, even if they
fall within $\left(\min\left[\bm{\psi}\right], \max\left[\bm{\psi}\right]\right)$.
In these cases $p\left(\bm{\psi}\right) = 0$ and thus $\log\left(\mathcal{P}\right) = -\infty$.
Quantitatively ensuring numerical convergence for MCMC analyses remains an
unsolved problem. There are a number of excellent resources on MCMC sampling
which outline this issue in greater detail. The mean acceptance fraction,
auto-correlation times, and values of the parameter chains themselves all
provide reasonable proxy indicators of convergence. The user is encouraged to
inspect these metrics and devise some heuristic for when convergence has
unambiguously been achieved. The final state of every MCMC analysis is
automatically saved by \sick{}, allowing users to resume their analysis from
the most recent state if they believe the system has not converged.
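One such proxy, the integrated autocorrelation time, can be estimated directly from a parameter chain. The sketch below applies a simple windowed estimator to a toy AR(1) chain whose true autocorrelation time, $(1+\phi)/(1-\phi) = 19$ steps, is known:

```python
import numpy as np

def autocorr_time(chain, window=100):
    """Windowed estimate of the integrated autocorrelation time:
    tau = 1 + 2 * sum of the normalised autocorrelation function."""
    x = chain - chain.mean()
    acf = np.correlate(x, x, "full")[x.size - 1:]
    acf /= acf[0]
    return 1.0 + 2.0 * np.sum(acf[1:window])

# Toy AR(1) chain standing in for one parameter's MCMC trace.
rng = np.random.default_rng(4)
phi = 0.9
chain = np.empty(5000)
chain[0] = 0.0
for t in range(1, chain.size):
    chain[t] = phi * chain[t - 1] + rng.standard_normal()

tau = autocorr_time(chain)   # expect roughly (1 + phi)/(1 - phi) = 19
```

A chain should be many autocorrelation times long before its samples are treated as effectively independent draws from the posterior.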
\subsection{Self-consistent inference test}
\label{sec:inference-test}
As an initial test of the probabilistic framework I have interpolated a
noise-free synthetic spectrum which will act as a faux observation. A single
channel from the AMBRE synthetic library \citep{ambre} has been employed. These
spectral channels range from 475\,nm to 685\,nm and were specifically calculated
by the AMBRE group for the Gaia-ESO Survey \citep{gaia-eso}. I have applied a number of transformations to the
interpolated spectrum: a second-order polynomial enters multiplicatively to
represent the continuum, the spectrum is redshifted, and fluxes are convolved
to a spectral resolution of $\mathcal{R} \sim 10000$. The faux data are resampled
to a uniform spacing of $\sim{}0.08$\,nm, effectively discarding $99.5\%$ of
the original pixels, which were sampled at $\sim4\times10^{-4}$\,nm. White noise
has been added to replicate a S/N ratio of $\sim 7$ pixel$^{-1}$. I also assumed
that the observed variance is systematically underestimated by 10 per cent:
conceptually the faux data are $\sim{}$10 per cent noisier than what an observer
would estimate from the flux counts.
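The construction of such a faux observation can be sketched end-to-end; the grids and toy line list below are illustrative, and the fine sampling is coarser than the $\sim4\times10^{-4}$\,nm quoted above, for speed:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(5)

# Finely sampled toy spectrum with randomly placed absorption lines.
fine_wavelength = np.arange(475.0, 685.0, 4e-3)        # nm
fine_flux = np.ones_like(fine_wavelength)
for center in rng.uniform(480.0, 680.0, 30):
    fine_flux -= 0.3 * np.exp(
        -0.5 * ((fine_wavelength - center) / 0.02) ** 2)

# Degrade to lower resolution (smoothing sigma here in pixels), then
# resample onto a coarse ~0.08 nm grid, discarding most fine pixels.
smoothed = gaussian_filter1d(fine_flux, 6.0)
coarse_wavelength = np.arange(476.0, 684.0, 0.08)
coarse_flux = np.interp(coarse_wavelength, fine_wavelength, smoothed)

# Add white noise at S/N ~ 7 per pixel.
snr = 7.0
noisy_flux = coarse_flux + rng.normal(0.0, np.abs(coarse_flux) / snr)
```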
\begin{figure*}
\label{fig:chains}
\includegraphics[height=\textheight]{figures/chains.pdf}
\caption{Points sampled by the 200 walkers at each step during the self-consistent
inference test. The true values are marked in blue. The first 1250 steps are
discarded as the burn-in period.}
\end{figure*}
\begin{figure*}
\label{fig:corner-inference}
\includegraphics[width=\textwidth,height=\textwidth]{figures/corner.pdf}
\caption{Marginalised posterior distributions for all parameters $\bm{\theta}$
from a faux observation with spectral resolution $\mathcal{R} \sim 10,000$ and
S/N ratio $\sim{}7$\,pixel$^{-1}$. True values are marked in blue. This figure
demonstrates that precise inference of stellar parameters can be made with
high-resolution spectra, even in the presence of substantial noise.}
\end{figure*}
\begin{table*}
\centering
\caption{True and inferred model parameters for the self-consistent inference test}
\label{tab:inference-test}
\begin{tabular}{llrr}
\hline
\hline
Parameter & Description & Truth & Inferred \\
\hline
$T_{\rm eff}$ & Effective photospheric temperature (K) & 5454 & 5483$_{-137}^{+125}$ \\
$\log{}g$ & Surface gravity & 4.124 & 4.084$_{-0.274}^{+0.262}$ \\
${\rm [Fe/H]}$ & Metallicity & $-0.514$ & $-0.512_{-0.160}^{+0.156}$ \\
$[\alpha/{\rm Fe}]$ & $\alpha$-element abundance & $+0.02$ & $+0.05_{-0.12}^{+0.13}$ \\
$v$ & Velocity (km s$^{-1}$) & 13.0 & $9.5_{-5.5}^{+5.3}$ \\
$\sigma_{blue}$ & Gaussian smoothing sigma (\AA{}) & 0.581 & 0.518$_{-0.34}^{+0.39}$ \\
$\ln{f_{blue}}$ & Logarithm of fractionally underestimated variance & $-2.30$ & $-2.28_{-0.02}^{+0.02}$ \\
$b_{0}$ & Continuum polynomial coefficient ($\times10^{-3}$) & 1.23 & $1.26_{-0.32}^{+0.32}$ \\
$b_{1}$ & Continuum polynomial coefficient & $-0.593$ & $-1.001_{-3.634}^{+3.681}$ \\
$b_{2}$ & Continuum polynomial coefficient & $-0.569$ & $894_{-10335}^{+10367}$ \\
\hline
\end{tabular}
\end{table*}
\begin{figure*}
\label{fig:spectrum-inference}
\includegraphics[width=\textwidth]{figures/spectrum.pdf}
\caption{A faux observed spectrum (black) of $\mathcal{R} \sim 10000$ and S/N
ratio of $\sim7$ with an underestimated variance by $\sim$10\% which was used for
the self-consistent inference test. The recovered maximum likelihood model spectrum
is shown in red. }
\end{figure*}
Default priors and configurations were employed. A total of 200 walkers were used to explore
the parameter space for 1250 steps as burn-in, and another 250 steps were used to
sample the posterior. The chains for each parameter are shown in Figure
\ref{fig:chains}. At least by eye, the chains appears to have converged after
just 300 steps. The marginalised posterior distributions are shown in Figure
\ref{fig:corner-inference}, with the accepted truth values marked. The true and
inferred values for all parameters are tabulated in Table \ref{tab:inference-test}.
This test demonstrates that accurate and precise inference of stellar parameters
can be made even in the presence of substantial noise. The true values of all
parameters are recovered within the uncertainties, and the largest (relative)
uncertainties are observed for continuum coefficients $b_1$ and $b_2$. Figure
\ref{fig:corner-inference} shows there are strong covariances between these
parameters. This is unsurprising given the shape of the spectrum (Figure
\ref{fig:spectrum-inference}), which could reasonably be fit with a linear
continuum, even though the continuum was truly represented by a third order
polynomial. The logarithm of additional fractional variance $\ln{f_{blue}}$ is
precisely constrained, suggesting the noise model is excellently matched by the
data. Given the simplicity of the noise model, the precision of inference on
$\ln{f_{blue}}$ is unlikely to be recovered for real data.
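The role of $\ln{f_{\rm blue}}$ can be made concrete: it inflates the per-pixel
variance by a fraction of the squared model flux. Below is a minimal
\textsc{numpy} sketch of such a Gaussian log-likelihood, assuming the standard
fractional-variance form; the function name and signature are illustrative, not
\sick{}'s API.

```python
import numpy as np

def ln_likelihood(model_flux, obs_flux, obs_sigma, ln_f):
    # Total variance: observed variance plus a fraction f = exp(ln_f)
    # of the squared model flux (the standard "ln f" noise model).
    variance = obs_sigma ** 2 + model_flux ** 2 * np.exp(2.0 * ln_f)
    return -0.5 * np.sum(
        (obs_flux - model_flux) ** 2 / variance
        + np.log(2.0 * np.pi * variance)
    )
```

As $\ln{f} \rightarrow -\infty$ the inflation term vanishes and the expression
reduces to the usual $\chi^2$-based Gaussian log-likelihood.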
\section{Utility}
\label{sec:examples}
Here I include two realistic applications to demonstrate the capabilities and
ease of use of the software.
\subsection{Sol}
\label{sec:sun}
A high-resolution ($\mathcal{R} \sim 20,000$), high S/N ($\sim{}$150\,pixel$^{-1}$)
twilight spectrum was obtained from the GIRAFFE/FLAMES
archive\footnote{eso.org/observing/dfo/quality/GIRAFFE/pipeline/solar.html}.
The H875.7 setup was chosen for this example, yielding spectra from 848.2\,nm to
899.2\,nm. Data points redder than 874\,nm were discarded to more closely
match the Gaia RVS wavelength coverage, and to demonstrate the recoverability of
stellar parameters from real data.
The AMBRE \citep{ambre} synthetic grid was also employed for this example. The
continuum was represented with a third order polynomial, and no redshift or
outlier modelling was included. The central $\pm$0.1\,nm cores of the Ca II
near-infrared triplet lines were masked (specifically rest wavelengths
849.7--849.9\,nm, 854.1--854.3\,nm, and 866.1--866.3\,nm), as the cores are
strongly affected by non-LTE effects that are not accounted for in the 1D LTE
model atmospheres used to generate the AMBRE spectra. Pixels between rest
wavelengths 850.10\,nm and 850.37\,nm were also masked, as the models did not
accurately reproduce the data in this region.
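The masking described above amounts to excluding pixels whose rest wavelengths
fall in the listed regions. A minimal sketch, assuming a simple boolean-mask
representation (the region list is taken from the text; the function itself is
illustrative, not \sick{}'s configuration syntax):

```python
import numpy as np

# Rest-wavelength regions (nm) to exclude: the three Ca II triplet cores
# and one poorly modelled region, as described in the text.
MASKED_REGIONS = [(849.7, 849.9), (854.1, 854.3), (866.1, 866.3),
                  (850.10, 850.37)]

def unmasked(wavelength, regions=MASKED_REGIONS):
    """Return a boolean array that is True for pixels to keep."""
    keep = np.ones_like(wavelength, dtype=bool)
    for start, end in regions:
        keep &= ~((wavelength >= start) & (wavelength <= end))
    return keep

# Example: how many pixels of a uniformly sampled channel survive masking.
wl = np.linspace(848.2, 874.0, 5000)
print(unmasked(wl).sum(), "of", wl.size, "pixels retained")
```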
\begin{figure*}
\includegraphics[width=\textwidth,height=\textwidth]{figures/solar.pdf}
\caption{Stellar parameter ($\bm{\psi}$: $T_{\rm eff}$, $\log{g}$, [Fe/H],
[$\alpha$/Fe]) posterior distributions for the GIRAFFE/FLAMES twilight spectrum.
Nuisance parameters are not shown. Maximum likelihood values for each parameter
are drawn in red, and the 16\%, 50\% and 84\% quantiles are shown as dashed lines.
The posterior distributions are precise and close to the accepted values (not
shown) of $T_{\rm eff} = 5777$\,K and $\log{g} = 4.445$, slightly discrepant
from [$\alpha$/Fe] $= 0$, and significantly offset from [Fe/H] $= 0$. This test
illustrates the precision achievable using \sick, but highlights the need for
accurate models.}
\label{fig:solar}
\end{figure*}
For this example 1000 draws were made during the random scattering stage, and
optimisation was performed using the default options. One hundred walkers
explored the parameter space for 1000 steps, before another 200 steps were used
to sample the posterior. The posterior distributions for the stellar parameters
are shown in Figure \ref{fig:solar}. The currently accepted stellar parameters
for the Sun are $T_{\rm eff} = 5777$\,K, $\log{g} = 4.445$, [Fe/H] = 0, and
[$\alpha$/Fe] = 0. The inferred parameters agree excellently with the accepted
values for effective temperature and surface gravity:
$T_{{\rm eff},{\rm inferred}} = 5770^{+20}_{-14}$\,K, and $\log{g}_{\rm inferred} = 4.40^{+0.06}_{-0.05}$.
The agreement is less pleasing for $\alpha$-element enhancement, and worse for
mean metallicity: [$\alpha$/Fe]$_{\rm inferred} = -0.03^{+0.01}_{-0.01}$, and
[Fe/H]$_{\rm inferred} = -0.07^{+0.01}_{-0.01}$.
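Posterior summaries of this form (median with offsets to the 16\% and 84\%
quantiles) can be computed directly from the flattened chains. A hedged sketch,
with a toy Gaussian chain standing in for real samples:

```python
import numpy as np

def summarize(samples):
    """Median and +/- offsets from the 16th and 84th percentiles."""
    p16, p50, p84 = np.percentile(samples, [16, 50, 84])
    return p50, p84 - p50, p50 - p16

# Toy chain: Gaussian samples standing in for flattened posterior draws.
rng = np.random.default_rng(0)
teff_samples = rng.normal(5770.0, 17.0, size=100_000)
med, plus, minus = summarize(teff_samples)
```

For an approximately Gaussian posterior the two offsets are nearly symmetric
and close to one standard deviation of the samples.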
While these inferences are precise, the offset in mean metallicity suggests the
models may be slightly inaccurate in this wavelength regime, or that the solar
abundances employed are slightly discrepant, or perhaps that there are
unaccounted-for telluric features in the data. This test highlights the need to
verify the accuracy of the input models. A useful extension of this project
would be to simultaneously include telluric spectra, as well as pre-computed
asteroseismic models, such that spectroscopy and time-series photometry could
simultaneously constrain stellar parameters. Asteroseismology can accurately and
precisely constrain surface gravity, but requires an input effective temperature.
Conversely, spectroscopy constrains surface gravity only weakly, but precisely
constrains effective temperature, and is the only method available for measuring
chemical abundances. Since \sick{} allows for any dispersion unit
equivalency (e.g., frequency, wavelength, et cetera), spectroscopic and
asteroseismic models can be trivially coupled into a generative framework for
unprecedented accuracy and precision in stellar spectroscopy.
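Coupling spectroscopic and asteroseismic models in this way rests on dispersion
unit equivalencies, such as the conversion between vacuum wavelength and
frequency. A minimal illustrative sketch in plain Python (not \sick{}'s actual
unit machinery):

```python
# Convert between vacuum wavelength and frequency: the kind of dispersion
# equivalency needed to couple spectroscopic and asteroseismic models.
C = 299_792_458.0  # speed of light (m/s)

def wavelength_to_frequency(wavelength_nm):
    """Vacuum wavelength (nm) -> frequency (THz)."""
    return C / (wavelength_nm * 1e-9) / 1e12

def frequency_to_wavelength(frequency_thz):
    """Frequency (THz) -> vacuum wavelength (nm)."""
    return C / (frequency_thz * 1e12) * 1e9
```

The transformation is its own inverse, so a round trip recovers the input to
floating-point precision.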
\subsection{Globular Clusters}
Globular cluster stars are excellent candidates for testing \sick{} over a large
range of model parameters. Here I consider a realistic scenario of
low-resolution, noisy spectra of red giant and asymptotic giant branch cluster
candidates observed using the AAOmega spectrograph on the Anglo-Australian
Telescope in Coonabarabran, Australia. The 580V and 1700D gratings were employed,
yielding one channel from 370\,nm to 580\,nm with a spectral resolution
$\mathcal{R} \sim 1300$, and another from 820\,nm to 890\,nm with a spectral
resolution $\mathcal{R} \sim 8000$.
Although these data are generally considered to be of low spectral resolution,
I show that astrophysical parameters can be accurately inferred with high
precision. The AMBRE synthetic grid was also employed for this example.
Each observed channel was treated identically: individual Doppler shifts were
permitted in each channel, and the continuum was represented as a second-order
polynomial. Outlier modelling was included.
The parameter space was randomly sampled 1000 times before numerical optimisation
was performed using the Nelder-Mead algorithm. Two hundred walkers explored the
parameter space for a burn-in period of 450 steps, before a further 50 steps
were used to sample the posterior. The auto-correlation times, sampled chain
values, and mean acceptance fractions demonstrated that convergence was
unambiguously achieved for all analysed sources.
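Of the diagnostics mentioned, the integrated auto-correlation time is the least
standard to compute by hand. A crude pure-\textsc{numpy} estimator is sketched
below, assuming a simple window that truncates at the first non-positive
auto-correlation; ensemble samplers such as \textsc{emcee} use more careful
estimators than this.

```python
import numpy as np

def integrated_autocorr_time(x):
    """Crude integrated auto-correlation time of a 1-D chain.

    Sums the normalised auto-correlation function until it first drops
    to or below zero (a simple truncation window).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    # Empirical ACF, normalised so that acf[0] == 1.
    acf = np.correlate(x, x, mode="full")[n - 1:] / (x.var() * n)
    tau = 1.0
    for lag in range(1, n):
        if acf[lag] <= 0.0:
            break
        tau += 2.0 * acf[lag]
    return tau
```

For uncorrelated samples the estimate is close to one; strongly correlated
chains yield much larger values, signalling that fewer effectively independent
samples were drawn.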
Radial velocities were used to identify and discard non-cluster members. The
metallicity distribution of the remaining cluster stars is shown in Figure
\ref{fig:clusters}, where a comparison to the literature is made. The
\citet{harris} catalogue (accessed June 2014) has been used as the literature
reference for all globular clusters. Note that although there are measurable
intrinsic metallicity spreads and active debates on the mean abundances
of these clusters, I have not adopted abscissa uncertainties.
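The membership criterion is not specified above, but a common choice is
iterative sigma-clipping of radial velocities about the cluster median. A
hypothetical sketch of such a cut (not the selection actually applied here):

```python
import numpy as np

def rv_members(v, n_sigma=3.0):
    """Flag likely cluster members by iteratively sigma-clipping radial
    velocities (km/s) about the median -- an illustrative criterion."""
    v = np.asarray(v, dtype=float)
    keep = np.ones(v.size, dtype=bool)
    for _ in range(10):  # iterate to convergence (capped)
        centre = np.median(v[keep])
        scale = np.std(v[keep])
        new = np.abs(v - centre) <= n_sigma * scale
        if np.array_equal(new, keep):
            break
        keep = new
    return keep
```

Field stars with velocities far from the cluster systemic velocity are rejected
on the first pass, after which the clip tightens around the cluster members.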
\begin{figure}[h!]
\includegraphics[width=0.5\textwidth]{figures/clusters.pdf}
\caption{Metallicities of cluster stars inferred from noisy, low-resolution
spectra obtained with the AAOmega spectrograph. The agreement with the
literature values (see text) is very reasonable.}
\label{fig:clusters}
\end{figure}
Even ignoring uncertainties in the literature values, the agreement is
exceptional for noisy, low-resolution spectra. No systematic trend is present,
and a negligible mean metallicity difference of $-0.001$\,dex (an accuracy of
$\sim$0.2\%) is observed over three orders of magnitude. The agreement would be
remarkable even for high-resolution spectra with exquisite S/N ratios. I
emphasise that these results were obtained without performing any `calibration'
or ad hoc posterior corrections, which are usually entirely empirical and not
motivated by an astrophysical understanding.
Individual stellar parameter uncertainties are also very reasonable,
particularly given the low spectral resolution and S/N of the data. Uncertainties
of $\sim$75\,K in $T_{\rm eff}$, $\sim$0.2 in $\log{g}$, $\sim$0.1\,dex in [Fe/H],
and $\sim$0.05\,dex in [$\alpha$/Fe] are obtained in the presence of substantial
noise (e.g., S/N $\sim 15$). The most uncertain results were obtained for
{NGC\,7089}, where the lowest S/N ratios were achieved and only six cluster
members were observed.
\section{Conclusion}
\label{sec:conclusions}
A probabilistic code has been introduced to approximately forward model
spectroscopic data, and thereby allow for the simultaneous inference of
astrophysical and nuisance parameters. The generative model approach described
here has a number of advantages over previously published techniques.
Preparatory and subjective decisions (e.g., redshift and placement of the
continuum) are treated objectively within a single scalar-justified mathematical
model. This allows users to credibly characterise the parameter posterior
distributions and understand how they affect astrophysical measurements. Until
now, most (if not all) published techniques have treated these processes
separately, thereby systematically biasing and generally under-estimating the
parameter uncertainties.
With applications to both high- and low-resolution stellar spectra, I have
demonstrated that the simultaneous incorporation of these effects leads to
remarkable improvements in both accuracy and precision. While the examples
presented here have focussed on stellar spectra, the code is agnostic about
\textit{what} the astrophysical parameters describe: the framework can easily be
used for any kind of quantifiable astrophysical process. The code is MIT-licensed
(freely distributable, open source) and has an extensive automatic testing suite,
which includes regular reproduction of the examples presented in this \article{}.
Complete documentation is available online\footnote{astrowizici.st/sick},
including a number of additional examples and tutorials. In the spirit of
scientific reproducibility, the online documentation is complemented with the
files necessary to reproduce the examples presented in this \article{}.
The source code is distributed using \textsc{git}, and hosted online at
GitHub\footnote{github.com/andycasey/sick}.
Great care has been taken to ensure the code is easy to use and allows users to
obtain precise inferences with little effort. I strongly encourage the use of
this software for existing and future spectral datasets. With the sheer volume
and high quality of spectral data available, astronomers must begin to adopt
objective, generative models for their data. Subtle astrophysical processes can
only be discovered and understood with the proper characterisation of
uncertainties afforded by generative models.
\acknowledgements
I am pleased to thank Louise Howes for extensive testing of this code, Martin
Asplund for providing commentary, as well as Sergey Koposov and Jarryd Page for
constructive discussions on this work. This research has made extensive use of NASA's
Astrophysics Data System Bibliographic Services, the Coveralls continuous
integration service, GitHub, and the \textsc{triangle.py} code \citep{triangle.py}.
This research has been funded by European Research Council grant 320360:
The Gaia-ESO Milky Way Survey.
\bibliographystyle{apj}
\bibliography{biblio}
\end{document}