.. glossary::
accession
A unique and stable PGS Catalog score identifier (ID). PGS Catalog IDs start
with the prefix PGS, e.g. `PGS000001`_
CSV
Comma-separated values, a popular plain text file format. `CSVs are
good`_. Please don't use ``.xlsx`` (Excel), it makes bioinformaticians
sad.
JSON
Javascript Object Notation. A popular file format and data interchange
format.
polygenic score
A `polygenic score`_ (PGS), aggregates the effects of many genetic variants
into a single number which quantifies an individual's genetic predisposition
for a phenotype. PGS are typically composed of hundreds-to-millions of genetic
variants (usually SNPs) which are calculated as a weighted sum of allele
dosages multiplied by their corresponding effect sizes. The variants and their effect sizes
are most often derived from a genome-wide association study (GWAS) using many
common software tools (including Pruning/Clumping + Thresholding (e.g. PRSice),
LDpred, lassosum, snpnet).
polygenic risk score
A polygenic risk score (PRS) is a subset of PGS that is used to estimate the
risk of disease or other clinically relevant outcomes (binary or discrete).
Also sometimes referred to as a genetic or genomic risk score (GRS).
PGS Catalog
The `Polygenic Score (PGS) Catalog`_ is an open database of published polygenic
scores (PGS). If you develop and publish polygenic scores, please consider
`submitting them`_ to the Catalog so they can be reused and applied to new
datasets using this pipeline!
PGS Catalog Calculator
``pgsc_calc`` - a reproducible workflow to calculate one or multiple PGS, implemented
in `Nextflow`_.
SNP
A `single nucleotide polymorphism`_ - most PGS only contain this type of variant
in addition to smaller common insertions/deletions (INDELS).
Scoring file
A file containing risk alleles and derived weights for a specific
phenotype. Weights are typically calculated with 1) GWAS summary
statistics and 2) A large population of people with known phenotypes
(e.g. the `UK BioBank`_). These files are distributed through the
PGS Catalog in a `standardized format`_, and also provided as
`harmonized scoring files`_ with consistently-reported positions in
common genome builds (GRCh37 and GRCh38). The pipeline
target dataset
Also referred to as a **sampleset** within the input samplesheets. The genomes/genotyping
data that you want to calculate polygenic scores for. Scores are calculated from an
existing scoring file that contains effect alleles and associated weights. These genomes
should distinct from those used to develop the polygenic score originally (i.e., those
used to derive the risk alleles and weights), as overlapping samples will inflate common
metrics of PGS accuracy.
VCF
Variant Call Format. A `standard file format`_ used to store genetic variants and genotypes.
By default the pipeline (& plink) use the sample genotypes present in the ``GT`` field. However,
users can import imputed ALT allele dosages by adding a ``DS`` flag to the ``vcf_genotype_field`` column of the
samplesheet, see :ref:`setup samplesheet` for more information.