FRQ is MAF check
Al-Murphy committed Sep 17, 2021
1 parent c806ee1 commit 6f8946b
Showing 101 changed files with 513 additions and 123 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: MungeSumstats
Type: Package
Title: Standardise summary statistics from GWAS
-Version: 1.1.23
+Version: 1.1.24
Authors@R:
c(person(given = "Alan",
family = "Murphy",
4 changes: 4 additions & 0 deletions NEWS.md
@@ -6,6 +6,10 @@
frequency
* Mapping file now has mappings for allele frequency (AF) to FRQ
* VCF files with AF in INFO column e.g. 'AF=...' now converted to AF column
* `format_sumstats(frq_is_maf)` check added to infer whether the FRQ column
values are minor/effect allele frequencies. Setting `frq_is_maf = FALSE`
renames the FRQ column to MAJOR_ALLELE_FRQ if some values (> 0.5) appear to be
major allele frequencies

## CHANGES IN VERSION 1.1.19

53 changes: 53 additions & 0 deletions R/check_frq_maf.R
@@ -0,0 +1,53 @@
#' Check that FRQ column refers to minor/effect allele frequency not major
#'
#' @inheritParams format_sumstats
#' @return sumstats_dt, the modified summary statistics data table object
#' @keywords internal
check_frq_maf <- function(sumstats_dt,frq_is_maf) {
## Set variables used in in-place data.table functions to NULL
## to avoid confusing BiocCheck.
FRQ <- .N <- NULL
col_headers <- names(sumstats_dt)
if ("FRQ" %in% col_headers) {
#count SNPs with FRQ>0.5. For bi-allelic SNPs this implies the major
#allele; for non-bi-allelic SNPs the major allele frequency may be
#lower than 0.5, so such SNPs will not be counted here
num_major <- sumstats_dt[FRQ > 0.5, .N]
#only continue if there are some
if(num_major>0){
per_major <- round(num_major/nrow(sumstats_dt)* 100, 1)
#get mappings for message
frq_choices <-
paste0(sumstatsColHeaders[sumstatsColHeaders$Corrected=="FRQ",
]$Uncorrected,collapse=", ")
msg <-
paste0(formatC(num_major, big.mark = ",", format = "fg"),
" SNPs (",per_major,"%) have FRQ values > 0.5. ",
"Conventionally the FRQ column is intended to show the",
" minor/effect allele frequency.\nThe FRQ column was",
" mapped from one of the following columns in the input",
" summary statistics file:\n",frq_choices)
message(msg)
#if frq_is_maf is TRUE, accept that FRQ is the minor allele frequency
#and don't rename it. If FALSE and some SNPs have FRQ>0.5, rename FRQ
#to MAJOR_ALLELE_FRQ
if(isFALSE(frq_is_maf)){
msg <- paste0("As frq_is_maf=FALSE, the FRQ column will be ",
"renamed MAJOR_ALLELE_FRQ to differentiate the",
" values from \nminor/effect allele frequency.")
message(msg)
setnames(sumstats_dt,"FRQ","MAJOR_ALLELE_FRQ")
}
else{ #frq_is_maf =TRUE, i.e. don't rename
msg <- paste0("As frq_is_maf=TRUE, the FRQ column will not be ",
"renamed. If the FRQ values were intended to ",
"represent major allele frequency,\nset ",
"frq_is_maf=FALSE to rename the column as ",
"MAJOR_ALLELE_FRQ and differentiate it",
" from minor/effect allele frequency.")
message(msg)
}
}
}
return(sumstats_dt)
}
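
The renaming rule above can be tried out without the package. The following is a minimal base-R sketch under stated assumptions: `rename_frq_if_major` is a hypothetical helper (not part of MungeSumstats), it works on a plain `data.frame` rather than a data.table, it omits the messages, and the toy data are made up for illustration.

```r
# Hypothetical helper (not part of MungeSumstats): mirrors the renaming
# heuristic of check_frq_maf() on a plain data.frame, without messages.
rename_frq_if_major <- function(sumstats, frq_is_maf = TRUE) {
  # Nothing to do if there is no FRQ column
  if (!"FRQ" %in% names(sumstats)) {
    return(sumstats)
  }
  # Count SNPs whose frequency exceeds 0.5 (NA values are ignored)
  num_major <- sum(sumstats$FRQ > 0.5, na.rm = TRUE)
  # Rename only when some values look like major allele frequencies
  # and the user has said FRQ should not be treated as MAF
  if (num_major > 0 && isFALSE(frq_is_maf)) {
    names(sumstats)[names(sumstats) == "FRQ"] <- "MAJOR_ALLELE_FRQ"
  }
  sumstats
}

# Toy data, invented for illustration: one SNP has FRQ > 0.5
toy <- data.frame(SNP = c("rs1", "rs2", "rs3"), FRQ = c(0.12, 0.81, 0.40))

kept <- rename_frq_if_major(toy, frq_is_maf = TRUE)     # FRQ stays as-is
renamed <- rename_frq_if_major(toy, frq_is_maf = FALSE) # FRQ is renamed

# No value above 0.5: the column is left alone even with frq_is_maf = FALSE
all_minor <- rename_frq_if_major(toy[toy$FRQ <= 0.5, ], frq_is_maf = FALSE)
```

With the default `frq_is_maf = TRUE` the column name never changes; the rename happens only when the flag is FALSE and at least one value exceeds 0.5, which matches the branch structure of the function in this diff.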
17 changes: 17 additions & 0 deletions R/format_sumstats.R
@@ -107,6 +107,12 @@
#' "rs5772025_rs397784053". This can cause an error so by default, the first
#' RS ID will be kept and the rest removed e.g."rs5772025". If you want to just
#' remove these SNPs entirely, set it to TRUE. Default is FALSE.
#' @param frq_is_maf Conventionally the FRQ column is intended to show the
#' minor/effect allele frequency (MAF) but sometimes the FRQ column instead
#' holds the major allele frequency. This logical variable indicates whether
#' the FRQ values should be treated as MAF. If set to FALSE, the FRQ column is
#' renamed to MAJOR_ALLELE_FRQ when some frequency values appear to relate to
#' the major allele i.e. >0.5. Default is TRUE, so no renaming occurs.
#' @param sort_coordinates Whether to sort by coordinates.
#' @param nThread Number of threads to use for parallel processes.
#' @param save_path File path to save formatted data. Defaults to
@@ -181,6 +187,7 @@ format_sumstats <- function(path,
bi_allelic_filter = TRUE,
snp_ids_are_rs_ids = TRUE,
remove_multi_rs_snp = FALSE,
frq_is_maf = TRUE,
sort_coordinates = TRUE,
nThread = 1,
save_path = tempfile(fileext = ".tsv.gz"),
@@ -254,6 +261,8 @@ format_sumstats <- function(path,
allele_flip_frq = allele_flip_frq,
bi_allelic_filter = bi_allelic_filter,
snp_ids_are_rs_ids = snp_ids_are_rs_ids,
remove_multi_rs_snp = remove_multi_rs_snp,
frq_is_maf = frq_is_maf,
write_vcf = write_vcf,
return_format = return_format,
ldsc_format = ldsc_format,
@@ -775,6 +784,14 @@ format_sumstats <- function(path,
compute_n = compute_n,
imputation_ind = imputation_ind
)

#### Check 36: Ensure FRQ is MAF ####
sumstats_return$sumstats_dt <- check_frq_maf(
sumstats_dt =
sumstats_return$sumstats_dt,
frq_is_maf=frq_is_maf
)


#### Check 34: Perform liftover ####
sumstats_return$sumstats_dt <- liftover(
8 changes: 8 additions & 0 deletions R/validate_parameters.R
@@ -26,6 +26,8 @@ validate_parameters <- function(path,
allele_flip_frq,
bi_allelic_filter,
snp_ids_are_rs_ids,
remove_multi_rs_snp,
frq_is_maf,
write_vcf,
return_format,
ldsc_format,
@@ -153,6 +155,12 @@ validate_parameters <- function(path,
if (!is.logical(snp_ids_are_rs_ids)) {
stop("snp_ids_are_rs_ids must be either TRUE or FALSE")
}
if (!is.logical(remove_multi_rs_snp)) {
stop("remove_multi_rs_snp must be either TRUE or FALSE")
}
if (!is.logical(frq_is_maf)) {
stop("frq_is_maf must be either TRUE or FALSE")
}
if (!is.logical(write_vcf)) {
stop("write_vcf must be either TRUE or FALSE")
}
2 changes: 1 addition & 1 deletion README.md
@@ -1,7 +1,7 @@
Standardise the format of GWAS summary statistics with *MungeSumstats*
================
Alan Murphy, Brian Schilder and Nathan Skene
-2021-09-14
+2021-09-17

<!-- Readme.md is generated from Readme.Rmd. Please edit that file -->
<!-- badges: start -->
2 changes: 1 addition & 1 deletion docs/404.html
