Package for automating statistical analyses that are widely used in metabolomics.

CHKim5/LMSstat

LMSstat

Automates widely used statistical tests from a single, identical data input, reducing the tedious work of searching for packages and reformatting the data for each one.

The package includes:

  • Simple statistics: u-test, t-test, and post hoc tests for ANOVA and Kruskal-Wallis, with FDR-adjusted p-values

  • Bar, box, and dot plots annotated with significance (u-test, t-test, post hoc tests for ANOVA and Kruskal-Wallis)

  • Scaling & Transformation

  • Normality check (Shapiro-Wilk test)

  • Heatmap

  • PERMANOVA

  • NMDS

  • PCA

Instructions

Installation

install.packages("devtools")

devtools::install_github("CHKim5/LMSstat")

library(LMSstat)

Basic structure of the Data

Used in

  • Simple statistics
  • Barplot, Boxplot, Dotplot
  • PERMANOVA
  • NMDS
  • PCA
  • Scaling & Transformation
  • Normality check (Shapiro-Wilk test)
  • Heatmap
# Sample data provided within the package

data("Data")

# Uploading your own data

setwd("C:/Users/82102/Desktop") # Set this to the folder that contains your CSV file

Data<-read.csv("statT.csv",header = F)

The "Multilevel" column is mandatory for the code to run correctly.

If multilevel analysis is not used, fill the column with arbitrary placeholder characters.

[Figure: example layout of statT.csv]

Used in

  • PERMANOVA
# Sample data provided within the package
data("Classification")

# Uploading your own Data
Classification<-read.csv("statT_G.csv",header = F)

[Figure: example layout of statT_G.csv]

Univariate statistics

Statfile<-Allstats(Data,Adjust_p_value = T, Adjust_method = "BH")
Adjustable parameters
  • Adjust_p_value = T # Set TRUE if p-value adjustment is needed
  • Adjust_method = "BH" # Adjustment method; frequently used options: c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none")
head(Statfile[["Result"]]) # includes all statistical results

write.csv(Statfile[["Result"]],"p_value_result.csv")  # Write all p-values to a CSV file
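To illustrate what per-variable testing with p-value adjustment involves, the sketch below reproduces the idea in base R: a t-test and a u-test (Mann-Whitney, via `wilcox.test`) for every variable, followed by Benjamini-Hochberg adjustment with `p.adjust`. It is a minimal sketch on a hypothetical toy data frame (`toy`, `V1` to `V3` are made up), not the `Allstats` implementation; it uses R's default Welch t-test (whether `Allstats` uses Welch or Student's t-test is not stated here) and it omits the ANOVA and Kruskal-Wallis post hoc tests.

```r
# Hypothetical toy data: two groups, three variables (not LMSstat's file layout)
set.seed(1)
toy <- data.frame(
  Group = rep(c("A", "B"), each = 6),
  V1 = c(rnorm(6, 10), rnorm(6, 13)),
  V2 = rnorm(12, 5),
  V3 = c(rnorm(6, 2), rnorm(6, 4))
)
vars <- c("V1", "V2", "V3")

# Raw p-values per variable: Welch t-test and Mann-Whitney u-test
p_t <- sapply(vars, function(v) t.test(toy[[v]] ~ toy$Group)$p.value)
p_u <- sapply(vars, function(v) wilcox.test(toy[[v]] ~ toy$Group)$p.value)

# Benjamini-Hochberg adjustment across variables ("BH", as in Adjust_method)
result <- data.frame(
  variable  = vars,
  t_test    = p_t,
  t_test_BH = p.adjust(p_t, method = "BH"),
  u_test    = p_u,
  u_test_BH = p.adjust(p_u, method = "BH"),
  row.names = NULL
)
print(result)
```

Adjusting across variables rather than per test keeps the false discovery rate controlled over the whole panel of metabolites, which is why the adjustment is applied to the full vector of p-values at once.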

Plots

# Makes a subdirectory and saves boxplots for all the variables
AS_boxplot(Statfile,asterisk = "u_test") 

# Makes a subdirectory and saves dotplots for all the variables
AS_dotplot(Statfile,asterisk = "t_test") 

# Makes a subdirectory and saves barplots for all the variables
AS_barplot(Statfile,asterisk = "Scheffe") 

[Figures: example output of AS_boxplot(Statfile), AS_dotplot(Statfile), and AS_barplot(Statfile)]

Adjustable parameters
  • asterisk = "t_test" # Test used for the significance labels: c("Dunn","Scheffe","u_test","t_test")
  • significant_variable_only = F # If set to TRUE, non-significant variables will not be plotted
  • color = c("#FF3300", "#FF6600", "#FFCC00", "#99CC00", "#0066CC", "#660099") # Colors for the plots
  • legend_position = "none" # "none","left","right","bottom","top"
  • order = NULL # Order of the groups, e.g. c("LAC","LUE","WEI","SDF","HGH","ASH")
  • tip_length = 0.01 # significance tip length
  • label_size = 2.88 # significance label size
  • step_increase = 0.05 # significance step increase
  • width = 0.3 # box width
  • size = 3 # dot size
  • fig_width = NA # figure width
  • fig_height = NA # figure height

Scaling & Transformation

scaled_data<-D_tran(Data,param = "Pareto")
Adjustable parameters
  • param = "None" # "None","Auto","log10","Pareto"
  • save = F # Set TRUE to save the transformed data to a file
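Auto and Pareto scaling have standard definitions: both mean-center each variable, then divide by its standard deviation (Auto) or by the square root of its standard deviation (Pareto). The base-R sketch below shows those definitions on a hypothetical toy matrix. It is not the `D_tran` source; details such as how `D_tran` handles zeros or negative values before `log10` are assumptions left out here.

```r
# Hypothetical toy matrix: rows = samples, columns = variables
x <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40,
              5, 5, 6, 8), nrow = 4)

# Auto scaling (unit variance): center each column, divide by its SD
auto_scale <- function(m) {
  sweep(sweep(m, 2, colMeans(m), "-"), 2, apply(m, 2, sd), "/")
}

# Pareto scaling: center each column, divide by the square root of its SD
pareto_scale <- function(m) {
  sweep(sweep(m, 2, colMeans(m), "-"), 2, sqrt(apply(m, 2, sd)), "/")
}

# log10 transformation (assumes strictly positive values)
log10_tran <- function(m) log10(m)

auto   <- auto_scale(x)
pareto <- pareto_scale(x)
logged <- log10_tran(x)
round(apply(auto, 2, sd), 6)  # each column now has standard deviation 1
```

Pareto scaling sits between no scaling and Auto scaling: large-variance variables are shrunk, but less aggressively than under unit-variance scaling, so noisy low-intensity variables are not inflated as much.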

Normality check

Result<-Norm_test(Data)

write.csv(Result,"Normality_test_Result.csv")
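The same kind of check can be sketched in base R with `shapiro.test`, applied to each variable in turn. The toy data frame below is hypothetical, and whether `Norm_test` tests each group separately or returns the same columns is not stated here, so treat this as an illustration of the Shapiro-Wilk step only.

```r
# Hypothetical toy data: one column per variable
set.seed(42)
toy <- data.frame(
  normal_var = rnorm(30),
  skewed_var = rexp(30)
)

# Shapiro-Wilk test on each variable; a small p-value suggests non-normality
sw <- lapply(toy, shapiro.test)
norm_result <- data.frame(
  variable = names(sw),
  W        = sapply(sw, function(r) unname(r$statistic)),
  p_value  = sapply(sw, function(r) r$p.value),
  row.names = NULL
)
print(norm_result)
```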

Heatmap

scaled_data<-D_tran(Data,param = "Auto")

AS_heatmap(scaled_data) # Input: data returned by D_tran

dev.off() # Close the graphics device; the heatmap is saved as a PDF

Adjustable parameters
  • col =c("green", "white", "red") # colors for heatmap
  • col_lim = c(-3, 0, 3) # color boundaries
  • reverse = T # T,F: swap rows and columns
  • distance = "pearson" # Distance measure for HCA: "pearson", "manhattan", "euclidean", "spearman", "kendall"
  • rownames = T # T,F
  • colnames = T # T,F
  • Hsize = c(3,6) # Width & Height c(a,b)
  • g_legend = "Group" # Annotation legend title
  • h_legend = "Color Key" # Heatmap legend title
  • Title ="Title" # Title
  • T_size = 10 # Title text size
  • R_size = 3 # row text size
  • C_size = 3 # column text size
  • Gcol =c("ASD" = "black","HGH"="red","LAC"="blue","LUE" ="grey","SDF" = "yellow","WEI"="green") # Color for top_annotation bar
  • dend_h = 0.5 # dendrogram height
  • a_h = 0.2 # top annotation height
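A correlation-based distance such as `distance = "pearson"` is commonly defined as d = 1 - r, so perfectly correlated profiles have distance 0 and anti-correlated ones distance 2. The sketch below builds that distance between samples and clusters them with base R's `hclust`. The toy matrix is hypothetical, and the complete-linkage method is an assumption; the linkage that `AS_heatmap` actually uses is not stated here.

```r
# Hypothetical toy matrix: rows = samples, columns = variables
set.seed(7)
m <- matrix(rnorm(5 * 8), nrow = 5,
            dimnames = list(paste0("S", 1:5), paste0("V", 1:8)))

# Pearson correlation distance between rows: d = 1 - r
# cor() correlates columns, so transpose to correlate samples
d <- as.dist(1 - cor(t(m), method = "pearson"))

# Hierarchical clustering (HCA) on that distance; linkage is an assumption
hc <- hclust(d, method = "complete")
hc$order  # leaf order a heatmap would use for the rows
```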

Multivariate statistics

PERMANOVA

data("Data")

data("Classification") 

Single factor

PERMANOVA performed using the Group column

Indiv_Perm(Data) # The group information is treated as a factor

Multiple Factors

Runs PERMANOVA in a loop over each class provided in Classification

Result<-Multi_Perm(Data,Classification) # The group information is treated as factors

Adjustable parameters
  • method # Dissimilarity index, one of c("manhattan", "euclidean", "canberra", "clark", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao", "mahalanobis", "chisq", "chord")
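PERMANOVA tests whether group centroids differ by partitioning the squared pairwise distances into among-group and within-group sums of squares, computing a pseudo-F, and comparing it with pseudo-F values from randomly permuted group labels. The sketch below is a minimal one-way version in base R with Euclidean distance on hypothetical toy data (`pseudo_F` and `permanova_test` are illustrative names). It is not the code behind `Indiv_Perm` or `Multi_Perm`; established implementations such as `adonis2` in the vegan package should be preferred for real analyses.

```r
# Pseudo-F for a one-way PERMANOVA
# D2: full matrix of squared distances; g: factor of group labels
pseudo_F <- function(D2, g) {
  N <- length(g)
  a <- nlevels(g)
  ss_total <- sum(D2[upper.tri(D2)]) / N
  ss_within <- 0
  for (lev in levels(g)) {
    idx <- which(g == lev)
    sub <- D2[idx, idx, drop = FALSE]
    ss_within <- ss_within + sum(sub[upper.tri(sub)]) / length(idx)
  }
  ss_among <- ss_total - ss_within
  (ss_among / (a - 1)) / (ss_within / (N - a))
}

# Permutation test: shuffle the labels and recompute pseudo-F
permanova_test <- function(x, g, n_perm = 999) {
  g <- factor(g)
  D2 <- as.matrix(dist(x, method = "euclidean"))^2
  f_obs <- pseudo_F(D2, g)
  f_perm <- replicate(n_perm, pseudo_F(D2, sample(g)))
  list(F = f_obs, p = (sum(f_perm >= f_obs) + 1) / (n_perm + 1))
}

# Hypothetical toy data: 3 groups of 6 samples, 4 variables; group C is shifted
set.seed(123)
grp <- rep(c("A", "B", "C"), each = 6)
x <- matrix(rnorm(18 * 4), nrow = 18) + (grp == "C") * 2
res <- permanova_test(x, grp)
res$p
```

As a sanity check on the partitioning, with a single variable and Euclidean distance the pseudo-F reduces to the classical one-way ANOVA F statistic.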

NMDS

NMDS(Data,methods = c("manhattan","bray","euclidean"))

[Figure: NMDS plot with Bray-Curtis distance and p-value from PERMANOVA]

Adjustable parameters
  • methods # Dissimilarity index (one or more), chosen from c("manhattan", "euclidean", "canberra", "clark", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao", "mahalanobis", "chisq", "chord")

  • color = c("#FF3300", "#FF6600", "#FFCC00", "#99CC00", "#0066CC", "#660099") # Colors for the plots

  • legend_position = "none" # "none","left","right","bottom","top"

  • fig_width = NA # figure width

  • fig_height = NA # figure height

  • names = F # Set TRUE to label points with sample names

  • dotsize = 3 # dot size

  • labsize = 3 # label size
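NMDS places samples in a low-dimensional space so that the rank order of their dissimilarities is preserved as well as possible, measured by the stress value. The sketch below computes Bray-Curtis dissimilarity by hand and runs Kruskal's NMDS with `MASS::isoMDS` (MASS ships with R). This is a swapped-in routine on hypothetical toy data; the `NMDS` function in this package may use a different NMDS implementation and settings.

```r
library(MASS)  # isoMDS(): Kruskal's non-metric multidimensional scaling

# Bray-Curtis dissimilarity between rows: sum|x_i - x_j| / sum(x_i + x_j)
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
  }
  as.dist(d)
}

# Hypothetical non-negative toy abundance matrix: 8 samples x 5 variables
set.seed(3)
x <- matrix(rpois(8 * 5, lambda = 20) + 1, nrow = 8,
            dimnames = list(paste0("S", 1:8), NULL))

d <- bray_curtis(x)
fit <- isoMDS(d, k = 2, trace = FALSE)  # 2-D ordination
fit$stress        # final stress, in percent
head(fit$points)  # sample coordinates to plot
```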

PCA

PCA(Data,components = c(1,2),legend_position = "none")

[Figure: PCA plot with selected components]

Adjustable parameters
  • color = c("#FF3300", "#FF6600", "#FFCC00", "#99CC00", "#0066CC", "#660099") # Colors for the plots
  • legend_position = "none" # "none","left","right","bottom","top"
  • fig_width = NA # figure width
  • fig_height = NA # figure height
  • components = c(1,2) # selected components
  • names = F # Set TRUE to label points with sample names
  • dotsize = 3 # dot size
  • labsize = 3 # label size
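The `components` argument picks which principal components form the plot axes. The base-R sketch below shows the equivalent step with `prcomp`: compute the PCA, keep the scores for the chosen components, and report the share of variance each axis explains. The toy matrix is hypothetical, and scaling the variables to unit variance (`scale. = TRUE`) is an assumption; whether `PCA()` scales the data itself is not stated here.

```r
# Hypothetical toy matrix: 10 samples x 4 variables
set.seed(11)
x <- matrix(rnorm(10 * 4), nrow = 10)

# PCA on centered, unit-variance data (scaling is an assumption)
pca <- prcomp(x, center = TRUE, scale. = TRUE)

components <- c(1, 2)                      # same idea as the components argument
scores <- pca$x[, components]              # sample coordinates for the plot
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(100 * var_explained[components], 1)  # % variance shown on each axis
```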