Plot significances on graphs

Objective

When working with scientific data, I regularly have to compute and plot significant differences between two or more datasets. Computing p-values is easy, but plotting them on a graph has always been tedious: I have to supply annotations by hand to draw lines between bars and place the p-values above those lines. I looked for R packages that do this automatically. Some exist, but the user still has to name explicitly the datasets whose p-values should be plotted.

Since none of these packages met my needs, and since I plot graphs and significances daily, I decided to write a function that does this automatically.

System requirements

R
RStudio

Install the tidyverse package and attach it as shown below.

install.packages("tidyverse", repos = "https://cran.ma.imperial.ac.uk/")
library(tidyverse)

Function usage

I will explain the usage of the function with an example.

The repository includes a sample dataset in the file "SampleData.csv". Import it and convert the categorical columns to factors.

growth.drug.test <- read_csv(file = "SampleData.csv", col_names = TRUE)
str(growth.drug.test)
growth.drug.test <- growth.drug.test %>%
    mutate(Subjects = as.factor(Subjects)) %>%
    mutate(Treatment = as.factor(Treatment))

head(growth.drug.test)

## # A tibble: 6 x 3
##   Subjects Treatment Growth
##   <fct>    <fct>      <dbl>
## 1 Children DrugA      10.5 
## 2 Children DrugB      19.4 
## 3 Children DrugC      29.0 
## 4 Children DrugA       9.18
## 5 Children DrugB      18.8 
## 6 Children DrugC      28.3

This object has three columns:

Subjects
- A categorical variable with four levels
Treatment
- A categorical variable with three levels
Growth
- A numeric variable

The function to plot significances requires the following arguments:

dataset
- a data frame with exactly two columns: the first should be a categorical variable and the second a numeric variable
error.bars
- specify "sd" for standard deviation or "sem" for standard error of the mean
type
- specify "t.test" to perform a t-test or "wilcox.test" to perform a Wilcoxon test
alternative
- specify "two.sided" or "greater" or "less"
conf.level
- provide a value between 0 and 1. Example: for a 95% confidence level, use 0.95
title
- provide a title for the chart enclosed in double quotes
subtitle
- provide a subtitle for the chart enclosed in double quotes
xlabel
- provide a label for the X-axis enclosed in double quotes
ylabel
- provide a label for Y-axis enclosed in double quotes
legend.title
- provide a title for the legend enclosed in double quotes
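
To make the error.bars, type, alternative, and conf.level options more concrete, here is a minimal base-R sketch of the standard calculations those option names refer to, for a single pair of groups. The toy data frame and its values are made up for illustration, and this is not the internal code of plotSignificances.

```r
# Toy data (hypothetical values, not taken from SampleData.csv)
toy <- data.frame(
  Treatment = factor(rep(c("DrugA", "DrugB"), each = 4)),
  Growth    = c(10, 12, 11, 9, 19, 21, 20, 18)
)

# Standard deviation per group: the quantity behind error.bars = "sd"
group.sd <- tapply(toy$Growth, toy$Treatment, sd)

# Standard error of the mean per group: sd / sqrt(n), as in error.bars = "sem"
group.n   <- tapply(toy$Growth, toy$Treatment, length)
group.sem <- group.sd / sqrt(group.n)

# Two-sided Welch t-test at a 95% confidence level
t.res <- t.test(Growth ~ Treatment, data = toy,
                alternative = "two.sided", conf.level = 0.95)

# Rank-based alternative: Wilcoxon rank-sum test
w.res <- wilcox.test(Growth ~ Treatment, data = toy,
                     alternative = "two.sided")

print(group.sd)
print(group.sem)
print(t.res$p.value)
print(w.res$p.value)
```

The standard error is always smaller than the standard deviation for groups with more than one observation, so "sem" error bars will look tighter than "sd" error bars on the same data.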

Import the function into your environment
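
One way to do this, assuming the function is defined in the repository's PlotSignificances.R file and that the file sits in your working directory (the exact step is not spelled out here), is to source the script:

```r
# Assumption: PlotSignificances.R from this repository is in the working directory
script <- "PlotSignificances.R"

if (file.exists(script)) {
  # Evaluates the script and defines its function(s) in the current environment
  source(script)
} else {
  message("Could not find ", script, "; set the working directory with setwd() first.")
}
```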

Note: Use this function only to draw bar charts. It will not work with other chart types.

Example 1:

To view the Growth among different Subjects, irrespective of the drug taken:

plotSignificances(dataset = growth.drug.test[c(1,3)],
                  error.bars = "sem",
                  type = "t.test",
                  alternative = "two.sided",
                  conf.level = 0.95,
                  title = "Growth among different subjects",
                  subtitle = "Source: SampleData.csv",
                  xlabel = "Subjects",
                  ylabel = "Growth",
                  legend.title = "Subjects"
                  )
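
With four levels in Subjects, there are six possible pairwise comparisons. The sketch below, in base R with made-up data, shows how such pairs can be enumerated and tested. The level names other than Children are hypothetical, and whether plotSignificances tests every pair or only some of them is an assumption, not something confirmed by this README.

```r
# Toy data: four groups with five made-up measurements each
set.seed(1)
toy <- data.frame(
  Subjects = factor(rep(c("Children", "GroupB", "GroupC", "GroupD"), each = 5)),
  Growth   = c(rnorm(5, 10), rnorm(5, 20), rnorm(5, 30), rnorm(5, 40))
)

# All unordered pairs of factor levels: one pair per column
level.pairs <- combn(levels(toy$Subjects), 2)

# Run a two-sided t-test for each pair and collect the p-values
p.values <- apply(level.pairs, 2, function(pair) {
  a <- toy$Growth[toy$Subjects == pair[1]]
  b <- toy$Growth[toy$Subjects == pair[2]]
  t.test(a, b, alternative = "two.sided", conf.level = 0.95)$p.value
})

result <- data.frame(group1  = level.pairs[1, ],
                     group2  = level.pairs[2, ],
                     p.value = p.values)
print(result)
```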

Example 2:

To view the Growth with different Treatment drugs, irrespective of the Subjects tested:

plotSignificances(dataset = growth.drug.test[c(2,3)],
                  error.bars = "sem",
                  type = "t.test",
                  alternative = "two.sided",
                  conf.level = 0.95,
                  title = "Growth with different drugs",
                  subtitle = "Source: SampleData.csv",
                  xlabel = "Treatment",
                  ylabel = "Growth",
                  legend.title = "Drugs Tested"
                  )

Example 3:

As mentioned earlier, the function only works with datasets that have one categorical variable and one numeric variable. It cannot plot significances on graphs, such as the one shown below, created from more than one categorical variable and one numeric variable.

ggplot(data = growth.drug.test,
       aes(x = Subjects, y = Growth, group = Treatment, fill = Treatment)) +
    geom_bar(stat = "identity", position = "dodge") + 
    theme_classic() +
    labs(title = "Graph from two categorical variables and one numeric variable",
         subtitle = "Source: SampleData.csv")
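
One possible workaround, sketched here with made-up data, is to reduce the data to a single categorical variable first, for example by keeping one Subjects level and passing only Treatment and Growth. The level name GroupB and all values are hypothetical, and the commented-out call only mirrors the argument pattern of the earlier examples.

```r
# Toy data with two categorical variables and one numeric variable (made-up values)
toy <- data.frame(
  Subjects  = factor(rep(c("Children", "GroupB"), each = 6)),
  Treatment = factor(rep(c("DrugA", "DrugB", "DrugC"), times = 4)),
  Growth    = c(10.5, 19.4, 29.0, 9.2, 18.8, 28.3,
                12.1, 22.0, 31.5, 11.7, 21.4, 30.9)
)

# Keep one Subjects level, then keep one categorical and one numeric column
children <- droplevels(toy[toy$Subjects == "Children", c("Treatment", "Growth")])
str(children)

# children now has the two-column shape described in the arguments list, e.g.
# plotSignificances(dataset = children, error.bars = "sem", type = "t.test", ...)
```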

SunilVeeravalli/PlotSignificances
