Ignoring 'N', '-', and '*' in logos #17

shahpr · 2020-06-06T19:26:38Z

Is there a way to ignore certain characters or missing data when calculating EDLogos? Right now, any N in a DNA alignment shows up as a separate character in the Logos.

kkdey · 2020-06-14T16:09:50Z

I suggest converting the input string vector into a matrix (PWM) format first and then running logomaker on that matrix. Here is a demo that drops "A" from a DNA sequence representation:

library(Logolas)

# Aligned DNA sequences, all of the same length
sequence <- c("CTATTGT", "CTCTTAT", "CTATTAA", "CTATTTA", "CTATTAT",
              "CTTGAAT", "CTTAGAT", "CTATTAA", "CTATTTA", "CTATTAT",
              "CTTTTAT", "CTATAGT", "CTATTTT", "CTTATAT", "CTATATT",
              "CTCATTT", "CTTATTT", "CAATAGT", "CATTTGA", "CTCTTAT",
              "CTATTAT", "CTTTTAT", "CTATAAT", "CTTAGGT",
              "CTATTGT", "CTCATGT", "CTATAGT", "CTCGTTA",
              "CTAGAAT", "CAATGGT")

# Per-position character counts (rows = characters, columns = positions)
temp <- Biostrings::consensusMatrix(sequence)

# Drop the row for the character to ignore; drop = FALSE keeps a matrix
temp <- temp[!(rownames(temp) %in% "A"), , drop = FALSE]

logomaker(temp, type = "EDLogo",
          color_type = "per_row",
          colors = c("#ABDDA4", "#FDAE61", "#2B83BA", "#D7191C"))
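The same row-filtering idea carries over to the characters asked about in the original question. With the demo above, that would presumably mean filtering with `rownames(temp) %in% c("N", "-", "*")` instead of `"A"`. The sketch below shows the idea in base R only, so it runs without Biostrings. `count_matrix` is a hypothetical helper, not part of Logolas or Biostrings, and the four toy sequences are made up for illustration.

```r
# Hypothetical helper (not part of Logolas): build a per-position count
# matrix from aligned sequences, skipping the symbols listed in `ignore`.
count_matrix <- function(seqs, ignore = c("N", "-", "*")) {
  stopifnot(length(unique(nchar(seqs))) == 1)    # aligned sequences only
  chars <- do.call(rbind, strsplit(seqs, ""))    # one row per sequence
  symbols <- setdiff(sort(unique(as.vector(chars))), ignore)
  counts <- sapply(seq_len(ncol(chars)), function(j) {
    # Characters outside `symbols` become NA and are not counted
    table(factor(chars[, j], levels = symbols))
  })
  matrix(counts, nrow = length(symbols),
         dimnames = list(symbols, as.character(seq_len(ncol(chars)))))
}

# Toy alignment containing N, "-" and "*" (illustrative data only)
seqs <- c("ACGTN", "AC-TA", "ANGT*", "ACGTA")
counts <- count_matrix(seqs)
print(counts)

# With Logolas installed, the matrix can be passed on as in the demo above:
# Logolas::logomaker(counts, type = "EDLogo")
```

Note that columns containing ignored characters end up with smaller totals than the others, just as they do after removing a row from the consensus matrix.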
