Data concentration plot #36

JohnUrban · 2014-10-14T22:22:26Z

Hi Aaron,

If you are interested, I added a new subtool that produces the type of plot one sees in MinKNOW during a sequencing run: the sum of data in each read-length bin.

poretools/poretool_main.py was edited to include the subcommand "data_conc" and poretools/dataconc.py was added to the repertoire.

dataconc.py uses the matplotlib and pandas libraries. I kept to the structure and naming you used in hist.py. data_conc can actually write any image format that matplotlib supports, but I arbitrarily restricted it to pdf and jpg to avoid errors from unsupported extensions.
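For illustration only, the quantity being plotted can be sketched in plain Python as below. This is not the code in dataconc.py (which uses pandas and matplotlib); the function name `data_concentration` and the default `bin_width` of 1000 bp are assumptions made for the example.

```python
from collections import defaultdict


def data_concentration(read_lengths, bin_width=1000):
    """Sum the bases (bp) contributed by the reads in each read-length bin.

    Hypothetical helper for illustration; returns a list of
    (bin_start, total_bp) pairs sorted by bin_start.
    """
    totals = defaultdict(int)
    for length in read_lengths:
        # Each read is assigned to the bin that contains its length...
        bin_start = (length // bin_width) * bin_width
        # ...and contributes its full length in bp to that bin.
        totals[bin_start] += length
    return sorted(totals.items())


if __name__ == "__main__":
    lengths = [500, 1500, 1800, 4200, 4900, 4950]
    for bin_start, total_bp in data_concentration(lengths):
        print(bin_start, total_bp)
```

The resulting pairs are what a bar plot would show: read-length bins on the x-axis and summed bp on the y-axis.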

best,

John

…t since it shows the read length neighborhood where most data is concentrated, if such a neighborhood exists (they exist for PacBio reads, but the plot seems more uniform with MinION reads). This is the type of plot one sees in MinKNOW during the sequencing run. poretools/poretool_main.py was edited to include this subcommand and poretools/dataconc.py was added to the repertoire. dataconc.py uses the matplotlib and pandas libraries and is a surprisingly simple few lines of code.

JohnUrban · 2014-10-15T00:22:54Z

I just updated data_conc with two new command-line options. --cumulative plots the cumulative amount of data with increasing read length. --percent plots either the regular or the cumulative plot as a percent of total data instead of as absolute amounts of data (in bp).
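As a rough sketch of what the two options do to the per-bin totals (not the actual implementation; the function name `transform_bins` and its keyword arguments are assumptions for the example):

```python
from itertools import accumulate


def transform_bins(bin_totals, cumulative=False, percent=False):
    """Apply the cumulative and/or percent transforms to per-bin bp totals.

    Hypothetical helper: bin_totals is a sequence of summed bp per bin,
    ordered by increasing read length.
    """
    values = list(bin_totals)
    total = sum(values)
    if cumulative:
        # Running sum of data with increasing read length.
        values = list(accumulate(values))
    if percent:
        # Express each value as a percent of all data; guard against no data.
        values = [100.0 * v / total if total else 0.0 for v in values]
    return values


if __name__ == "__main__":
    totals = [500, 3300, 14050]
    print(transform_bins(totals, cumulative=True))
    print(transform_bins(totals, cumulative=True, percent=True))
```

When both flags are set, the last bin always reaches 100 percent, which makes runs of different sizes directly comparable.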

nickloman · 2014-10-15T07:32:14Z

This is great, John, thanks for the pull request. My only slight concern is that this adds new dependencies, and we have found that users struggle to install many dependencies on quite diverse setups. It might be good to re-code this to use rpy2/ggplot2, as this is what we already use for plotting. One of us could perhaps do this.

JohnUrban · 2014-10-15T18:00:18Z

I can try to re-code it, but I will have to familiarize myself with rpy2 and ggplot2. I do a lot of coding in R (in fact, I first coded this in R, which is why I used the pandas library in Python), but I usually just use regular old plot(). I first tried to do this in rpy2 for the reasons you mention, but found rpy2 somewhat confusing despite being familiar with both R and Python. Any tips on it are welcome. Despite the disadvantage of extra dependencies, one advantage of matplotlib is that when the user omits "--saveas" and the plot goes to the screen, the user can still choose to save what they see (in any format).

JohnUrban · 2014-10-15T20:54:07Z

I just added a feature to data_conc that lets the user simulate what the data concentration plot would look like if read lengths were sampled uniformly. By default it uses the same number of reads and the same range of sizes as the real data. The user can override these defaults and simulate any number of reads and any range. MinION data concentration plots look strikingly (though not completely) uniform compared to PacBio plots.
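A minimal sketch of that kind of simulation, assuming integer read lengths drawn uniformly between the observed minimum and maximum. The function name and its parameters are hypothetical and are not the --simulation/--parameters interface itself:

```python
import random


def simulate_uniform_lengths(observed_lengths, n_reads=None,
                             min_len=None, max_len=None, seed=None):
    """Draw read lengths uniformly at random.

    Hypothetical helper: by default it mirrors the observed data (same
    number of reads, same min/max length); each default can be overridden.
    """
    rng = random.Random(seed)
    n = len(observed_lengths) if n_reads is None else n_reads
    lo = min(observed_lengths) if min_len is None else min_len
    hi = max(observed_lengths) if max_len is None else max_len
    # randint is inclusive on both ends.
    return [rng.randint(lo, hi) for _ in range(n)]


if __name__ == "__main__":
    observed = [100, 2500, 5000]
    print(simulate_uniform_lengths(observed, seed=1))
    print(simulate_uniform_lengths(observed, n_reads=5, min_len=1, max_len=10, seed=1))
```

The simulated lengths can then be binned and plotted exactly like the real reads for a side-by-side comparison.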

JohnUrban · 2014-10-16T16:58:47Z

DC plots can now be generated based on read type, start/end times, etc. I show examples on my ONT poreminion page. Still have not fully converted it to rpy2 though.

JohnUrban added 4 commits October 14, 2014 13:51

Merge branch 'feature_N50'

148cefe

Merge remote-tracking branch 'upstream/master'

bfb7bc8

To data_conc added two more options. One is --percent which plots the…

ba1c016

… data as a percent of all data. The other is --cumulative which plots the cumulative data with increasing read length. --percent and --cumulative can be used together as well.

removed import line that was no longer in use

bb8e9b2

JohnUrban added 2 commits October 15, 2014 16:39

Added --simulation and --parameters to allow viewing what data concen…

cf7c742

…tration plots would look like with uniform sampling of read lengths

removing qualpos from this branch

21f7d52

Added more functionality to data_conc: can generate DC plots for read…

b9cde95

… types, time constraints, etc. Simulation also updated to reflect these changes where relevant.

JohnUrban added 2 commits October 16, 2014 13:03

fixed indentation error in dataconc.py

ab8d4aa

Merge remote-tracking branch 'upstream/master' into dataConcentration…

c7d5c3c

…Plot
