Data concentration plot #36

Open
Wants to merge 10 commits into base: master

JohnUrban
Contributor

Hi Aaron,

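If you are interested, I added a new subtool that produces the kind of plot MinKNOW shows during a sequencing run: a read-length histogram in which each bin shows the total amount of data (the sum of read lengths, in bp) in that bin rather than the number of reads.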
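To make the difference from an ordinary read-length histogram concrete, here is a minimal sketch of the per-bin sum in plain Python. It assumes fixed-width bins and integer read lengths; `data_concentration` is a hypothetical name, and the actual dataconc.py uses pandas and matplotlib rather than this code.

```python
from collections import defaultdict


def data_concentration(read_lengths, bin_width):
    """Total bases (bp) per read-length bin.

    Each read falls into the bin [k * bin_width, (k + 1) * bin_width)
    that contains its length and is weighted by its own length, so one
    long read can outweigh several short ones. Empty bins are omitted.
    Returns a list of (bin_start, total_bp) pairs sorted by bin_start.
    """
    totals = defaultdict(int)
    for length in read_lengths:
        bin_start = (length // bin_width) * bin_width
        totals[bin_start] += length
    return sorted(totals.items())


lengths = [500, 900, 1200, 1500, 1800, 5000]
print(data_concentration(lengths, 1000))
# [(0, 1400), (1000, 4500), (5000, 5000)]
```

A read-count histogram of the same lengths would give 2, 3 and 1 reads in those bins; weighting by length makes the single 5 kb read the tallest bar.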

I edited poretools/poretool_main.py to add the subcommand "data_conc" and added poretools/dataconc.py to the repertoire.

dataconc.py uses the matplotlib and pandas libraries. I followed the structure and naming conventions you used in hist.py. data_conc can write to any image format matplotlib supports, but I arbitrarily restricted it to pdf and jpg to avoid errors from unsupported extensions.
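A minimal sketch of the pdf/jpg restriction, assuming the check runs on the --saveas path before the figure is written (matplotlib's `savefig` infers the output format from the file extension). `ALLOWED_EXTENSIONS` and `check_saveas` are hypothetical names, not code from dataconc.py.

```python
import os

# Hypothetical allow-list mirroring the pdf/jpg restriction described above.
ALLOWED_EXTENSIONS = (".pdf", ".jpg")


def check_saveas(path):
    """Return the lowercased extension of path, or raise if it is not allowed.

    Validating up front gives a clear message instead of an error
    thrown later from the plotting library.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(
            "unsupported --saveas extension %r; expected one of %s"
            % (ext, ", ".join(ALLOWED_EXTENSIONS))
        )
    return ext


print(check_saveas("dataconc.pdf"))  # .pdf
```

Lowercasing the extension is a design choice of this sketch, so `PLOT.PDF` is accepted as well.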

best,

John

…t since it shows the read-length neighborhood where most of the data is concentrated, if such a neighborhood exists (one exists for PacBio reads, but the plot looks more uniform for MinION reads). This is the type of plot one sees in MinKNOW during a sequencing run. poretools/poretool_main.py was edited to include this subcommand and poretools/dataconc.py was added to the repertoire. dataconc.py uses the matplotlib and pandas libraries and is only a few lines of code.
… data as a percent of all data. The other is --cumulative, which plots cumulative data with increasing read length. --percent and --cumulative can also be used together.
@JohnUrban
Contributor Author

I just updated data_conc with two command-line options: --cumulative plots the cumulative amount of data with increasing read length, and --percent expresses either the regular or the cumulative plot as a percent of total data instead of absolute amounts (in bp).
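The two options amount to simple transforms of the per-bin totals. Below is a minimal sketch, assuming the totals arrive as a list of (bin_start, total_bp) pairs sorted by bin start; `transform_bins` is a hypothetical helper, not the implementation in dataconc.py.

```python
from itertools import accumulate


def transform_bins(bin_totals, percent=False, cumulative=False):
    """Apply --cumulative and/or --percent style transforms to binned totals.

    bin_totals: list of (bin_start, total_bp) pairs sorted by bin_start.
    cumulative: replace each value by the running sum up to that bin.
    percent: divide by the grand total of all data and multiply by 100,
             so a cumulative percent curve ends at 100.
    """
    starts = [start for start, _ in bin_totals]
    values = [total for _, total in bin_totals]
    grand_total = sum(values)
    if cumulative:
        values = list(accumulate(values))
    if percent:
        values = [100.0 * v / grand_total for v in values]
    return list(zip(starts, values))


bins = [(0, 1000), (1000, 3000), (2000, 6000)]
print(transform_bins(bins, cumulative=True))
# [(0, 1000), (1000, 4000), (2000, 10000)]
print(transform_bins(bins, percent=True, cumulative=True))
# [(0, 10.0), (1000, 40.0), (2000, 100.0)]
```

The grand total is taken before accumulating, which is why the two flags compose in either combination.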

@nickloman
Collaborator

This is great, John, thanks for the pull request. My only slight concern is that this adds new dependencies, and we have found that users struggle to install many dependencies on quite diverse setups. It might be good to re-code this to use Rpy2/ggplot2, as that is what we already use for plotting. One of us could perhaps do this.

@JohnUrban
Contributor Author

I can try to re-code it, but I will have to familiarize myself with rpy2 and ggplot2. I do a lot of coding in R (in fact, I first wrote this in R, which is why I used the pandas library in Python), but I usually just use regular old plot(). I first tried to do this in rpy2 for the reasons you mention, but found rpy2 somewhat confusing despite being familiar with both R and Python; any tips on it are welcome. Despite the extra dependencies, one advantage of matplotlib plotting is that when the user omits "--saveas" and the plot is shown on screen, the user can then choose to save what they see, in any format.

@JohnUrban
Contributor Author

I just added a feature to data_conc that lets the user simulate what the data concentration plot would look like if read lengths were sampled uniformly. By default it uses the same number of reads and the same range of read lengths as the real data. The user can override these defaults to simulate any number of reads and any range. MinION data concentration plots look strikingly uniform (though not completely) compared to PacBio plots.
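A minimal sketch of the default behaviour described above, assuming integer read lengths and taking "uniformly sampled" to mean each length is drawn independently from a discrete uniform distribution over [min, max]. `simulate_uniform_lengths` and its parameters are hypothetical, not the options data_conc actually exposes.

```python
import random


def simulate_uniform_lengths(observed, n_reads=None, min_len=None,
                             max_len=None, seed=None):
    """Draw read lengths uniformly at random (integers, inclusive range).

    Defaults mirror the observed data: the same number of reads and the
    same min/max read length. Any of these can be overridden. A seed
    makes the simulation reproducible.
    """
    n = len(observed) if n_reads is None else n_reads
    lo = min(observed) if min_len is None else min_len
    hi = max(observed) if max_len is None else max_len
    rng = random.Random(seed)
    return [rng.randint(lo, hi) for _ in range(n)]


observed = [320, 1500, 4800, 9100, 12000]
sim = simulate_uniform_lengths(observed, seed=42)
print(len(sim), min(sim) >= 320, max(sim) <= 12000)  # 5 True True
```

The simulated lengths can then be binned and plotted the same way as the real reads, so the two data concentration plots can be compared side by side.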

… types, time constraints, etc. Simulation also updated to reflect these changes where relevant.
@JohnUrban
Contributor Author

Data concentration (DC) plots can now be generated for specific read types, start/end times, etc. I show examples on my ONT poreminion page. I still have not fully converted it to rpy2, though.
