Skip to content
This repository has been archived by the owner on Jan 13, 2022. It is now read-only.

Scripts to extract mod info from hdf5 files #111

Open
callumparr opened this issue Mar 30, 2021 · 2 comments
Open

Scripts to extract mod info from hdf5 files #111

callumparr opened this issue Mar 30, 2021 · 2 comments

Comments

@callumparr
Copy link

callumparr commented Mar 30, 2021

Is there any scripts or example code to extract mod probabilities outputted to hdf5 by the basecaller.py script using h5py. I do not have any experience with python code and I remember seeing some snippet about this but no longer can find.

Something about importing with h5py and then extracting.

import h5py
import numpy as np

with h5py.File('/home/minion/Documents/multi_reads/taiyaki_output/5EU_RNA_only_fastq/5EU_RNA_only_modbase_basecalls.hdf5', 'r') as h5:
        cond_logprobs = h5['/Reads/ffff85d3-dc1e-42d5-8ce8-2357112c1d86'][:120]
        print(h5['mod_long_names'][()]

print(np.where(~np.isnan(cond_longprobs))[0])

print(np.where(cond_probs[:,1] > np.log(0.5)))

print(np.where(cond_probs[:,1] > np.log(0.9)))

print(np.exp(np.nanmax(cond_probs[:,0])))
@lpryszcz
Copy link
Contributor

lpryszcz commented Apr 7, 2021

Hi, you can either use:

  • fast5mod - it's deprecated
  • megalodon - it didn't work with our custom-trained RNA models
  • modPhred - should work with custom-trained DNA/RNA models

Extracting information directly from Fast5 is a bit complicated, but you can do it with some experience in coding. For starting example code check here.

@marcus1487
Copy link
Contributor

There is no dedicated example of this extraction. This output is intended for research demonstration purposes and not necessarily robust downstream anaylsis.

Megalodon should work for custom-trained models. Dump to json and pass in via guppy config file. @lpryszcz have you opened an issue in megalodon with the specific failure mode? It would be great to get a fix into megalodon if that is indeed not working.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants