v0.0.1 #2

Merged
merged 5 commits into from
May 7, 2022
Changes from 1 commit
identify exons then unique genes with suboptimal coverage
sophie22 committed May 7, 2022
commit 3070b64036b7d478c05bd9f24fec08d77c690bd0
genes_coverage.py: 8 changes (6 additions, 2 deletions)
@@ -16,11 +16,15 @@

### Load sambamba output file contents into a DataFrame
sambamba_df = pd.read_csv(sambamba_file, sep='\t')
print(sambamba_df.columns)
print(sambamba_df.head())
# Split 'GeneSymbol;Accession' on the first ';' into separate columns
# (n is passed by keyword; recent pandas releases reject it positionally)
sambamba_df[["GeneSymbol", "Accession"]] = sambamba_df[
    "GeneSymbol;Accession"].str.split(';', n=1, expand=True)

### Identify exons with less than 100% coverage at 30x
below_threshold_exons_df = sambamba_df[sambamba_df[coverage_column] < 100.0]

### Identify unique genes with at least one exon with suboptimal coverage
below_threshold_genes = below_threshold_exons_df["GeneSymbol"].unique().tolist()
print(below_threshold_genes)

### Write gene symbols with suboptimal coverage to file
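The load-and-split step in this hunk can be tried in isolation on a tiny in-memory table. This is a minimal sketch: only the `GeneSymbol;Accession` column name comes from genes_coverage.py; the other column names (including `percentage30`) and all values are hypothetical stand-ins for a real sambamba output file.

```python
import io

import pandas as pd

# Hypothetical two-exon excerpt of a sambamba output file. Only the
# "GeneSymbol;Accession" column name is taken from genes_coverage.py;
# every other column name and value is invented for illustration.
SAMPLE_TSV = (
    "chromosome\tStartPosition\tEndPosition\tGeneSymbol;Accession\tpercentage30\n"
    "1\t100\t200\tGENE1;NM_000001.1\t100.0\n"
    "1\t300\t400\tGENE2;NM_000002.2\t87.5\n"
)


def load_and_split(tsv_text: str) -> pd.DataFrame:
    """Read tab-separated coverage rows and split the combined gene column."""
    df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
    # n=1 splits on the first ';' only, giving exactly two columns.
    # n is passed by keyword because recent pandas versions require it.
    df[["GeneSymbol", "Accession"]] = df["GeneSymbol;Accession"].str.split(
        ";", n=1, expand=True
    )
    return df


df = load_and_split(SAMPLE_TSV)
print(df[["GeneSymbol", "Accession"]])
```

Because the split uses `expand=True`, it returns a two-column DataFrame, and assigning it to a two-name column list fills the new columns by position.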
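The exon filter in this hunk is a boolean-mask selection. The sketch below reproduces it on a hypothetical table, assuming `coverage_column` names a percentage-of-bases-at-30x column (here called `percentage30`; the script's real value is set outside this hunk).

```python
import pandas as pd

# Hypothetical per-exon coverage; "percentage30" stands in for coverage_column,
# whose actual value in genes_coverage.py is defined outside this hunk.
exons = pd.DataFrame(
    {
        "GeneSymbol": ["GENE1", "GENE1", "GENE2", "GENE3", "GENE4"],
        "percentage30": [100.0, 92.3, 100.0, 75.0, float("nan")],
    }
)
coverage_column = "percentage30"

# The comparison yields a boolean Series aligned on the index; indexing with it
# keeps only exons where fewer than 100% of bases reached 30x.
below_threshold_exons = exons[exons[coverage_column] < 100.0]
print(below_threshold_exons)
```

One consequence worth knowing: a missing coverage value (NaN) compares as False, so an exon like the hypothetical GENE4 row is silently dropped rather than reported as suboptimal.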
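The step that collapses suboptimal exons into unique genes relies on `Series.unique()`, which removes repeats but keeps first-seen order rather than sorting. A small sketch with hypothetical gene symbols:

```python
import pandas as pd

# Hypothetical exons already filtered to < 100% coverage at 30x;
# GENE3 and GENE1 each have more than one poorly covered exon.
below_threshold_exons = pd.DataFrame(
    {"GeneSymbol": ["GENE3", "GENE1", "GENE3", "GENE2", "GENE1"]}
)

# unique() returns each symbol once, in order of first appearance;
# tolist() converts the resulting NumPy array to a plain Python list.
below_threshold_genes = below_threshold_exons["GeneSymbol"].unique().tolist()
print(below_threshold_genes)  # ['GENE3', 'GENE1', 'GENE2']
```

If a reproducible, alphabetical gene list is preferred in the output, wrapping the result in `sorted(...)` would give that without changing which genes are reported.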
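The code that writes the gene list lies outside this hunk, so the format below is an assumption, not the script's actual behaviour: a minimal sketch that writes one gene symbol per line. The helper `write_gene_list` and the filename `genes_below_threshold.txt` are hypothetical.

```python
import tempfile
from pathlib import Path


def write_gene_list(genes, path):
    """Hypothetical helper: write one gene symbol per line to path."""
    path = Path(path)
    path.write_text("".join(f"{gene}\n" for gene in genes))
    return path


# Write to a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = write_gene_list(
        ["GENE3", "GENE1", "GENE2"], Path(tmp) / "genes_below_threshold.txt"
    )
    written = out_path.read_text().splitlines()

print(written)  # ['GENE3', 'GENE1', 'GENE2']
```

A plain one-per-line text file is easy to diff between runs and to read back with `splitlines()`; a TSV with extra columns (for example, the affected accessions) would be an equally reasonable choice.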