CYP8B1 is a single-exon gene that determines the ratio of primary bile salts. The code and data provided in this project are part of the manuscript below. The scripts and data are organised to ensure the integrity, credibility and replicability of the reported results. However, releasing a fully automated pipeline is not the goal of this repository and is beyond the scope of the manuscript.
Sagar Sharad Shinde1, Lokdeep Teekas1, Sandhya Sharma1, Nagarjun Vijay1
1Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
*Correspondence: nagarjun@iiserb.ac.in
Data is organised into the following folders:
- ORFs: Each file in this folder contains the complete open reading frame of the CYP8B1 gene, from the start codon through to the stop codon
- SAMs: Each file in this folder contains the results of an SRA blastn search against publicly available raw read data from the Sequence Read Archive (SRA)
- MSAs: Each file in this folder contains the multiple sequence alignment of the ORF files produced by GUIDANCE with PRANK, CLUSTALW, MAFFT or MUSCLE as the aligner
- gc_content: The GC content and GC deviation are calculated for each ORF in sliding windows of size 100 with a step size of 10. The script plotGC_content.r is used to visualise these results
- scripts: The scripts used for ORF validation, multiple sequence alignment, model testing, tree topology inference and tests for relaxed selection are provided. The contents of this folder (scripts and instructions), together with the published software tools, should be sufficient to replicate all the results described in the manuscript.
- relaxation_tests: Output files obtained after running the RELAX program implemented in the HyPhy package.
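The windowed GC calculation described for the gc_content folder can be illustrated with a minimal Python sketch. This is not the script used in the manuscript: the function names `gc_fraction` and `sliding_gc` are hypothetical, the handling of ambiguous bases (they are ignored) is an assumption, and only GC content is shown because the exact definition of GC deviation is not given here.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Assumption: windows with no unambiguous bases are reported as 0.0.
    return gc / total if total else 0.0


def sliding_gc(seq, window=100, step=10):
    """Return a list of (window_start, gc_fraction) for each full window."""
    results = []
    # Only full-length windows are scored; a trailing partial window is skipped.
    for start in range(0, len(seq) - window + 1, step):
        results.append((start, gc_fraction(seq[start:start + window])))
    return results


if __name__ == "__main__":
    # Toy sequence, not a real CYP8B1 ORF: 100 A's followed by 100 G's.
    toy = "A" * 100 + "G" * 100
    for start, gc in sliding_gc(toy):
        print(start, round(gc, 2))
```

The defaults mirror the window size of 100 and step size of 10 stated above. Each returned pair gives the 0-based start of a window and its GC fraction, which could be written to a table for plotting.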
Prerequisites:
- PRANK (v.140603)
- MUSCLE (3.8.31)
- MAFFT (7.407)
- CLUSTALW (2.0.12)
- MEGA (10.0.5)
- DAMBE (7.0.58)
- bam-readcount (0.8.0)
- MUMSA (1.0) (Lassmann and Sonnhammer 2005)
- modeltest-ng (Darriba et al. 2019)
- raxml-ng
- HyPhy (2.3.14)