
AswinSSoman/CYP8B1


CYP8B1

CYP8B1 is a single-exon gene that determines the ratio of primary bile salts. The code and data provided in this project are part of the manuscript below. The scripts and data are organised to ensure the integrity, credibility and replicability of the reported results. However, releasing a fully automated pipeline is not a goal of this repository and is beyond the scope of the manuscript. You can access/cite the publication as follows: "Shinde, S.S., Teekas, L., Sharma, S. et al. J Mol Evol (2019). https://doi.org/10.1007/s00239-019-09903-6". A preprint of the same work is available at https://www.biorxiv.org/content/10.1101/714188v1

Signatures of relaxed selection in the CYP8B1 gene of birds and mammals

Sagar Sharad Shinde¹, Lokdeep Teekas¹, Sandhya Sharma¹, Nagarjun Vijay¹*

¹Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India

*Correspondence: nagarjun@iiserb.ac.in

Data is organised into the following folders:

  1. ORFs: Each file in this folder contains the complete open reading frame (ORF) of the CYP8B1 gene, from the start codon through the stop codon.
  2. SAMs: Each file in this folder contains the results of an SRA blastn search against publicly available raw read data from the Sequence Read Archive (SRA).
  3. MSAs: Each file in this folder contains a multiple sequence alignment of the ORF files, generated using GUIDANCE with PRANK, CLUSTALW, MAFFT or MUSCLE as the aligner.
  4. gc_content: GC content and GC deviation are calculated for each ORF using a sliding window of 100 bp with a step size of 10 bp. The script plotGC_content.r is used to visualise these results.
  5. scripts: The scripts used for ORF validation, multiple sequence alignment, model testing, tree topology inference and tests for relaxed selection. Together with the published software tools listed below, the contents of this folder (scripts and instructions) should be sufficient to replicate all the results described in the manuscript.
  6. relaxation_tests: Output files obtained by running the RELAX program implemented in the HyPhy package.
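The two checks described for the ORFs and gc_content folders can be sketched as below. This is a minimal Python illustration, not code from this repository (whose scripts are in R, Shell and Perl). It assumes plain DNA strings and the standard genetic code's stop codons, and the function names are hypothetical. Because the README does not define "GC deviation", the sketch assumes it means the window GC minus the whole-ORF GC; the repository's own measure may differ.

```python
from typing import Iterator, List, Tuple

# Assumption: stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def validate_orf(seq: str) -> List[str]:
    """Return a list of problems found; an empty list means the ORF looks intact."""
    seq = seq.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into complete codons only (a trailing partial codon is ignored).
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    # A stop codon before the final codon would truncate the protein.
    internal = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if internal:
        problems.append(f"premature stop codon(s) at codon index {internal}")
    return problems


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_windows(seq: str, window: int = 100,
               step: int = 10) -> Iterator[Tuple[int, float, float]]:
    """Yield (start, window GC, window GC minus whole-sequence GC) per full window.

    The 'deviation' defined here is a hypothetical stand-in for the
    repository's GC deviation measure.
    """
    overall = gc_fraction(seq)
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        yield start, gc, gc - overall
```

For example, `validate_orf("ATGTAAAAATAA")` reports a premature stop codon at codon index 1, and `gc_windows(orf)` with the defaults mirrors the 100 bp window and 10 bp step described above; only full-length windows are reported, so a short trailing segment is skipped.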

Prerequisites:

  1. PRANK (v.140603)
  2. MUSCLE (v3.8.31)
  3. MAFFT (v7.407)
  4. CLUSTALW (2.0.12)
  5. MEGA (10.0.5)
  6. DAMBE (7.0.58)
  7. bam-readcount (0.8.0)
  8. MUMSA (1.0) (Lassmann and Sonnhammer 2005)
  9. modeltest-ng (Darriba et al. 2019)
  10. raxml-ng
  11. HyPhy (2.3.14)

Results

Extra figures: Signatures of relaxed selection, represented as a combination of K values (K < 1 indicates relaxed selection and K > 1 intensified selection), which denote the intensity of change in selection, and the corresponding p-values for Ornithorhynchus anatinus (duck-billed platypus) from Monotremata and for species from the infraclass Marsupialia (Vombatus ursinus, the common wombat, and Phascolarctos cinereus, the koala), along with outgroup species: Dasypus novemcinctus (nine-banded armadillo) from the order Cingulata, and Erinaceus europaeus (European hedgehog) and Sorex araneus (Eurasian shrew) from the order Eulipotyphla. Each data point in the figure is the result of one hypothesis test. Tests that showed significant relaxation of selection are shown as filled squares, while those that are not significant are shown as empty circles. The colour of each data point corresponds to one of the three tree topologies shown in panel A. (A) Tree topologies used to assess signatures of relaxed selection. (B & C) RELAX test results for all six species included in the alignment. Based on the set of species considered here, the wombat consistently shows signatures of relaxed selection, whereas the platypus and koala show signatures of intensification. However, the signatures are highly variable for the other three species. Including more marsupial species would help further refine the patterns identified.
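To help interpret the K values: in RELAX, the ω (dN/dS) classes on the test branches are modelled as the reference-branch ω classes raised to the power K. As a result, K < 1 pulls every class toward ω = 1 (relaxation) and K > 1 pushes them away from 1 (intensification). Significance comes from a likelihood-ratio test of K = 1 against a freely estimated K. The Python sketch below illustrates that arithmetic and a simple classification of one test result. It is not code from this repository, and the function names are hypothetical. The 0.05 threshold and the 1-degree-of-freedom chi-square approximation are assumptions rather than settings taken from the manuscript.

```python
import math
from typing import List


def omegas_on_test_branches(reference_omegas: List[float], k: float) -> List[float]:
    """Apply the RELAX transformation omega_T = omega_R ** k to each omega class."""
    return [omega ** k for omega in reference_omegas]


def lrt_p_value(lnl_null: float, lnl_alt: float) -> float:
    """p-value of a likelihood-ratio test, assuming a chi-square with 1 df."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Chi-square (1 df) survival function: P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def classify_relax(k: float, p_value: float, alpha: float = 0.05) -> str:
    """Label one test; alpha = 0.05 is an assumed (hypothetical) threshold."""
    if p_value >= alpha:
        return "not significant"
    if k < 1:
        return "relaxed"
    if k > 1:
        return "intensified"
    return "no change"
```

For example, `omegas_on_test_branches([0.1, 1.0, 5.0], 0.5)` moves 0.1 up to about 0.32 and 5.0 down to about 2.24 while leaving ω = 1 unchanged. This is why K < 1 is read as relaxed selection.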


Languages

  • R 60.7%
  • Shell 31.7%
  • Perl 7.6%