GitHub - PalMuc/congeneric_synteny: analysis of synteny across animals

This is the data repository for the following publication

Genomic changes are varied across congeneric species pairs

Francis, Warren R.¹, Vargas, Sergio¹, Wörheide, Gert¹,²,³,*

¹ Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany.
² GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany
³ Staatliche Naturwissenschaftliche Sammlungen Bayerns (SNSB)–Bayerische Staatssammlung für Paläontologie und Geologie, Munich, Germany

*corresponding author

ABSTRACT

Synteny, the shared arrangement of genes on chromosomes between related species, is a marker of shared ancestry, and synteny-breaking events can result in genomic incompatibilities between populations and ultimately lead to speciation events. Despite its pivotal role as a driver of speciation, the role of synteny breaks in speciation is poorly studied due to a lack of chromosome-level genome assemblies for a taxonomically broad sample of organisms. Here, using 22 congeneric animal genome pairs, we find a link between protein identity, microsynteny, and macrosynteny, but no evidence for a universal path of genomic change during speciation. We observed varied trajectories of synteny conservation relative to protein identity in non-model organisms, with many species pairs showing no karyotypic changes and others displaying large genomic rearrangements. This contrasts with previous studies on model organisms and indicates that the genomic changes preceding or resulting from speciation are likely very contextual between clades.

Analytical approach

For each pair of genomes (congeneric species), microsynteny and macrosynteny are both analysed.
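To make the two scales concrete, here is a minimal sketch on toy data. It is not the repository's implementation (that lives in scaffold_synteny.py and microsynteny.py); the function names, the `max_gap` and `min_len` parameters, and the gene identifiers are all hypothetical. It assumes macrosynteny can be summarised as ortholog counts per scaffold pair, and microsynteny as runs of orthologs that stay adjacent in both genomes.

```python
from collections import Counter

def scaffold_pair_counts(orthologs, pos_a, pos_b):
    """Macrosynteny sketch: count ortholog pairs per (scaffold A, scaffold B)."""
    counts = Counter()
    for gene_a, gene_b in orthologs:
        counts[(pos_a[gene_a][0], pos_b[gene_b][0])] += 1
    return counts

def microsynteny_blocks(orthologs, pos_a, pos_b, max_gap=2, min_len=3):
    """Microsynteny sketch: runs of orthologs that are neighbours in both genomes.

    pos_a / pos_b map a gene ID to (scaffold, index of the gene along the scaffold).
    """
    pairs = sorted(orthologs, key=lambda p: pos_a[p[0]])
    blocks, current = [], []
    for pair in pairs:
        if current:
            prev = current[-1]
            scaf_a0, idx_a0 = pos_a[prev[0]]
            scaf_b0, idx_b0 = pos_b[prev[1]]
            scaf_a1, idx_a1 = pos_a[pair[0]]
            scaf_b1, idx_b1 = pos_b[pair[1]]
            adjacent = (
                scaf_a0 == scaf_a1
                and scaf_b0 == scaf_b1
                and idx_a1 - idx_a0 <= max_gap
                and 0 < abs(idx_b1 - idx_b0) <= max_gap
            )
            if not adjacent:
                # Close the current run and keep it only if it is long enough.
                if len(current) >= min_len:
                    blocks.append(current)
                current = []
        current.append(pair)
    if len(current) >= min_len:
        blocks.append(current)
    return blocks

# Toy data (hypothetical gene and scaffold names).
toy_orthologs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b9")]
toy_pos_a = {"a1": ("chr1", 0), "a2": ("chr1", 1), "a3": ("chr1", 2), "a4": ("chr1", 3)}
toy_pos_b = {"b1": ("chrA", 0), "b2": ("chrA", 1), "b3": ("chrA", 2), "b9": ("chrB", 5)}

toy_macro = scaffold_pair_counts(toy_orthologs, toy_pos_a, toy_pos_b)
toy_micro = microsynteny_blocks(toy_orthologs, toy_pos_a, toy_pos_b)
```

In this toy case, three orthologs fall on the same scaffold pair and form one collinear block, while the fourth has moved to a different scaffold in the second genome.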

The pipeline driver, run_synteny_analysis.py, is written in Python and is run as:

run_synteny_analysis.py -i species_pair_list.tab

For each species pair (for example, the tuna pair), the pipeline starts from the scaffolds, proteins, and GFF annotation downloaded from NCBI:

GCF_910596095.1_fThuMac1.1_genomic.fna.gz
GCF_910596095.1_fThuMac1.1_genomic.gff.gz
GCF_910596095.1_fThuMac1.1_protein.faa.gz
GCF_914725855.1_fThuAlb1.1_genomic.fna.gz
GCF_914725855.1_fThuAlb1.1_genomic.gff.gz
GCF_914725855.1_fThuAlb1.1_protein.faa.gz
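Before launching the pipeline, it can help to confirm that all three NCBI files are present for each assembly. The following is a minimal sketch of such a check and is not part of the repository; the helper names `expected_inputs` and `missing_inputs` are hypothetical, and it assumes the files sit in one directory under the naming pattern shown above.

```python
from pathlib import Path

# Suffixes of the three NCBI downloads listed above.
INPUT_SUFFIXES = ("_genomic.fna.gz", "_genomic.gff.gz", "_protein.faa.gz")

def expected_inputs(assembly_prefix):
    """Return the three input file names for a prefix like GCF_910596095.1_fThuMac1.1."""
    return [assembly_prefix + suffix for suffix in INPUT_SUFFIXES]

def missing_inputs(assembly_prefix, directory="."):
    """Return the expected input files that are not found in `directory`."""
    base = Path(directory)
    return [name for name in expected_inputs(assembly_prefix)
            if not (base / name).exists()]

if __name__ == "__main__":
    for prefix in ("GCF_910596095.1_fThuMac1.1", "GCF_914725855.1_fThuAlb1.1"):
        for name in missing_inputs(prefix):
            print("missing:", name)
```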

The pipeline then generates the following files, some per species and some per species pair:

get_genbank_longest_isoforms.py: filtered proteins with isoforms removed (.x.faa), such as GCF_910596095.1_fThuMac1.1_protein.x.faa and GCF_914725855.1_fThuAlb1.1_protein.x.faa
get_genbank_longest_isoforms.py: filtered GFFs corresponding to those proteins (.x.gff), such as GCF_910596095.1_fThuMac1.1_genomic.x.gff and GCF_914725855.1_fThuAlb1.1_genomic.x.gff
DIAMOND: results fThuAlb1_vs_fThuMac1.blastp.tab and fThuAlb1_vs_fThuMac1.renamed.blastp.tab
scaffold_synteny.py: results fThuAlb1_vs_fThuMac1.scaffold_synteny.tab and fThuAlb1_vs_fThuMac1.scaffold_synteny.pdf
microsynteny.py: results fThuAlb1_vs_fThuMac1.microsynteny.tab and fThuAlb1_vs_fThuMac1.microsynteny.pdf
fastarenamer.py: renamed versions of the proteins for clustering (.x.n.faa), such as GCF_910596095.1_fThuMac1.1_protein.x.n.faa and GCF_914725855.1_fThuAlb1.1_protein.x.n.faa
makehomologs.py: clustering outputs fasta_clusters.H.thunnus_clusters_v1.tab and clusters_thunnus_clusters_v1.tar.gz, and the log thunnus_clusters_v1.2023-08-02-010624.mh.log
alignment_conserved_site_to_dots.py: accumulated tabular output fThuAlb1_vs_fThuMac1.homologs_identity.tab
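The naming pattern of these outputs can be sketched as follows, which may be useful for checking that a run has completed. This is an illustration inferred from the tuna file names above, not code from the repository: the helper names are hypothetical, and the rule used to shorten an assembly prefix to a tag (dropping the accession and the trailing version number) is an assumption. The clustering outputs are left out because their names depend on a run label and timestamp.

```python
def assembly_tag(assembly_prefix):
    """GCF_910596095.1_fThuMac1.1 -> fThuMac1 (assumed rule, inferred from the examples)."""
    assembly_name = assembly_prefix.split("_", 2)[2]  # e.g. fThuMac1.1
    return assembly_name.rsplit(".", 1)[0]            # drop the trailing version

def species_outputs(assembly_prefix):
    """Per-species files produced by the isoform filter and the renamer."""
    return [
        f"{assembly_prefix}_protein.x.faa",
        f"{assembly_prefix}_genomic.x.gff",
        f"{assembly_prefix}_protein.x.n.faa",
    ]

def pair_outputs(tag_1, tag_2):
    """Per-pair files; tags are given in the order they appear in the file name."""
    stem = f"{tag_1}_vs_{tag_2}"
    suffixes = (
        "blastp.tab",
        "renamed.blastp.tab",
        "scaffold_synteny.tab",
        "scaffold_synteny.pdf",
        "microsynteny.tab",
        "microsynteny.pdf",
        "homologs_identity.tab",
    )
    return [f"{stem}.{suffix}" for suffix in suffixes]

if __name__ == "__main__":
    tag_alb = assembly_tag("GCF_914725855.1_fThuAlb1.1")
    tag_mac = assembly_tag("GCF_910596095.1_fThuMac1.1")
    for name in pair_outputs(tag_alb, tag_mac):
        print(name)
```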

Subsequent processing occurs using several R scripts, for analysis and plotting.

Full citation

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Folders and files

01-input_datasets
02-processing_scripts
03-graphic_scripts
04-macrosynteny_plots
05-microsynteny_plots
06-prot_id_tables
07-cluster_stats
08-off_main_tables
figures_for_paper
links
logfiles
misc_datafiles
summary_data
supplements_for_paper
LICENSE
README.md
