Lineage-Specific-Marker

filter the reads

Run filter_by_qscore.sh to filter the reads by quality score (Q-score), then run filter_by_length.ipynb to filter the sequences by length.
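The length-filtering step can be sketched as follows. This is a minimal, standard-library-only illustration, not the code in filter_by_length.ipynb; the helper names `read_fasta` and `filter_by_length`, the example reads, and the thresholds are all hypothetical.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len, max_len):
    """Keep records whose sequence length lies in [min_len, max_len]."""
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


# Hypothetical example data and thresholds, for illustration only.
example = StringIO(
    ">read1\nACGT\n"
    ">read2\nACGTACGTAC\n"
    ">read3\nACGTACGTACGTACGTACGT\n"
)
kept = filter_by_length(read_fasta(example), min_len=5, max_len=15)
print([h for h, _ in kept])  # ['read2']
```

In practice the bounds would be chosen from the expected amplicon length of the marker being sequenced.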

cluster and build the consensus

Run run_vsearch.sh. This script runs the vsearch clustering algorithm on each FASTA file and writes three output files: a .uc file that stores the cluster assignment of each sequence, and two .fasta files that store the centroid and consensus sequences of each cluster.
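To show how the .uc output relates to cluster sizes, here is a small sketch that counts the sequences assigned to each cluster. It assumes the usual tab-separated .uc layout, in which the first column is the record type (S for a centroid, H for a hit, C for a cluster summary) and the second column is the cluster number. The function name `cluster_sizes_from_uc` and the example records are hypothetical and are not taken from the repository.

```python
from collections import Counter


def cluster_sizes_from_uc(lines):
    """Count member sequences per cluster from .uc records.

    Assumption: column 1 is the record type (S, H or C) and column 2 is
    the cluster number. S and H records each describe one sequence, so
    counting them gives the cluster size; C records are skipped to avoid
    double counting.
    """
    sizes = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record_type, cluster_id = fields[0], int(fields[1])
        if record_type in ("S", "H"):
            sizes[cluster_id] += 1
    return dict(sizes)


# Hypothetical .uc records, for illustration only.
example_uc = [
    "S\t0\t1500\t*\t*\t*\t*\t*\tread1\t*",
    "H\t0\t1498\t98.5\t+\t0\t0\t*\tread2\tread1",
    "S\t1\t1510\t*\t*\t*\t*\t*\tread3\t*",
    "C\t0\t2\t*\t*\t*\t*\t*\tread1\t*",
    "C\t1\t1\t*\t*\t*\t*\t*\tread3\t*",
]
print(cluster_sizes_from_uc(example_uc))  # {0: 2, 1: 1}
```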

summarize the cluster information

Run write_cluster_to_excel.ipynb. This notebook summarizes the cluster information and generates the .xlsx files. Each .xlsx file has three columns, which store the cluster information, including the centroid sequence and the consensus sequence of each cluster.

extract the desired consensus

After summarizing the cluster information, you can identify the large clusters that contain a high percentage of the sequences. Build a .txt file that stores the IDs of the selected clusters in ascending order; the groups of IDs for the different files should also be stored in ascending order of the files. Then run write_consensus_by_cluster.ipynb to get the consensus for each sample.
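The selection described above can be sketched as follows. This is only an illustration under stated assumptions: the README does not specify the layout of the .txt file or a size cutoff, so the one-line-per-file, space-separated layout, the `min_fraction` threshold, the function names, and the example file names are all hypothetical. Check the format expected by write_consensus_by_cluster.ipynb before using anything like this.

```python
def select_large_clusters(sizes, min_fraction):
    """Return, in ascending order, the IDs of clusters holding at least
    min_fraction of all sequences in one file."""
    total = sum(sizes.values())
    if total == 0:
        return []
    return sorted(cid for cid, n in sizes.items() if n / total >= min_fraction)


def format_selection(per_file_sizes, min_fraction):
    """Build the text for a selection file.

    Assumed layout (hypothetical): one line per input file, files taken in
    ascending name order, cluster IDs space-separated in ascending order.
    """
    lines = []
    for filename in sorted(per_file_sizes):
        ids = select_large_clusters(per_file_sizes[filename], min_fraction)
        lines.append(" ".join(str(i) for i in ids))
    return "\n".join(lines)


# Hypothetical cluster sizes per file, for illustration only.
per_file_sizes = {
    "sample_b.fasta": {0: 50, 1: 5, 2: 45},
    "sample_a.fasta": {0: 10, 3: 80, 1: 10},
}
selection_text = format_selection(per_file_sizes, min_fraction=0.2)
print(selection_text)
# 3
# 0 2
```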

geneious analysis

After obtaining the consensus sequences, import them into Geneious, reorient them, align them, and build the phylogenetic trees.

Dependencies

for the Python notebooks

python 3.11.5
Biopython 1.78
pandas 2.1.1

for the bash scripts

NanoFilt 2.8.0
vsearch 2.22.1

TheRainInSpain/Lineage-Specific-Marker

Folders and files

Latest commit

History

Repository files navigation

Lineage-Specific-Marker

filter the reads

cluster and build the consensus

summarize the cluster information

extract the desired consensus

geneious analysis

Dependency

for python script

for bash file

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages