I am sharing this code repository to explain the basic workings and principles of various bioinformatics algorithms in Python. I am writing this GitHub repository with two main intentions. The first is to build myself a code base that I can look back on to see how I learned bioinformatics and how the skeleton of the code interacts with my subject of interest, biology. The second is to help you understand Biopython, a popular biological library in Python that almost feels like a cheat sheet for solving complex algorithms on biological genetic data.
So let's dive into this journey together, starting from the basics and working up to the more complex concepts that are necessary to solve real-world problems in bioinformatics.
To start, a few very basic structural concepts are explained, primarily these:
- Calculate pattern
- Frequency map of DNA sequence
- Most frequent K-mers
- Complementary DNA string
- Pattern Matching
This may all seem very intimidating at first, but trust me, each of these topics is explained in an engaging way. I urge you to take a look; you'll surely thank me later.
Next come some more complex and tougher concepts that took a lot more effort to understand. I am doing my level best to give the simplest interpretation of each concept and to address every question that I had, and that you could possibly have, during your learning journey.
The primary topics involved here are:
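To make the list above a little more concrete, here is a minimal sketch of the five ideas in plain Python. The function names (`pattern_count`, `frequency_map`, `frequent_words`, `complement`, `reverse_complement`, `pattern_matching`) are my own illustrative choices and are not necessarily the names used in the repository's code; the sketch also assumes the DNA strings contain only the uppercase letters A, C, G, and T.

```python
def pattern_count(text, pattern):
    """Count occurrences of pattern in text, overlapping ones included."""
    k = len(pattern)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


def frequency_map(text, k):
    """Map every k-mer that occurs in text to its number of occurrences."""
    freq = {}
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        freq[kmer] = freq.get(kmer, 0) + 1
    return freq


def frequent_words(text, k):
    """Return the most frequent k-mers in text, sorted alphabetically."""
    freq = frequency_map(text, k)
    if not freq:
        return []
    top = max(freq.values())
    return sorted(kmer for kmer, count in freq.items() if count == top)


def complement(dna):
    """Swap each base for its partner (A<->T, C<->G), keeping the order."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in dna)


def reverse_complement(dna):
    """Complement the strand and read it in the opposite direction."""
    return complement(dna)[::-1]


def pattern_matching(pattern, genome):
    """Return every 0-based start position where pattern occurs in genome."""
    k = len(pattern)
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] == pattern]


print(frequent_words("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))
print(pattern_matching("ATAT", "GATATATGCATATACTT"))
```

Note that `pattern_count` slides a window instead of calling `str.count`, because `str.count` skips overlapping matches and would under-count patterns such as `GCG` in `GCGCG`.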
- Symbol Array in DNA Seq
- Symbol Array but faster (Extended SymbolArray)
- Skew Array in DNA seq (2 methods explained)
- Minimum Skew in DNA seq
- Calculating Hamming Distance between DNA strings
- Approximate Pattern Matching in DNA
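As a rough sketch of how the topics above fit together, the following plain-Python functions cover each item. The names and exact conventions are assumptions of mine for illustration: the symbol array here treats the genome as circular and uses a window of half the genome length, the skew array adds 1 for each `G` and subtracts 1 for each `C`, and all positions are 0-based. The repository's own code may differ in these details.

```python
def symbol_array(genome, symbol):
    """Count symbol in the half-genome window starting at each position."""
    n = len(genome)
    half = n // 2
    extended = genome + genome[:half]  # wrap around: the genome is circular
    return [extended[i:i + half].count(symbol) for i in range(n)]


def faster_symbol_array(genome, symbol):
    """Same result as symbol_array, but each window is updated in O(1)."""
    n = len(genome)
    half = n // 2
    extended = genome + genome[:half]
    counts = [extended[:half].count(symbol)]
    for i in range(1, n):
        current = counts[-1]
        if extended[i - 1] == symbol:         # base that left the window
            current -= 1
        if extended[i + half - 1] == symbol:  # base that entered the window
            current += 1
        counts.append(current)
    return counts


def skew_array(genome):
    """Running (#G - #C) over each prefix of the genome, starting at 0."""
    skew = [0]
    for base in genome:
        step = 1 if base == "G" else -1 if base == "C" else 0
        skew.append(skew[-1] + step)
    return skew


def minimum_skew(genome):
    """Positions where the skew array reaches its minimum value."""
    skew = skew_array(genome)
    lowest = min(skew)
    return [i for i, value in enumerate(skew) if value == lowest]


def hamming_distance(p, q):
    """Number of positions at which two equal-length strings differ."""
    if len(p) != len(q):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(p, q))


def approximate_pattern_matching(pattern, text, d):
    """Start positions where pattern matches text with at most d mismatches."""
    k = len(pattern)
    return [i for i in range(len(text) - k + 1)
            if hamming_distance(pattern, text[i:i + k]) <= d]


print(symbol_array("AAAAGGGG", "A"))
print(minimum_skew("CATGGGCATCGGCCATACGCC"))
```

The faster version matters on real genomes: recounting every window costs time proportional to the genome length squared, while the sliding update only looks at the one base that leaves and the one that enters.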
Season 3 and the upcoming final season involve umbrella-style code: I will walk you through each function separately, and then we will combine those small functions into something very meaningful.
So I encourage you to just stick with me and enjoy the magic you are about to experience by the end of this journey.
The primary topics covered here are:
- Basic NumPy array understanding
- Counting Nucleotide Frequencies in DNA
- Profile Matrix from DNA
- Consensus Sequence from DNA
- Scoring DNA Motifs
- Profile Most Probable K-mer
- Greedy Motif Search (the finale of it all: the main standalone function, which uses all of the above functions in the form of a greedy algorithm)
- Finding Patterns with Mismatches (extras)
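The way these small functions build on each other can be sketched as follows. This is only an illustration under my own assumptions: it uses plain dictionaries and lists instead of NumPy arrays so that it runs without extra packages, the function names are hypothetical, and ties are broken by taking the first candidate found.

```python
def count_matrix(motifs):
    """Per-column counts of A, C, G and T across equal-length motifs."""
    k = len(motifs[0])
    counts = {base: [0] * k for base in "ACGT"}
    for motif in motifs:
        for j, base in enumerate(motif):
            counts[base][j] += 1
    return counts


def profile_matrix(motifs):
    """Column-wise nucleotide frequencies (counts divided by motif count)."""
    t = len(motifs)
    return {base: [c / t for c in row]
            for base, row in count_matrix(motifs).items()}


def consensus(motifs):
    """Most common base in each column (first of A, C, G, T on ties)."""
    counts = count_matrix(motifs)
    k = len(motifs[0])
    return "".join(max("ACGT", key=lambda b: counts[b][j]) for j in range(k))


def score(motifs):
    """Total number of mismatches between the motifs and their consensus."""
    cons = consensus(motifs)
    return sum(base != cons[j]
               for motif in motifs for j, base in enumerate(motif))


def kmer_probability(kmer, profile):
    """Probability that the profile generates this exact k-mer."""
    prob = 1.0
    for j, base in enumerate(kmer):
        prob *= profile[base][j]
    return prob


def profile_most_probable_kmer(text, k, profile):
    """First k-mer in text with the highest probability under the profile."""
    best, best_prob = text[:k], -1.0
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        prob = kmer_probability(kmer, profile)
        if prob > best_prob:
            best, best_prob = kmer, prob
    return best


def greedy_motif_search(dna, k, t):
    """Try each k-mer of the first string as a seed, extend it greedily."""
    best = [seq[:k] for seq in dna]
    for i in range(len(dna[0]) - k + 1):
        motifs = [dna[0][i:i + k]]
        for j in range(1, t):
            profile = profile_matrix(motifs)
            motifs.append(profile_most_probable_kmer(dna[j], k, profile))
        if score(motifs) < score(best):
            best = motifs
    return best


dna = ["GGCGTTCAGGCA", "AAGAATCAGTCA", "CAAGGAGTTCGC",
       "CACGTCAATCAC", "CAATAATATTCG"]
print(greedy_motif_search(dna, 3, len(dna)))
```

One weakness shows up quickly in this version: a single zero in the profile forces the probability of an otherwise good k-mer down to zero.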
This season is the series finale. It brings this particular Git repo to an end and, with it, your initial journey through some of the major bioinformatics algorithms.
Here we will primarily cover two important topics:
- GreedyMotifSearch with Pseudocounts
- Randomized Motif Search
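Both topics can be sketched in a few self-contained functions. Everything here is an illustrative assumption rather than the repository's actual code: the function names are hypothetical, the pseudocount is fixed at 1 per nucleotide per column (Laplace's rule), and the `runs` and `seed` parameters are my own additions so that the random search is repeatable.

```python
import random


def profile_with_pseudocounts(motifs):
    """Profile where every count starts at 1, so no probability is zero."""
    t, k = len(motifs), len(motifs[0])
    counts = {base: [1] * k for base in "ACGT"}
    for motif in motifs:
        for j, base in enumerate(motif):
            counts[base][j] += 1
    return {base: [c / (t + 4) for c in row] for base, row in counts.items()}


def score(motifs):
    """Mismatches against the most common base of each column."""
    return sum(len(col) - max(col.count(b) for b in "ACGT")
               for col in zip(*motifs))


def most_probable_kmer(text, k, profile):
    """First k-mer in text with the highest probability under the profile."""
    best, best_prob = text[:k], -1.0
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        prob = 1.0
        for j, base in enumerate(kmer):
            prob *= profile[base][j]
        if prob > best_prob:
            best, best_prob = kmer, prob
    return best


def greedy_motif_search_with_pseudocounts(dna, k):
    """Greedy search, with profiles built from pseudocounted motifs."""
    best = [seq[:k] for seq in dna]
    for i in range(len(dna[0]) - k + 1):
        motifs = [dna[0][i:i + k]]
        for seq in dna[1:]:
            profile = profile_with_pseudocounts(motifs)
            motifs.append(most_probable_kmer(seq, k, profile))
        if score(motifs) < score(best):
            best = motifs
    return best


def randomized_motif_search(dna, k, rng):
    """Start from random k-mers and refine until the score stops improving."""
    motifs = []
    for seq in dna:
        start = rng.randrange(len(seq) - k + 1)
        motifs.append(seq[start:start + k])
    best = motifs
    while True:
        profile = profile_with_pseudocounts(motifs)
        motifs = [most_probable_kmer(seq, k, profile) for seq in dna]
        if score(motifs) < score(best):
            best = motifs
        else:
            return best


def repeated_randomized_motif_search(dna, k, runs=100, seed=0):
    """Keep the best result over several independent random starts."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        motifs = randomized_motif_search(dna, k, rng)
        if best is None or score(motifs) < score(best):
            best = motifs
    return best


dna = ["GGCGTTCAGGCA", "AAGAATCAGTCA", "CAAGGAGTTCGC",
       "CACGTCAATCAC", "CAATAATATTCG"]
print(greedy_motif_search_with_pseudocounts(dna, 3))
print(repeated_randomized_motif_search(dna, 3, runs=20))
```

A single randomized run can land on a poor set of motifs, which is why the sketch repeats the search and keeps the lowest-scoring result. The inner loop always terminates because the score must strictly decrease for it to continue.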
But wait, there's more! Keep an eye out for other exciting repositories on Bioinformatics, Computational Drug Design, and Next-Gen Sequencing over on my GitHub.
That's all from my side for now. Until next time, this is your man Mohit @mhtjsh signing off. Cheers! ✌️