The main function uses the return value of the validator in Function0 to decide whether the input was provided in the right format. Depending on the result, it executes the following functions, prints their results to the terminal and saves them to a plain text file in the directory of the terminal program. The program ends by leaving you the output and calculations in the file "dna-sequencer.txt", which is generated on execution.
Function0 uses regular expressions to validate your input for the main function. It checks whether the gene or genome (your DNA sequence) and the DNA string you typed into the terminal consist only of the four DNA bases: adenine (A), cytosine (C), guanine (G) and thymine (T). Whitespace is stripped from the input beforehand; if only blank input is provided, the validator in Function0 catches that as well. In either case a boolean value is returned.
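The validation described above can be sketched as follows. This is only a minimal illustration, not the code from the program: the name is_valid_dna, the pattern and the restriction to uppercase letters are assumptions.

```python
import re

# Assumed pattern: one or more of the four DNA bases, uppercase only.
DNA_PATTERN = re.compile(r"[ACGT]+")


def is_valid_dna(text: str) -> bool:
    """Return True if the stripped input consists only of A, C, G and T."""
    # fullmatch requires the whole string to match, so blank input
    # (empty after stripping) is rejected as well.
    return DNA_PATTERN.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    print(is_valid_dna("  GATTACA "))
    print(is_valid_dna("   "))
    print(is_valid_dna("ACGU"))
```

Using fullmatch instead of search means a single invalid character anywhere in the input makes the whole check fail.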
Function1 is as simple as it looks: a single line of code based on Python's built-in len() function. It measures the length of the DNA sequence you provided and returns it to the main function. Because of the maximum length accepted by the input function (see the disclaimer below for further information), the length can be at most 1023, which is 2 to the power of 10 minus 1. The minimum length is 1; it cannot be 0.
Function2 is a GC content calculator: it counts the bases guanine (G) and cytosine (C) in the provided string and returns their share of your DNA sequence as a percentage. The GC content is informative for biologists because it says a lot about how stable a gene or genome is, and so provides a basis for further decisions, for example in gene editing. This is the formula used here for the GC content calculation: Count(G + C) / Count(A + T + G + C) * 100%
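As a small worked example of the formula above, here is a sketch of how such a calculation could look. The function name gc_content and the error handling for empty input are assumptions; the sequence is assumed to be already validated, so its length equals Count(A + T + G + C).

```python
def gc_content(sequence: str) -> float:
    """Return the GC content of a validated DNA sequence as a percentage."""
    if not sequence:
        # Assumption: empty input is rejected rather than reported as 0 %.
        raise ValueError("sequence must contain at least one base")
    gc_count = sequence.count("G") + sequence.count("C")
    # For a validated sequence, len(sequence) is the count of all four bases.
    return gc_count / len(sequence) * 100


if __name__ == "__main__":
    print(gc_content("ATGC"))
```

For "ATGC" two of the four bases are G or C, so the result is 50.0.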
Function3 iterates over your DNA sequence and returns its reverse complement. Every DNA base has a complementary base: adenine (A) pairs with thymine (T) and cytosine (C) pairs with guanine (G), and vice versa. The function walks through the sequence in reverse and, with the help of a small Python dictionary, replaces every single base with its complementary base, then returns the result as a whole.
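The dictionary-based approach described above might look roughly like this. The names COMPLEMENT and reverse_complement are illustrative assumptions, and the input is assumed to be already validated.

```python
# Each base mapped to its complementary base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a validated DNA sequence."""
    # Walk the sequence from the last base to the first and swap each base.
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


if __name__ == "__main__":
    print(reverse_complement("ATGC"))  # GCAT
```

Reading "ATGC" backwards gives "CGTA", and swapping each base for its complement gives "GCAT".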
Function4 also iterates over your DNA sequence and translates the bases into a protein sequence using the corresponding amino acids from the codon table. Each combination of three bases (a codon) maps to an amino acid or a stop signal in the codon table; the source of the codon table I used is linked in a comment in the code. The standard FASTA file format can hold protein sequences like the one returned, and a protein sequence is also a more compact representation of a base sequence, with one letter per three bases.
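A translation along these lines could be sketched as below. This is not the program's own code: the name translate is hypothetical, the table is the standard genetic code built from a compact string, and several choices are assumptions (reading starts at position 0, stop codons are written as "*" without ending the translation, and leftover bases at the end are ignored).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# product yields TTT, TTC, TTA, TTG, TCT, ... which matches the string above.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str) -> str:
    """Translate a validated DNA sequence into one-letter amino acid codes."""
    protein = []
    # Step through the sequence three bases at a time; an incomplete
    # codon at the end is skipped (assumption).
    for start in range(0, len(sequence) - 2, 3):
        protein.append(CODON_TABLE[sequence[start:start + 3]])
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))  # MA*
```

Here "ATG" gives M (methionine), "GCC" gives A (alanine) and "TAA" is a stop codon, shown as "*".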
Function5 uses regular expressions to search your DNA sequence for the DNA string you are looking for. It works much like an indexer for the motif you are looking for in your sequence and returns a list of all indices where the motif can be found. The indices start at 0, which is the very first position in your sequence. It returns "None" if your DNA string was not found in the DNA sequence you provided to the terminal program.
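One way to build such an indexer with regular expressions is sketched below. The name find_motif is hypothetical, and the use of a lookahead so that overlapping occurrences are also reported is an assumption; the program itself may count matches differently.

```python
import re


def find_motif(sequence: str, motif: str):
    """Return all 0-based start indices of motif in sequence, or None."""
    if not motif:
        # Assumption: an empty motif counts as "not found".
        return None
    # A zero-width lookahead matches at every position where the motif
    # starts, so overlapping occurrences are found too.
    pattern = re.compile(f"(?={re.escape(motif)})")
    indices = [match.start() for match in pattern.finditer(sequence)]
    return indices or None


if __name__ == "__main__":
    print(find_motif("GATATATGCATATACTT", "ATAT"))  # [1, 3, 9]
    print(find_motif("AAAA", "CG"))  # None
```

In the first call the occurrences at indices 1 and 3 overlap; a plain re.finditer on the motif itself would report only indices 1 and 9.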
This terminal program was created for my own learning and education purposes only! Why? Because the inputs of the program are so far limited to 1023 letters, and with this to 1023 bases. Genes, genomes and DNA strings in general can be much longer, which makes this terminal program somewhat useless for scientific research purposes, even if the functions and formulas might work correctly for shortened and limited inputs. This tool is also not FASTA compatible yet. If you need a serious application for bioinformatics, there are many providers available online; just look it up. I hope you still enjoy my application!