Home
The gcc compilers usually come with built-in support for OpenMP. However, on macOS you may need to change some configuration to get OpenMP working.
If running the make command for this tool on your Mac fails with fatal error: 'omp.h' file not found, the version of clang on your machine does not support OpenMP. If you see no error, you are good to go. If you do get the error, you may have to reinstall gcc using
brew reinstall gcc
Then make the installed compiler reachable, either by adding its location to your $PATH variable or by forcing creation of the symbolic links (if asked) while reinstalling gcc. To force the links, run
brew link --overwrite gcc 
Now, to use OpenMP, you have to specify a gcc compiler that supports OpenMP (e.g., gcc, gcc-7, or gcc-8). On macOS the plain gcc command is often an alias for Apple clang, so a versioned name such as gcc-8 is usually the safer choice.
In the makefile inside the Tool directory, change the CC value (the name of the compiler) to the name of that compiler (e.g., gcc, gcc-7, or gcc-8).
Reference: Installing OpenMP on Mac OS X 10.11
The gcc compilers on Linux come with built-in support for OpenMP, so you do not need to do anything there.
For Windows you may need to install a suitable version of gcc to use OpenMP. You can look at the following reference.
Getting started with openMP. install on windows
To use OpenMP in Visual Studio, see: OpenMP in Visual C++
The Data directory contains the datasets used to evaluate our tool. Every data file consists of three sequences. The first two are the sequences to compare, and the third lists the unique characters that can occur in a sequence (ATCG in this case). The first three lines of every data file give the lengths of these three sequences, and the next three lines give the sequence strings themselves.
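The layout described above can be checked before running the tool. The following is a minimal sketch, not part of the tool itself: the function name `read_lcs_input` is hypothetical, and it assumes the six fields are separated by whitespace (newlines or spaces).

```python
from pathlib import Path


def read_lcs_input(path):
    """Read a data file: three lengths followed by three strings.

    Hypothetical helper, not part of find_lcs. Fields are split on any
    whitespace, so they may be separated by newlines or by spaces.
    """
    fields = Path(path).read_text().split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, found {len(fields)}")

    lengths = [int(value) for value in fields[:3]]
    seq1, seq2, alphabet = fields[3:]

    # Each declared length must match the string it describes.
    for declared, text in zip(lengths, (seq1, seq2, alphabet)):
        if declared != len(text):
            raise ValueError(f"declared length {declared}, actual {len(text)}")

    # Both sequences may only use characters listed in the third string.
    stray = (set(seq1) | set(seq2)) - set(alphabet)
    if stray:
        raise ValueError(f"characters outside the alphabet: {sorted(stray)}")

    return seq1, seq2, alphabet
```

Calling `read_lcs_input` on a well-formed file returns the two sequences and the alphabet; a mismatch between a declared length and the actual string raises `ValueError`.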
Two different datasets were used for the evaluation.
- The first is a simulated dataset (inside the simulated directory) downloaded from here (the GC content was changed for each downloaded sequence). The dataset descriptions are given below.
| File names / Lengths | Length of 1st Sequence (bp) | Length of 2nd Sequence (bp) | 
|---|---|---|
| 1.txt | 128 | 127 | 
| 2.txt | 256 | 255 | 
| 3.txt | 512 | 511 | 
| 4.txt | 1024 | 1023 | 
| 5.txt | 2048 | 2047 | 
| 6.txt | 4096 | 4095 | 
| 7.txt | 8192 | 8191 | 
| 8.txt | 16384 | 16383 | 
| 9.txt | 32768 | 32767 | 
| 10.txt | 65536 | 65535 | 
| 11.txt | 131072 | 131071 | 
- The second dataset (inside the real_data directory) consists of real DNA sequences of viruses and eukaryotes (source: NCBI). The first 8 files contain virus genomes, whereas the last 2 files contain entire chromosomes of two eukaryotes.
| File names / Sequences | Sequence 1 | Sequence 2 | 
|---|---|---|
| 1.txt | Potato spindle tuber viroid (360 bp) | Tomato apical stunt viroid (359 bp) | 
| 2.txt | Rottboellia yellow mottle virus (4194 bp) | Carrot mottle virus (4193 bp) | 
| 3.txt | Rehmannia mosaic virus (6395 bp) | Tobacco mosaic virus (6395 bp) | 
| 4.txt | Potato virus A (9588 bp) | Soybean mosaic virus N (9585 bp) | 
| 5.txt | Chicken megrivirus (9566 bp) | Chicken picornavirus 4 (9564 bp) | 
| 6.txt | Microbacterium phage VitulaEligans (17534 bp) | Rhizoctonia cerealis alphaendornavirus 1 (17486 bp) | 
| 7.txt | Lucheng Rn rat coronavirus (28763 bp) | Helicobacter phage Pt1918U (28760 bp) | 
| 8.txt | Lactococcus phage ASCC368 (32276 bp) | Uncultured Mediterranean phage uvMED (32133 bp) | 
| 9.txt | Athene cunicularia (chromosome 25, 1505370 bp) | Bombus terrestris (chromosome LG B18, 3078061 bp) | 
| 10.txt | Athene cunicularia (chromosome 25, 1505370 bp) | Bombus terrestris (chromosome LG B01, 16199981 bp) | 
First, clone or download this repository. Then go to the Tool directory and run the make command from Terminal/Command Prompt. This will create an executable file named find_lcs.
Next, get the sequence files (e.g., FASTA) to compare from a convenient source (e.g., NCBI, Simulated Data from UCR, etc.). Make sure the files consist only of nucleotide bases (A, T, C, or G). You may therefore need to remove the description lines (lines starting with >) and any other characters (newline characters, whitespace, etc.) from the files.
After collecting the sequence files and removing the unnecessary characters from them, create the input text file as follows.
- Put the lengths of the two sequences (as integer values) on the first two lines.
- Then put the number of unique characters (4) on the third line.
- After that, put the sequence strings on three consecutive lines. The first two lines are for the sequences to compare, and the third line is for the unique characters (in this case ATCG).
Following is an example of a small input file (dummy_input.txt) where the first sequence consists of 16 base pairs, and the second sequence consists of 15 base pairs.
    16
    15
    4
    ATATTTCCAAGGACCC
    ATTTCCCCCAAGGCA
    ATCG
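The preparation steps above can be sketched in a few lines of Python. This is an illustrative helper, not something shipped with the tool: the function names `fasta_to_sequence` and `build_input` are hypothetical, and converting bases to uppercase is an assumption made here for FASTA files that contain lowercase letters.

```python
def fasta_to_sequence(text):
    """Drop '>' description lines and all whitespace from FASTA text.

    Hypothetical helper. Uppercasing is an assumption, not a documented
    requirement of find_lcs.
    """
    body = [line for line in text.splitlines() if not line.startswith(">")]
    return "".join("".join(body).split()).upper()


def build_input(seq1, seq2, alphabet="ATCG"):
    """Return the six-line input text: three lengths, then three strings."""
    for name, seq in (("first", seq1), ("second", seq2)):
        stray = set(seq) - set(alphabet)
        if stray:
            raise ValueError(f"{name} sequence contains {sorted(stray)}")
    fields = [len(seq1), len(seq2), len(alphabet), seq1, seq2, alphabet]
    return "\n".join(str(field) for field in fields) + "\n"


# Two tiny in-memory FASTA records that reproduce the dummy input file.
fasta1 = ">seq1 example\nATATTTCC\nAAGGACCC\n"
fasta2 = ">seq2 example\nATTTCCCC\nCAAGGCA\n"
print(build_input(fasta_to_sequence(fasta1), fasta_to_sequence(fasta2)), end="")
```

For real data, read each FASTA file into a string, pass it through `fasta_to_sequence`, and write the result of `build_input` to the input text file. Sequences containing characters other than A, T, C, or G (for example N) are rejected rather than silently altered.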
Now, to get the longest common subsequence (LCS) of the two sequences, find the path to your desired data file. Then run the following command:
./find_lcs dummy_input.txt > output.txt
This will use the dummy_input.txt file as input and write the output to the output.txt file.
N.B. This will use the maximum number of threads available on your machine.
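For small inputs, the LCS length reported by the tool can be cross-checked against a sequential reference. The sketch below is the textbook dynamic-programming recurrence, kept to one row of memory at a time. It is not the tool's parallel algorithm, and the function name `lcs_length` is hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b.

    Textbook O(len(a) * len(b)) dynamic programme. Only for cross-checking
    small inputs; this is not how find_lcs computes its result.
    """
    # Keep the shorter string along the row to minimise memory use.
    if len(b) > len(a):
        a, b = b, a

    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, other in enumerate(b, start=1):
            if ch == other:
                # Extend the LCS of the two prefixes without these characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one character from either string.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Reference LCS length for the two sequences in dummy_input.txt.
print(lcs_length("ATATTTCCAAGGACCC", "ATTTCCCCCAAGGCA"))
```

Because the running time grows with the product of the two lengths, this pure-Python check is practical only for the shorter files.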
After the execution completes, the output can be found in the output.txt file. The output first provides a summary of the input sequences. It then reports the number of threads, the LCS length, the percentage of match between the two sequences, and the total time taken by the program (in seconds). The following output was produced using the ../data/simulated/11.txt file as input and 4 threads for parallelization.
    Your input file: ../data/simulated/11.txt
    Length of sequence 1: 131072 bp
    Length of sequence 2: 131071 bp
    ######## Results ########
    Number of threads used: 4
    Length of the LCS is: 127963
    97.63% of the first sequence matches with second one
    Total time taken: 112.497263 seconds
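When comparing many runs, it can help to pull the numbers out of the output file programmatically. The sketch below is one possible way to do so: the function name `parse_output` is hypothetical, and the patterns simply mirror the labels in the sample output above, so they would need adjusting if the tool's wording changes. In this sample, the reported percentage agrees with the LCS length divided by the length of the first sequence; that reading is inferred from the numbers, not taken from the tool's source.

```python
import re


def parse_output(text):
    """Extract the numeric fields from find_lcs output text.

    Hypothetical helper; the labels are copied from the sample output.
    """
    patterns = {
        "len1": r"Length of sequence 1:\s*(\d+)",
        "len2": r"Length of sequence 2:\s*(\d+)",
        "threads": r"Number of threads used:\s*(\d+)",
        "lcs": r"Length of the LCS is:\s*(\d+)",
        "seconds": r"Total time taken:\s*([\d.]+)",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"field not found: {key}")
        value = match.group(1)
        result[key] = float(value) if key == "seconds" else int(value)
    return result


sample = """Your input file: ../data/simulated/11.txt
Length of sequence 1: 131072 bp
Length of sequence 2: 131071 bp
######## Results ########
Number of threads used: 4
Length of the LCS is: 127963
97.63% of the first sequence matches with second one
Total time taken: 112.497263 seconds
"""

fields = parse_output(sample)
percent = 100 * fields["lcs"] / fields["len1"]
print(f"{percent:.2f}%")  # 97.63%
```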
N.B. The benefit of parallelization grows with the sequence lengths and the number of threads: the longer the sequences and the more threads available, the more the timings of the parallel program improve.
A video tutorial on how to use this tool can be found here: Video Tutorial
If you have any questions, please contact the author by email at shikderr@myumanitoba.ca