getting error #9
Hello! It says that the files "have incorrect sequence identifier string". VirGenA assumes that the names of paired reads in the two files coincide. Because it also allows for some reads having no pair, it matches read pairs by name, not by their order in the files. Other assemblers may ignore name issues. Could you share a couple of the top lines from all_npv_samtools_R1_paired.fastq.gz and all_npv_samtools_R2_paired.fastq.gz? Sincerely yours, Gennady.
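To illustrate what matching by name rather than by file order means, here is a minimal Python sketch. It is not VirGenA's code; the function name `match_pairs_by_name` and the read names are hypothetical, and it only shows the general idea that reads without a partner are set aside instead of being paired with whatever sits at the same position in the other file.

```python
from typing import Iterable, List, Tuple


def match_pairs_by_name(
    r1_names: Iterable[str], r2_names: Iterable[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Split read names into (paired, only in R1, only in R2), ignoring file order."""
    r1_list, r2_list = list(r1_names), list(r2_names)
    r1_set, r2_set = set(r1_list), set(r2_list)
    # A read is paired when the same name occurs in both files, wherever it sits.
    paired = [name for name in r1_list if name in r2_set]
    only_r1 = [name for name in r1_list if name not in r2_set]
    only_r2 = [name for name in r2_list if name not in r1_set]
    return paired, only_r1, only_r2


paired, only_r1, only_r2 = match_pairs_by_name(
    ["read_a", "read_b", "read_c"],  # hypothetical names from the first file
    ["read_c", "read_a", "read_d"],  # hypothetical names from the second file
)
print(paired, only_r1, only_r2)
```

With this kind of matching, identical names in both files matter more than identical ordering, which is why a tool that pairs by position can accept files that a name-based reader rejects.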
Sir,
I have attached the first 10 reads from both files. I extracted the reads
with the command
seqkit range -r 1:10 input.fa
--
*Shanmugavadivel, P. S.*
*Scientist (Agricultural Biotechnology),*
*#216, Block A,*
*ICAR-Indian Institute of Pulses Research,*
*Min. of Agriculture & Farmers Welfare,*
*Govt. of India,Kanpur - 208 024.*
*https://iipr.icar.gov.in/*
I can't see the attachment.
Sorry for the inconvenience.
Sir, I am giving the read details here instead of an attachment.
*Forward Reads:*
@NB501457:500:HNWVWAFX2:3:21512:23611:17846
ATCCAAGTTGCGTTCTTCATGTTCGTGCTCTTCTAAATGCTTCTTGCGTTTGGCATTTTTAATAACAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGGGCTGTAACTGTAACGAGTATACATTATG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEE/EEAEEEEEEEE6EEEEEE/EEAAAE/E6<<EEAEEEE<EEEEEAEEAEEEEAEEEEEEEEE<AAEE<EE<AEEEEEEEAAAEEEEEEEAEEE<AEEEEEEAEEE<E<
@NB501457:500:HNWVWAFX2:1:11301:6783:19262
GAGCGAATCCAAGTTGCGTTCTTCATGTTCGTGCTCTTCTAAATGCTTCTTGCGTTTGGCATTTTTAATAACAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGGGCTGTAACTGTAACGAGTATAC
+
AAAAAEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEAEEE<AEEEEEAEEEEEEEEEEEEEEEEEEEEEAE<EEE<EEEEEEEAEEAE<EEE<EE<EA6AAAEAEEAAEEEEEAEEEEEEEEEAE<EEEEAEEEEEEEAEEEEEEE/
@NB501457:500:HNWVWAFX2:3:11401:8456:7607
ATACTCGTTACAGTTACAGCCCTACTTTGGGCAAAACCTATGTGTACGACAACAAATACTTTAAAAATTTAGGTGCTGTTATTAAAAATGCCAAACGCAAGAAGCATTTAGAAGAGCACGAACATGAAGAACGCAACTTGGATTCGCTCG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAEA/EEEEAEEEEEE<EEEAEEEEEAEEAEEEEEEE
@NB501457:500:HNWVWAFX2:1:11312:20869:3276
CGTTACAGTTACAGCCCTACTTTGGGCAAAACCTATGTGTACGACAACAAATACTTTAAAAATTTAGGTGCTGTTATTAAAAATGCCAAACGCAAGAAGCATTTA
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEAAAAAEEAEEEEEEEAEEEA
@NB501457:500:HNWVWAFX2:2:21306:2726:4485
TTTGTCGAGCGAATCCAAGTTGCGTTCTTCATGTTCGTGCTCTTCTAAATGCTTCTTGCGTTTGGCATTTTTAATAACAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGGGCTGTAACTGTAACG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEAEEEEAEAEAAEEAEEEAAAEEEEEEEEEEEAEEA/EEEE<EEEAEEEEEEE/AEEEEEEEEEEEEEEEEAEEAEAAAEEEEAA<AEEEEE6<A6AAAEAEEEAEEAA<
@NB501457:500:HNWVWAFX2:4:21608:18950:15018
TTGTCGAGCGAATCCAAGTTGCGTTCTTCATGTTCGTGCTCTTCTAAATGCTTCTTGCGTTTGGCATTTTTAATAACAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGGGCTGTAACTGTAACG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE6EE<EEEEEEEEAEEEEEEEEEEEEEEAEEEEEEEEEEEEEEAAAAEEEEEEE<EEEEEEEEEAEEEEEAEEE/EEEEEA/EEEEE<6/EEEEEEEAAAAAEEEAEAAEEAE
@NB501457:500:HNWVWAFX2:4:11509:13453:6115
GTATTTGTCGAGCGAATCCAAGTTGCGTTCTTCATGTTCGTGCTCTTCTAAATGCTTCTTGCGTTTGGCATTTTTAATAACAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGGGCTGTAACTGTA
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEE<<EEEEEEEE<EE/AEEEAEEEAEEAEA<EEAAEEEAA<EEEEEA6AE<EAAE<<<AEEEEEEEEEE
@NB501457:500:HNWVWAFX2:4:11511:10832:5783
CAAGTATTTGTCGAGCGAATCCAAGTTGCGTTCTTCATGTTCGTGCTCTTCTAAATGCTTCTTGCGTTTGGCATTTTTAATAACAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGGGCTGTAACTG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEA/EEEEEEAEAEEEEEEEEEEEEEEEEEEEEE/EEEEAA<AAEE<A<EEEEEEAA<AEEEEAAEE
@NB501457:500:HNWVWAFX2:3:21402:23427:18442
CACCAAGTATTTGTCGAGCGAATCCAAGTTGCGTTCTTCATGTTCGTGCTCTTCTAAATGCTTCTTGCGTTTGGCATTTTTAATAACAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGGGCTGTA
+
AAAAAEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEEEAEEEEEEEAEEEEEEAEEEEEEEEEEEEEEEEEEAEEEEEEAEAEEAEEEEEEEE<A<AAAAEAEEEAAEEEEEAAEEEEEEEE<AEEEEA
@NB501457:500:HNWVWAFX2:3:11510:20596:3882
GCCACCAAGTATTTGTCGAGCGAATCCAAGTTGCGTTCTTCATGTTCGTGCTCTTCTAAATGCTTCTTGCGTTTGGCATTTTTAATATCAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGGGCTGT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEE/EEEEEEEEEEE/EEEEEEEEEEEEAEE/EEEEEEEEEEAEEAA<<AEEEEEEEEEEEA
*Reverse Reads:*
@NB501457:500:HNWVWAFX2:2:21110:19846:17998
TAATGTATACTCGTTACAGTTACAGCCCTACTTTGGGCAAAACCTATGTGTACGACAACAAATACTTTAAAAATTTAGGTGCTGTTATTAAAAATGCCAAACGCAAGAAGCATTTAGAAGAGCACGAACATGAAGAACGCAACTTGGATTC
+
AAAAAEEEEEEEAEEEEEEEEEEEEEEEEEE/EEEEEEEE//EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEE/EEEEAEEEE6EEEEEEEEEAEAAEEEEAAEEAAAEAEE/EEEEEE/EEEA/EAA/E///EEAA6/EE/EEEEA
@NB501457:500:HNWVWAFX2:4:21608:18950:15018
AATGTATACTCGTTACAGTTACAGCCCTACTTTGGGCAAAACCTATGTGTACGACAACAAATACTTTAAAAATTTAGGTGCTGTTATTAAAAATGCCAAACGCAAGAAGCATTTAGAAGAGCACGAACATGAAGAACGCAACTTGGATTCG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEE<AEAEEEAEAEEEEEAEAAEEEEEE<AEEEEEEEEEEE/
@NB501457:500:HNWVWAFX2:1:11206:24348:12841
CCAAGTTGCGTTCTTCATGTTCGTGCTCTTCTAAATGCTTCTTGCGTTTGGCATTTTTAATAACAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGGGCTGTAACTGTAACGAGTATACATTATGGG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEAEEEEEEEEE<AE<EEEEEEEEAEEEEEEEEEEEEEAEEEE<AE<<<<AEAEEEAAAEEEEEEEAEEEAAAEEEEEE/EA<AAA<
@NB501457:500:HNWVWAFX2:1:11312:20869:3276
TAAATGCTTCTTGCGTTTGGCATTTTTAATAACAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGGGCTGTAACTGTAACG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEAA<EEEEEEEAEEAAEEEEEEEEEEA
@NB501457:500:HNWVWAFX2:1:11111:23221:16579
TTACAGTTACAGCCCTACTTTGGGCAAAACCTATGTGTACGACAACAAATACTTTAAAAATTTAGGTGCTGTTATTAAAAATGCCAAACGCAAGAAGCATTTAGAAGAGCACGAACATGAAGAACGCAACTTGGATTCGCTCGACAAATAC
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEAEEEEAEEEEEEAEEEEEEEEEEEEEEEEEEAEEE<AAEAEEEE
@NB501457:500:HNWVWAFX2:3:21608:16394:11700
CAAGTATTTGTCGAGCGAATCCAAGTTGCGTTCTTCATGTTCGTGCTCTTCTAAATGCTTCTTGCGTTTGGCATTTTTAATAACAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGGGCTGTAACTG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEE<EEEEEEEEEEEEEEAEEEEEEEEEA<AE/AAAAAEEAEEEEEA<6EE/AAEEAE6EEEE/AE/
@NB501457:500:HNWVWAFX2:4:11608:10885:7701
ACCAAGTATTTGTCGAGCGAATCCAAGTTGCGTTCTTCATGTTCGTGCTCTTCTAAATGCTTCTTGCGTTTGGCATTTTTAATAACAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGGGCTGTAAC
+
AAAAAEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEE/EEEEEEEEAEEE<AAEEEEEEEEEEEEEEEEEAEEEEEEEE<A/EEEEEEEEEEEE<EA<EAEAEEEEEEE<AA6EEEEEEEEEEEEEE/A
@NB501457:500:HNWVWAFX2:1:11302:17990:6176
CCGCCACCAAGTATTTGTCGAGCGAATCCAAGTTGCGTTCTTCATGTTCGTGCTCTTCTAAATGCTTCTTGCGTTTGGCATTTTTAATAACAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGGGCG
+
AAAAAAEEEEEEEEEEEEEEEEEE/A/EEEEEEEEEEAAE/EE<EEEEEEEEEEEEEEEEE/EA<EEEEEEEEA/EEEEEAEEEE/EA//E/EE//EEEE/EEAEEEEAAEAEEEEEE<AAEAE66AEEAEEEAEEAA/<<EEEAEEE<E/
@NB501457:500:HNWVWAFX2:3:11404:6926:13327
CCTACTTTGGGCAAAACCTATGTGTACGACAACAAATACTTTAAAAATTTAGGTGCTGTTATTAAAAATGCCAAACGCAAGAAGCATTTAGAAGAGCACGAACATGAAGAACGCAACTTGGATTCGCTCGACAAATACTTGGTGGCGGA
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEAEEEEEEAEEEEEEEEEEEEAEAEEEEAAEEEEEE/AAEEEEEEEEEEEEAAAAAEEEEEEEEAEEEAAEEE<AE
@NB501457:500:HNWVWAFX2:4:21407:10028:18536
CTTCCGCCACCAAGTATTTGTCGAGCGAATCCAAGTTGCGTTCTTCATGTTCGTGCTCTTCTAAATGCTTCTTGCGTTTGGCATTTTTAATAACAGCACCTAAATTTTTAAAGTATTTGTTGTCGTACACATAGGTTTTGCCCAAAGTAGG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAEEEEEEEEEEEEAAEAEEEEEEEE/AEEEEEEEEEEEEEEE<E<EEAEEEEEEEEE6EEEEEEEEAAEE<EEAA<EEEEEEE<
The names are expected either to end with /1 or /2 or to contain the ' ' character. The ID of the pair is taken to be the substring from the first character after @ up to the '/' or ' '. Here are examples of Illumina names from Wikipedia:
"@HWUSI-EAS100R:6:73:941:1973#0/1"
"@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
"@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36"
It looks like your reads were preprocessed and the names were trimmed, or even the read order was changed. If you are sure the order is correct, you can try renaming all the reads like "SampleID_1 1" ... "SampleID_10000 1" in the first file and "SampleID_1 2" ... "SampleID_10000 2" in the second. But are you really sure the reads you printed form correct pairs?
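The naming rule and the renaming workaround described above can be sketched in Python as follows. This is only an illustration of the rule as stated in this comment, not VirGenA's actual implementation. The function names `pair_id` and `rename_reads` are hypothetical, cutting at the first '/' or ' ' (whichever comes earlier) is an assumption, and the renaming is only meaningful if both files hold the same number of 4-line FASTQ records in the same order.

```python
from typing import Iterable, Iterator, Optional


def pair_id(header: str) -> Optional[str]:
    """Return the text between '@' and the first '/' or ' ', or None if neither is present."""
    if not header.startswith("@"):
        return None
    name = header[1:].rstrip("\r\n")
    # Positions of the two accepted separators; -1 means "not found".
    cuts = [i for i in (name.find("/"), name.find(" ")) if i != -1]
    return name[: min(cuts)] if cuts else None


def rename_reads(lines: Iterable[str], sample_id: str, mate: int) -> Iterator[str]:
    """Rewrite each header as '@<sample_id>_<n> <mate>', numbering records from 1."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # the header is the first line of every 4-line record
            yield f"@{sample_id}_{i // 4 + 1} {mate}\n"
        else:
            yield line  # sequence, '+' and quality lines pass through unchanged


print(pair_id("@HWUSI-EAS100R:6:73:941:1973#0/1"))
print(pair_id("@NB501457:500:HNWVWAFX2:3:21512:23611:17846"))
```

Under this rule, a name without '/' or ' ' yields no pair ID at all. Running `rename_reads` with mate 1 on the first file and mate 2 on the second gives both files names of the "SampleID_n 1" / "SampleID_n 2" form.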
Thanks, Gfedonin, for the reply.
I looked at the sequence files. I think seqkit returned the reads from both
files in a jumbled order, so the reads I posted earlier are not in order.
I checked the reads in the main files; they are listed below in their
original order.
*Forward reads*
@NB501457:500:HNWVWAFX2:1:11101:1054:14716/1
TTGATACGGTACAAATTGAAATATTTCTCTTCAGATTTGTACATACAGTCAAACAAATCGTTTTCATTTTCAATAACATGGTTCTTGAAGGCTCCGCTTAAACGTTTGAAGTCGCTCATCATTTCTACTTGAACACCGTTGATTGTACCAT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/E/EEE<EEEEEAEE<AEE
@NB501457:500:HNWVWAFX2:1:11101:1065:12395/1
GTATAGACTCGAGAAAAACACATAGTCTGCATGGGTTCATGACTGAACATTGCCAAAATATTCATAAAACAACTGTTTTCTTTGTACAAATTACTCAGTCCTATCAACTGATACATGTGACGAACAAAACTAGAGTCGTAATATACGCTGT
+
AAAAAEEEEEEEEEEE6EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEAEE/EEEEEEEEEEEEEEEEEAAEEEEEEEEEEEEEEEE/EEEEEEEE<EEEEAEEEEEEA/EEEEEEEEEAAEEEE/EAAEE<
@NB501457:500:HNWVWAFX2:1:11101:1067:4175/1
GATACGCTCAACTTGCATTATGAGATAGCCTTTTATAGTCAAATAATGATTACGACACATGGGACAATTTAGTTTAAAAAATACATTATAAAAAACCGGTTTCATTAAACGTAAATGTTGACGAATCAATTCGTTGTCGTATTTTTCACGC
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEAEEEEEEA<AEEEEEEEEEEEEEEAAEE<EAEAAEEEAEEEEE<EAAAEEEEEEEEAAEAA/
@NB501457:500:HNWVWAFX2:1:11101:1071:15785/1
GTATTATTTGGAATCGGGTAGCGTGCTATGTCCGCGCGAATTCGCCATCGTTAGGTTTACGTTTAACGACATCAAAACTGTCAACGAAAGCGGTCTGTTCAATATTGTCTGTACAAATGTGAATGCGTTGACTTTAATAGAACATTTTATG
+
AAAAAEAEEAEEEEEEE/EEEEEEAEEEE/EEEEEEEEEE/<EEEEEEEEEEEEEAEEEE<E/AE/EEEEE<EEEEEEAEEEEEEEAEEEEEAEEEEEEEEEEEEEEEEEEEAEEAEE6EAE<AAEAEEEAEAEAAAAEE/AAEEEEAAEE
*Reverse Reads:*
@NB501457:500:HNWVWAFX2:1:11101:1054:14716/2
CAGCCGCAAAAAGCTACTCATCAAATCGTATGAGCAATGCGAAGACGAAGACCTGTTGATGACCGTATGCAAAAGTGTGACCCTCCAAGAGTTCTGTGCCAACGAGATAAAATCGCTGCTGGCGAAATTCCTATACGGTTTTAAAGTCTAC
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEAEEEEEEEEEAAEEEEEEEAEE
@NB501457:500:HNWVWAFX2:1:11101:1065:12395/2
GGATGCTCTCGAAAGCTACCATAACTATTTCAAATTGGCCGTGCAAATGATCAAGCTCAATTACAAAAGTTGTGCTCAACGCCAGTTTAGCGATTTCGTTGTGCCGGGCGTGTTCGATCTGATCCTCGCCGATCACAGAGTTTTGAACAAC
+
AAAAAEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAEAEEEEEEEEEEEEEEEEEEAEEAEEEEEEEEEEEEAEEEEEEEEAEEAEEEEEEEEEEEE<<EE<<A/<A<AAEAAAAA<AEA<<EE<EE<EEEEE
@NB501457:500:HNWVWAFX2:1:11101:1067:4175/2
CGTTGCCGCAAACCATGACTTTAGAACAAATGAAAACGGAATTTTCCAATAAAATGGAACAACTCAATTTACGTGCGCCCCAACCAAAAAACTACGCGTACACGTTCACAACCATATGGGATTCGATTCATTTTTTGTGTTTGCTCAT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAEAEEEEEEEEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEAEEEEEEEEEEAEEEEEEEEEEEEEEEE/EEEAEEEEAEEEEEEAEEA<EAAEEEEAEAEEEEEEE
@NB501457:500:HNWVWAFX2:1:11101:1071:15785/2
GTTTTGCAAGTTCAAAATGATTCTCTCGTCGGCGAGTCCGTTCTTTAGAGTCATAAAATGTTCTATTAAAGTCAACGCATTCACATTTGTACAGACAATATTGAACAGACCGCTTTCGTTGACAGTTTTGATGTCGTTAAACGTAAACCT
+
AAAAAEEEEEEEEEEEEEAEAAEEEEEEAEEEEEEEEEEEEEEEAEEEAAEEEEE/EE/EEEE<EAE<EEE<EEEEAEE/AEEEA/E//EEEEEEEEEEEE<E<AEEEAEEEEE<<A<E<A<AEE/6/E<EA/<AAEEE//EEEAE/EEE
This is as per the order.
Then, where is the mistake?
I am really stuck.
This looks really strange: the names you posted last time are all correct; they all end with '/1' or '/2'. Are you sure these are the reads you are giving to the program? It should work... Maybe a few of the reads still lack '/1' or '/2' at the end?
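One way to check whether a few reads still lack the expected suffix is to scan every header in the file. The Python sketch below is only an illustration. It mirrors the naming rule stated in this thread rather than the check VirGenA actually performs, the function names are hypothetical, and it assumes plain 4-line FASTQ records (no wrapped sequences).

```python
import gzip
from typing import Iterable, List, Tuple


def suspicious_headers(
    lines: Iterable[str], mate: int, limit: int = 10
) -> List[Tuple[int, str]]:
    """Return up to `limit` (record number, header) pairs that break the naming rule."""
    bad: List[Tuple[int, str]] = []
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # skip sequence, '+' and quality lines (quality may start with '@')
        header = line.rstrip("\r\n")
        ok = header.startswith("@") and (header.endswith(f"/{mate}") or " " in header)
        if not ok:
            bad.append((i // 4 + 1, header))
            if len(bad) >= limit:
                break
    return bad


def scan_file(path: str, mate: int, limit: int = 10) -> List[Tuple[int, str]]:
    """Open a gzipped FASTQ file in text mode and scan its headers."""
    with gzip.open(path, "rt") as handle:
        return suspicious_headers(handle, mate, limit)
```

For example, `scan_file("all_npv_samtools_R1_paired.fastq.gz", 1)` would list the first few records in the forward file whose names neither end with '/1' nor contain a space; an empty list means every header passed this particular check.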
java -Xmx30G -jar /home/iipruser/VirGenA_v1.4/VirGenA.jar assemble -c /home/iipruser/VirGenA_v1.4/config_test_linux.xml
java.io.IOException: File /media/iipruser/shanmu_data/Sanjay_Viral_whole_genome/denovo_with_reference_alignment_27th_Nov_2021/ALL_NPV/AllNPV_samtools_reads_1.96m_reads/all_npv_samtools_R1_paired.fastq.gz have incorrect sequence identifier string
at DataReader.readFilesWithReads(DataReader.java:142)
at DataReader.readData(DataReader.java:41)
at DataReader.<init>(DataReader.java:75)
at DataReader.getInstance(DataReader.java:102)
at KMerCounter.<init>(KMerCounter.java:17)
at KMerCounter.getInstance(KMerCounter.java:59)
at Mapper.<init>(Mapper.java:29)
at ConsensusBuilderSimple.<init>(ConsensusBuilderSimple.java:23)
at ConsensusBuilderWithReassembling.<init>(ConsensusBuilderWithReassembling.java:41)
at RefBasedAssembler.run(RefBasedAssembler.java:665)
at VirGenA.main(VirGenA.java:34)
java.lang.NullPointerException
at KMerCounter.<init>(KMerCounter.java:40)
at KMerCounter.getInstance(KMerCounter.java:59)
at Mapper.<init>(Mapper.java:29)
at ConsensusBuilderSimple.<init>(ConsensusBuilderSimple.java:23)
at ConsensusBuilderWithReassembling.<init>(ConsensusBuilderWithReassembling.java:41)
at RefBasedAssembler.run(RefBasedAssembler.java:665)
at VirGenA.main(VirGenA.java:34)
java.io.IOException: File /media/iipruser/shanmu_data/Sanjay_Viral_whole_genome/denovo_with_reference_alignment_27th_Nov_2021/ALL_NPV/AllNPV_samtools_reads_1.96m_reads/all_npv_samtools_R1_paired.fastq.gz have incorrect sequence identifier string
at DataReader.readFilesWithReads(DataReader.java:142)
at DataReader.readData(DataReader.java:41)
at DataReader.<init>(DataReader.java:75)
at DataReader.getInstance(DataReader.java:102)
at ConsensusBuilderWithReassembling.assemble(ConsensusBuilderWithReassembling.java:762)
at RefBasedAssembler.run(RefBasedAssembler.java:666)
at VirGenA.main(VirGenA.java:34)
java.lang.NullPointerException
at ConsensusBuilderWithReassembling.assemble(ConsensusBuilderWithReassembling.java:764)
at RefBasedAssembler.run(RefBasedAssembler.java:666)
at VirGenA.main(VirGenA.java:34)
I am using the same reads for de novo assembly with SPAdes, and that works fine, but I get the error here.