'\" t
.\" Title: bcftools
.\" Author: [see the "AUTHOR(S)" section]
.\" Generator: Asciidoctor 2.0.15.dev
.\" Date: 2023-06-02
.\" Manual: \ \&
.\" Source: \ \&
.\" Language: English
.\"
.TH "BCFTOOLS" "1" "2023-06-02" "\ \&" "\ \&"
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.ss \n[.ss] 0
.nh
.ad l
.de URL
\fI\\$2\fP <\\$1>\\$3
..
.als MTO URL
.if \n[.g] \{\
. mso www.tmac
. am URL
. ad l
. .
. am MTO
. ad l
. .
. LINKSTYLE blue R < >
.\}
.SH "NAME"
bcftools \- utilities for variant calling and manipulating VCFs and BCFs.
.SH "SYNOPSIS"
.sp
\fBbcftools\fP [\-\-version|\-\-version\-only] [\-\-help] [\fICOMMAND\fP] [\fIOPTIONS\fP]
.SH "DESCRIPTION"
.sp
BCFtools is a set of utilities that manipulate variant calls in the Variant
Call Format (VCF) and its binary counterpart BCF. All commands work
transparently with both VCFs and BCFs, both uncompressed and BGZF\-compressed.
.sp
Most commands accept VCF, bgzipped VCF and BCF with filetype detected
automatically even when streaming from a pipe. Indexed VCF and BCF
will work in all situations. Un\-indexed VCF and BCF and streams will
work in most, but not all situations. In general, whenever multiple VCFs are
read simultaneously, they must be indexed and therefore also compressed.
(Note that files with non\-standard index names can be accessed as e.g.
"\f(CRbcftools view \-r X:2928329 file.vcf.gz##idx##non\-standard\-index\-name\fP".)
.sp
BCFtools is designed to work on a stream. It regards an input file "\-" as the
standard input (stdin) and outputs to the standard output (stdout). Several
commands can thus be combined with Unix pipes.
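.sp
As a minimal sketch of such a pipeline, the following subsets two samples and
strips one annotation without writing an intermediate file. The names
\fIin.vcf.gz\fP, \fIout.bcf\fP, \fIsample1\fP and \fIsample2\fP are placeholders chosen for illustration.
.sp
.if n .RS 4
.nf
.fam C
# uncompressed BCF (\-Ou) between the two commands; "\-" reads from stdin
bcftools view \-Ou \-s sample1,sample2 in.vcf.gz | bcftools annotate \-x INFO/DP \-Ob \-o out.bcf \-
.fam
.fi
.if n .RE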
.SS "VERSION"
.sp
This manual page was last updated \fB2023\-06\-02 11:27 BST\fP and refers to bcftools git version \fB1.17\-52\-g0773541c+\fP.
.SS "BCF1"
.sp
The obsolete BCF1 format output by versions of samtools <= 0.1.19 is \fBnot\fP
compatible with this version of bcftools. To read BCF1 files one can use
the view command from old versions of bcftools packaged with samtools
versions <= 0.1.19 to convert to VCF, which can then be read by
this version of bcftools.
.sp
.if n .RS 4
.nf
.fam C
samtools\-0.1.19/bcftools/bcftools view file.bcf1 | bcftools view
.fam
.fi
.if n .RE
.SS "VARIANT CALLING"
.sp
See \fIbcftools call\fP for variant calling from the output of the
\fIsamtools mpileup\fP command. In versions of samtools <= 0.1.19 calling was
done with \fIbcftools view\fP. Users are now required to choose between the old
samtools calling model (\fI\-c/\-\-consensus\-caller\fP) and the new multiallelic
calling model (\fI\-m/\-\-multiallelic\-caller\fP). The multiallelic calling model
is recommended for most tasks.
.SS "FILTERING EXPRESSIONS"
.sp
See \fBEXPRESSIONS\fP
.SH "LIST OF COMMANDS"
.sp
For a full list of available commands, run \fBbcftools\fP without arguments. For a full
list of available options, run \fBbcftools\fP \fICOMMAND\fP without arguments.
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBannotate\fP .. edit VCF files, add or remove annotations
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBcall\fP .. SNP/indel calling (former "view")
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBcnv\fP .. Copy Number Variation caller
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBconcat\fP .. concatenate VCF/BCF files from the same set of samples
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBconsensus\fP .. create consensus sequence by applying VCF variants
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBconvert\fP .. convert VCF/BCF to other formats and back
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBcsq\fP .. haplotype aware consequence caller
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBfilter\fP .. filter VCF/BCF files using fixed thresholds
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBgtcheck\fP .. check sample concordance, detect sample swaps and contamination
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBhead\fP .. view VCF/BCF file headers
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBindex\fP .. index VCF/BCF
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBisec\fP .. intersections of VCF/BCF files
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBmerge\fP .. merge VCF/BCF files from non\-overlapping sample sets
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBmpileup\fP .. multi\-way pileup producing genotype likelihoods
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBnorm\fP .. normalize indels
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBplugin\fP .. run user\-defined plugin
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBpolysomy\fP .. detect contaminations and whole\-chromosome aberrations
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBquery\fP .. transform VCF/BCF into user\-defined formats
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBreheader\fP .. modify VCF/BCF header, change sample names
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBroh\fP .. identify runs of homo/auto\-zygosity
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBsort\fP .. sort VCF/BCF files
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBstats\fP .. produce VCF/BCF stats (former vcfcheck)
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBview\fP .. subset, filter and convert VCF and BCF files
.RE
.SH "LIST OF SCRIPTS"
.sp
Some helper scripts are bundled with the bcftools code.
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBgff2gff\fP .. converts a GFF file to the format required by \fBcsq\fP
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
. sp -1
. IP \(bu 2.3
.\}
\fBplot\-vcfstats\fP .. plots the output of \fBstats\fP
.RE
.SH "COMMANDS AND OPTIONS"
.SS "Common Options"
.sp
The following options are common to many bcftools commands. See usage for
specific commands to see if they apply.
.sp
\fIFILE\fP
.RS 4
Files can be either VCF or BCF, uncompressed or BGZF\-compressed. The file "\-"
is interpreted as standard input. Some tools may require tabix\- or
CSI\-indexed files.
.RE
.sp
\fB\-c, \-\-collapse\fP \fIsnps\fP|\fIindels\fP|\fIboth\fP|\fIall\fP|\fIsome\fP|\fInone\fP|\fIid\fP
.RS 4
Controls how to treat records with duplicate positions and defines compatible
records across multiple input files. Here by "compatible" we mean records which
should be considered as identical by the tools. For example, when performing
line intersections, the desire may be to consider as identical all sites with
matching positions (\fBbcftools isec \-c\fP \fIall\fP), or only sites with matching variant
type (\fBbcftools isec \-c\fP \fIsnps\fP\~ \fB\-c\fP \fIindels\fP), or only sites with all alleles
identical (\fBbcftools isec \-c\fP \fInone\fP).
.sp
\fInone\fP
.RS 4
only records with identical REF and ALT alleles are compatible
.RE
.sp
\fIsome\fP
.RS 4
only records where some subset of ALT alleles match are compatible
.RE
.sp
\fIall\fP
.RS 4
all records are compatible, regardless of whether the ALT alleles
match or not. In the case of records with the same position, only
the first will be considered and appear on output.
.RE
.sp
\fIsnps\fP
.RS 4
any SNP records are compatible, regardless of whether the ALT
alleles match or not. For duplicate positions, only the first SNP
record will be considered and appear on output.
.RE
.sp
\fIindels\fP
.RS 4
all indel records are compatible, regardless of whether the REF
and ALT alleles match or not. For duplicate positions, only the
first indel record will be considered and appear on output.
.RE
.sp
\fIboth\fP
.RS 4
abbreviation of "\fB\-c\fP \fIindels\fP\~ \fB\-c\fP \fIsnps\fP"
.RE
.sp
\fIid\fP
.RS 4
only records with identical ID column are compatible.
Supported by \fBbcftools merge\fP only.
.RE
.RE
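.sp
The modes can be compared with \fBbcftools isec\fP. The sketch below assumes two
indexed files with the placeholder names \fIA.vcf.gz\fP and \fIB.vcf.gz\fP, and it uses
the \fB\-p\fP output\-directory option of \fBisec\fP, which is described with that command rather than here.
.sp
.if n .RS 4
.nf
.fam C
# sites count as shared only when REF and ALT are identical
bcftools isec \-c none \-p isec_exact A.vcf.gz B.vcf.gz
# sites count as shared whenever the positions match
bcftools isec \-c all \-p isec_pos A.vcf.gz B.vcf.gz
.fam
.fi
.if n .RE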
.sp
\fB\-f, \-\-apply\-filters\fP \fILIST\fP
.RS 4
Skip sites where FILTER column does not contain any of the strings listed
in \fILIST\fP. For example, to include only sites which have no filters set,
use \fB\-f\fP \fI.,PASS\fP.
.RE
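.sp
For instance, assuming a compressed input with the placeholder name \fIfile.vcf.gz\fP,
the unfiltered and passing sites could be kept with \fBbcftools view\fP:
.sp
.if n .RS 4
.nf
.fam C
bcftools view \-f .,PASS file.vcf.gz
.fam
.fi
.if n .RE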
.sp
\fB\-\-no\-version\fP
.RS 4
Do not append version and command line information to the output VCF header.
.RE
.sp
\fB\-o, \-\-output\fP \fIFILE\fP
.RS 4
When output consists of a single stream, write it to \fIFILE\fP rather than
to standard output, where it is written by default.
The file type is determined automatically from the file name suffix and in
case a conflicting \fB\-O\fP option is given, the file name suffix takes precedence.
.RE
.sp
\fB\-O, \-\-output\-type\fP \fIb\fP|\fIu\fP|\fIz\fP|\fIv\fP[0\-9]
.RS 4
Output compressed BCF (\fIb\fP), uncompressed BCF (\fIu\fP), compressed VCF (\fIz\fP), uncompressed VCF (\fIv\fP).
Use the \-Ou option when piping between bcftools subcommands to speed up
performance by removing unnecessary compression/decompression and
VCF\(<-\(->BCF conversion.
\~
The compression level of the compressed formats (\fIb\fP and \fIz\fP) can be set
by appending a number between 0\-9.
.RE
.sp
\fB\-r, \-\-regions\fP \fIchr\fP|\fIchr:pos\fP|\fIchr:beg\-end\fP|\fIchr:beg\-\fP[,...]
.RS 4
Comma\-separated list of regions, see also \fB\-R, \-\-regions\-file\fP. Overlapping
records are matched even when the starting coordinate is outside of the
region, unlike the \fB\-t/\-T\fP options where only the POS coordinate is checked.
Note that \fB\-r\fP cannot be used in combination with \fB\-R\fP.
.RE
.sp
\fB\-R, \-\-regions\-file\fP \fIFILE\fP
.RS 4
Regions can be specified either on command line or in a VCF, BED, or
tab\-delimited file (the default). The columns of the tab\-delimited file
can contain either positions (two\-column format: CHROM, POS) or intervals
(three\-column format: CHROM, BEG, END), but not both. Positions are 1\-based
and inclusive. The columns of the tab\-delimited BED file are also
CHROM, POS and END (trailing columns are ignored), but coordinates
are 0\-based, half\-open. To indicate that a file be treated as BED rather
than the 1\-based tab\-delimited file, the file must have the ".bed" or
".bed.gz" suffix (case\-insensitive). Uncompressed files are stored in
memory, while bgzip\-compressed and tabix\-indexed region files are streamed.
Note that sequence names must match exactly, "chr20" is not the same as
"20". Also note that chromosome ordering in \fIFILE\fP will be respected,
the VCF will be processed in the order in which chromosomes first appear
in \fIFILE\fP. However, within chromosomes, the VCF will always be
processed in ascending genomic coordinate order no matter what order they
appear in \fIFILE\fP. Note that overlapping regions in \fIFILE\fP can result in
duplicated, out\-of\-order positions in the output.
This option requires indexed VCF/BCF files. Note that \fB\-R\fP cannot be used
in combination with \fB\-r\fP.
.RE
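.sp
The two coordinate conventions can be illustrated with a single interval. In this
sketch the file names are placeholders, the lines starting with "#" are labels rather
than file content, and the columns are separated by tabs in the real files. Both
fragments describe the same 1000 bases on a sequence named "20":
.sp
.if n .RS 4
.nf
.fam C
# regions.tsv (1\-based, inclusive)
20 1001 2000
# regions.bed (0\-based, half\-open)
20 1000 2000
.fam
.fi
.if n .RE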
.sp
\fB\-\-regions\-overlap\fP \fIpos\fP|\fIrecord\fP|\fIvariant\fP|\fI0\fP|\fI1\fP|\fI2\fP
.RS 4
This option controls how overlapping records are determined:
set to \fBpos\fP or \fB0\fP if the VCF record has to have POS inside a region
(this corresponds to the default behavior of \fB\-t/\-T\fP);
set to \fBrecord\fP or \fB1\fP if also overlapping records with POS outside a region
should be included (this is the default behavior of \fB\-r/\-R\fP, and includes indels
with POS at the end of a region, which are technically outside the region); or set
to \fBvariant\fP or \fB2\fP to include only true overlapping variation (compare
the full VCF representation "\f(CRTA>T\-\fP" vs the true sequence variation "\f(CRA>\-\fP").
.RE
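.sp
As a sketch, the commands below query the same region under the first two modes;
\fIfile.vcf.gz\fP is a placeholder for an indexed input. A deletion whose POS lies before
position 1000 but which extends into the region is expected in the output of the
second command only.
.sp
.if n .RS 4
.nf
.fam C
bcftools view \-r 20:1000\-2000 \-\-regions\-overlap pos file.vcf.gz
bcftools view \-r 20:1000\-2000 \-\-regions\-overlap record file.vcf.gz
.fam
.fi
.if n .RE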
.sp
\fB\-s, \-\-samples\fP [^]\fILIST\fP
.RS 4
Comma\-separated list of samples to include, or to exclude if prefixed
with "^". (Note that when multiple samples are to be excluded,
the "^" prefix is still present only once, e.g. "^SAMPLE1,SAMPLE2".)
The sample order is updated to reflect that given on the command line.
Note that in general tags such as INFO/AC, INFO/AN, etc are not updated
to correspond to the subset samples. \fBbcftools view\fP is the
exception where some tags will be updated (unless the \fB\-I, \-\-no\-update\fP
option is used; see \fBbcftools view\fP documentation). To use updated
tags for the subset in another command one can pipe from \fBview\fP into
that command. For example:
.RE
.sp
.if n .RS 4
.nf
.fam C
bcftools view \-Ou \-s sample1,sample2 file.vcf | bcftools query \-f %INFO/AC\(rst%INFO/AN\(rsn
.fam
.fi
.if n .RE
.sp
\fB\-S, \-\-samples\-file\fP [^]\fIFILE\fP
.RS 4
File of sample names to include or exclude if prefixed with "^".
One sample per line. See also the note above for the \fB\-s, \-\-samples\fP
option.
The sample order is updated to reflect that given in the input file.
The command \fBbcftools call\fP accepts an optional second
column indicating ploidy (0, 1 or 2) or sex (as defined by
\fB\-\-ploidy\fP, for example "F" or "M"), for example:
.RE
.sp
.if n .RS 4
.nf
.fam C
sample1 1
sample2 2
sample3 2
.fam
.fi
.if n .RE
.sp
or
.sp
.if n .RS 4
.nf
.fam C
sample1 M
sample2 F
sample3 F
.fam
.fi
.if n .RE
.sp
If the second column is not present, the sex "F" is assumed.
With \fBbcftools call \-C\fP \fItrio\fP, a PED file is expected.
The program ignores the first column; the last column indicates sex (1=male, 2=female), for example:
.sp
.if n .RS 4
.nf
.fam C
ignored_column daughterA fatherA motherA 2
ignored_column sonB fatherB motherB 1
.fam
.fi
.if n .RE
.sp
\fB\-t, \-\-targets\fP [^]\fIchr\fP|\fIchr:pos\fP|\fIchr:from\-to\fP|\fIchr:from\-\fP[,...]
.RS 4
Similar to \fB\-r, \-\-regions\fP, but the next position is accessed by streaming the
whole VCF/BCF rather than using the tbi/csi index. Both \fB\-r\fP and \fB\-t\fP options
can be applied simultaneously: \fB\-r\fP uses the index to jump to a region
and \fB\-t\fP discards positions which are not in the targets. Unlike \fB\-r\fP, targets
can be prefixed with "^" to request logical complement. For example, "^X,Y,MT"
indicates that sequences X, Y and MT should be skipped.
Yet another difference between the \fB\-t/\-T\fP and \fB\-r/\-R\fP is that \fB\-r/\-R\fP checks for
proper overlaps and considers both POS and the end position of an indel, while \fB\-t/\-T\fP
considers the POS coordinate only (by default; see also \fB\-\-regions\-overlap\fP and \fB\-\-targets\-overlap\fP).
Note that \fB\-t\fP cannot be used in combination with \fB\-T\fP.
.RE
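.sp
A short sketch of both uses, assuming an indexed placeholder input \fIfile.vcf.gz\fP
whose sequence names carry no "chr" prefix:
.sp
.if n .RS 4
.nf
.fam C
# skip X, Y and MT; the quotes keep the shell from interpreting "^"
bcftools view \-t \(aq^X,Y,MT\(aq file.vcf.gz
# jump to sequence 20 with the index, then keep only records with POS in 1000\-2000
bcftools view \-r 20 \-t 20:1000\-2000 file.vcf.gz
.fam
.fi
.if n .RE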
.sp
\fB\-T, \-\-targets\-file\fP [^]\fIFILE\fP
.RS 4
Same as \fB\-t, \-\-targets\fP, but reads regions from a file. Note that \fB\-T\fP
cannot be used in combination with \fB\-t\fP.
.sp
With the \fBcall \-C\fP \fIalleles\fP command, the third column of the targets file must
be a comma\-separated list of alleles, starting with the reference allele.
Note that the file must be compressed and indexed.
Such a file can be easily created from a VCF using:
.RE
.sp
.if n .RS 4
.nf
.fam C
bcftools query \-f\(aq%CHROM\(rst%POS\(rst%REF,%ALT\(rsn\(aq file.vcf | bgzip \-c > als.tsv.gz && tabix \-s1 \-b2 \-e2 als.tsv.gz
.fam
.fi
.if n .RE
.sp
\fB\-\-targets\-overlap\fP \fIpos\fP|\fIrecord\fP|\fIvariant\fP|\fI0\fP|\fI1\fP|\fI2\fP
.RS 4
Same as \fB\-\-regions\-overlap\fP but for \fB\-t/\-T\fP.
.RE
.sp
\fB\-\-threads\fP \fIINT\fP
.RS 4
Use multithreading with \fIINT\fP worker threads. The option is currently used only for the compression of the
output stream, only when \fI\-\-output\-type\fP is \fIb\fP or \fIz\fP. Default: 0.
.RE
.sp
\fB\-\-write\-index\fP
.RS 4
Automatically index the output files. Can be used only for compressed BCF and VCF output.
.RE
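.sp
For example, the following sketch writes a compressed BCF with four compression
threads and indexes it in the same step. The file names are placeholders, and, as noted
above, whether these options apply depends on the command.
.sp
.if n .RS 4
.nf
.fam C
bcftools view \-Ob \-\-threads 4 \-\-write\-index \-o out.bcf in.vcf.gz
.fam
.fi
.if n .RE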
.SS "bcftools annotate \fI[OPTIONS]\fP \fIFILE\fP"
.sp
Add or remove annotations.
.sp
\fB\-a, \-\-annotations\fP \fIfile\fP
.RS 4
Bgzip\-compressed and tabix\-indexed file with annotations. The file
can be VCF, BED, or a tab\-delimited file with mandatory columns CHROM, POS
(or, alternatively, FROM and TO), optional columns REF and ALT, and an arbitrary
number of annotation columns. BED files are expected to have
the ".bed" or ".bed.gz" suffix (case\-insensitive), otherwise a tab\-delimited file is assumed.
Note that in the case of a tab\-delimited file, the coordinates POS, FROM and TO are
one\-based and inclusive. When REF and ALT are present, only matching VCF
records will be annotated. If the END coordinate is present in the annotation file
and given on command line as "\f(CR\-c ~INFO/END\fP", then VCF records will be matched also by the INFO/END coordinate.
If ID is present in the annotation file and given as "\f(CR\-c ~ID\fP", then VCF records will be matched
also by the ID column.
\~
.br
\~
.br
When multiple ALT alleles are present in the annotation file (given as
comma\-separated list of alleles), at least one must match one of the
alleles in the corresponding VCF record. Similarly, at least one
alternate allele from a multi\-allelic VCF record must be present in the
annotation file.
\~
.br
\~
.br
Missing values can be added by providing "." in place of actual value
and using the missing value modifier with \fB\-c\fP, such as ".TAG".
\~
.br
\~
.br
Note that flag types, such as "INFO/FLAG", can be annotated by including
a field with the value "1" to set the flag, "0" to remove it, or "." to
keep existing flags.
See also \fB\-c, \-\-columns\fP and \fB\-h, \-\-header\-lines\fP.
.RE
.sp
.if n .RS 4
.nf
.fam C
# Sample annotation file with columns CHROM, POS, STRING_TAG, NUMERIC_TAG
1 752566 SomeString 5
1 798959 SomeOtherString 6
.fam
.fi
.if n .RE
.sp
\fB\-c, \-\-columns\fP \fIlist\fP
.RS 4
Comma\-separated list of columns or tags to carry over from the annotation file
(see also \fB\-a, \-\-annotations\fP). If the annotation file is not a VCF/BCF,
\fIlist\fP describes the columns of the annotation file and must include CHROM,
POS (or, alternatively, FROM and TO), and optionally REF and ALT. Unused
columns which should be ignored can be indicated by "\-".
\~
.br
\~
.br
If the annotation file is a VCF/BCF, only the edited columns/tags must be present and their
order does not matter. The columns ID, QUAL, FILTER, INFO and FORMAT
can be edited, where INFO tags can be written both as "INFO/TAG" or simply "TAG",
and FORMAT tags can be written as "FORMAT/TAG" or "FMT/TAG".
The imported VCF annotations can be renamed as "DST_TAG:=SRC_TAG" or "FMT/DST_TAG:=FMT/SRC_TAG".
\~
.br
\~
.br
To carry over all INFO annotations, use "INFO". To add all INFO annotations except
"TAG", use "^INFO/TAG". By default, existing values are replaced.
\~
.br
\~
.br
By default, existing tags are overwritten unless the source value is a missing value (i.e. ".").
If missing values should also be carried over (and overwrite existing tags), use ".TAG" instead of "TAG".
To add annotations without overwriting existing values (that is, to add tags that are absent or
to add values to existing tags with missing values), use "+TAG" instead of "TAG". These can be combined,
for example ".+TAG" can be used to add TAG even if the source value is missing but only if TAG does not
exist in the target file; existing tags will not be overwritten.
To append to existing values (rather than replacing or leaving untouched), use "=TAG"
(instead of "TAG" or "+TAG").
To replace only existing values without modifying missing annotations, use "\-TAG".
To match the record also by ID or INFO/END, in addition to REF and ALT, use "~ID" or "~INFO/END".
If position needs to be replaced, mark the column with the new position as "~POS".
\~
.br
\~
.br
If the annotation file is not a VCF/BCF, all new annotations must be
defined via \fB\-h, \-\-header\-lines\fP.
\~
.br
\~
.br
See also the \fB\-l, \-\-merge\-logic\fP option.
.RE
.sp
\fB\-C, \-\-columns\-file\fP \fIfile\fP
.RS 4
Read the list of columns from a file (normally given via the \fB\-c, \-\-columns\fP option).
Use "\-" to skip a column of the annotation file.
One column name per row; an additional space\- or tab\-separated field can
be present to indicate the merge logic (normally given via the \fB\-l, \-\-merge\-logic\fP option).
This is useful when many annotations are added at once.
.RE
.sp
\fB\-e, \-\-exclude\fP \fIEXPRESSION\fP
.RS 4
exclude sites for which \fIEXPRESSION\fP is true. For valid expressions see
\fBEXPRESSIONS\fP.
.RE
.sp
\fB\-\-force\fP
.RS 4
continue even when parsing errors, such as undefined tags, are encountered. Note
this can be an unsafe operation and can result in corrupted BCF files. If this
option is used, make sure to sanity check the result thoroughly.
.RE
.sp
\fB\-h, \-\-header\-lines\fP \fIfile\fP
.RS 4
Lines to append to the VCF header, see also \fB\-c, \-\-columns\fP and \fB\-a, \-\-annotations\fP. For example:
.RE
.sp
.if n .RS 4
.nf
.fam C
##INFO=<ID=NUMERIC_TAG,Number=1,Type=Integer,Description="Example header line">
##INFO=<ID=STRING_TAG,Number=1,Type=String,Description="Yet another header line">
.fam
.fi
.if n .RE
.sp
\fB\-I, \-\-set\-id\fP [+]\fIFORMAT\fP
.RS 4
assign ID on the fly. The format is the same as in the \fBquery\fP
command (see below). By default all existing IDs are replaced. If the
format string is preceded by "+", only missing IDs will be set. For example,
one can use
.RE
.sp
.if n .RS 4
.nf
.fam C
bcftools annotate \-\-set\-id +\(aq%CHROM\(rs_%POS\(rs_%REF\(rs_%FIRST_ALT\(aq file.vcf
.fam
.fi
.if n .RE
.sp
\fB\-i, \-\-include\fP \fIEXPRESSION\fP
.RS 4
include only sites for which \fIEXPRESSION\fP is true. For valid expressions see
\fBEXPRESSIONS\fP.
.RE
.sp
\fB\-k, \-\-keep\-sites\fP
.RS 4
keep sites which do not pass \fB\-i\fP and \fB\-e\fP expressions instead of discarding them
.RE
.sp
\fB\-l, \-\-merge\-logic\fP \fItag:first\fP|\fIappend\fP|\fIappend\-missing\fP|\fIunique\fP|\fIsum\fP|\fIavg\fP|\fImin\fP|\fImax\fP[,...]
.RS 4
When multiple regions overlap a single record, this option defines how to treat multiple
annotation values when setting \fItag\fP in the destination file: use the first encountered value ignoring
the rest (\fIfirst\fP); append allowing duplicates (\fIappend\fP); append even if the appended value is missing,
i.e. is a dot (\fIappend\-missing\fP); append discarding duplicate values (\fIunique\fP);
sum the values (\fIsum\fP, numeric fields only); average the values (\fIavg\fP); use the minimum value (\fImin\fP) or
the maximum (\fImax\fP).
.sp
Note that this option is intended for use with BED or TAB\-delimited annotation files only. Moreover,
it is effective only when either \fIREF\fP and \fIALT\fP or \fIBEG\fP and \fIEND\fP \fB\-\-columns\fP are present.
.sp
Multiple rules can be given either as a comma\-separated list or by giving the option multiple times.
This is an experimental feature.
.RE
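.sp
A minimal sketch, assuming a BED annotation file \fIannots.bed.gz\fP, a header file
\fIannots.hdr\fP that defines the string tag \fITAG\fP, and an input \fIinput.vcf\fP (all
placeholder names): when several BED intervals overlap one record, the distinct values
of \fITAG\fP are collected rather than only the first one.
.sp
.if n .RS 4
.nf
.fam C
bcftools annotate \-a annots.bed.gz \-h annots.hdr \-c CHROM,FROM,TO,TAG \-l TAG:unique input.vcf
.fam
.fi
.if n .RE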
.sp
\fB\-m, \-\-mark\-sites\fP \fITAG\fP
.RS 4
annotate sites which are present ("+") or absent ("\-") in the \fB\-a\fP file with a new INFO/TAG flag
.RE
.sp
\fB\-\-min\-overlap\fP \fIANN\fP:\fIVCF\fP
.RS 4
minimum overlap required as a fraction of the variant in the annotation \fB\-a\fP file (\fIANN\fP), in the
target VCF file (\fI:VCF\fP), or both for reciprocal overlap (\fIANN:VCF\fP).
By default overlaps of arbitrary length are sufficient.
The option can be used only with the tab\-delimited annotation \fB\-a\fP file and with \fIBEG\fP and \fIEND\fP
columns present.
.RE
.sp
\fB\-\-no\-version\fP
.RS 4
see \fBCommon Options\fP
.RE
.sp
\fB\-o, \-\-output\fP \fIFILE\fP
.RS 4
see \fBCommon Options\fP
.RE
.sp
\fB\-O, \-\-output\-type\fP \fIb\fP|\fIu\fP|\fIz\fP|\fIv\fP[0\-9]
.RS 4
see \fBCommon Options\fP
.RE
.sp
\fB\-\-pair\-logic\fP \fIsnps\fP|\fIindels\fP|\fIboth\fP|\fIall\fP|\fIsome\fP|\fIexact\fP
.RS 4
Controls how to match records from the annotation file to the target VCF.
Effective only when \fB\-a\fP is a VCF or BCF. The option replaces the former
unintuitive \fB\-\-collapse\fP.
See \fBCommon Options\fP for more.
.RE
.sp
\fB\-r, \-\-regions\fP \fIchr\fP|\fIchr:pos\fP|\fIchr:from\-to\fP|\fIchr:from\-\fP[,...]
.RS 4
see \fBCommon Options\fP
.RE
.sp
\fB\-R, \-\-regions\-file\fP \fIfile\fP
.RS 4
see \fBCommon Options\fP
.RE
.sp
\fB\-\-regions\-overlap\fP \fI0\fP|\fI1\fP|\fI2\fP
.RS 4
see \fBCommon Options\fP
.RE
.sp
\fB\-\-rename\-annots\fP \fIfile\fP
.RS 4
rename annotations according to the map in \fIfile\fP, with
"old_name new_name\(rsn" pairs separated by whitespace, each on a separate
line. The old name must be prefixed with the annotation type:
INFO, FORMAT, or FILTER.
.RE
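.sp
A map file might look as follows. The tag and filter names are hypothetical and only
illustrate the type prefix on the old name; \fImap.txt\fP and \fIfile.vcf.gz\fP are
placeholder file names.
.sp
.if n .RS 4
.nf
.fam C
INFO/OLD_INFO_TAG NEW_INFO_TAG
FORMAT/OLD_FMT_TAG NEW_FMT_TAG
FILTER/OldFilter NewFilter
.fam
.fi
.if n .RE
.sp
.if n .RS 4
.nf
.fam C
bcftools annotate \-\-rename\-annots map.txt file.vcf.gz
.fam
.fi
.if n .RE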
.sp
\fB\-\-rename\-chrs\fP \fIfile\fP
.RS 4
rename chromosomes according to the map in \fIfile\fP, with
"old_name new_name\(rsn" pairs separated by whitespace, each on a separate
line.
.RE
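.sp
As a sketch, a map that strips the "chr" prefix from three sequences could be written
and applied as follows. The file names \fIchr_map.txt\fP, \fIfile.vcf.gz\fP and
\fIrenamed.vcf.gz\fP are placeholders.
.sp
.if n .RS 4
.nf
.fam C
chr1 1
chr2 2
chrX X
.fam
.fi
.if n .RE
.sp
.if n .RS 4
.nf
.fam C
bcftools annotate \-\-rename\-chrs chr_map.txt \-Oz \-o renamed.vcf.gz file.vcf.gz
.fam
.fi
.if n .RE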
.sp
\fB\-s, \-\-samples\fP [^]\fILIST\fP
.RS 4
subset of samples to annotate, see also \fBCommon Options\fP
.RE
.sp
\fB\-S, \-\-samples\-file\fP \fIFILE\fP
.RS 4
subset of samples to annotate. If the samples are named differently in the
target VCF and the \fB\-a, \-\-annotations\fP VCF, the name mapping can be
given as "src_name dst_name\(rsn", separated by whitespace, each pair on a
separate line.
.RE
.sp
\fB\-\-single\-overlaps\fP
.RS 4
use this option to keep memory requirements low with very large annotation
files. Note, however, that this comes at a cost: only single overlapping intervals
are considered in this mode. This was the default mode until the commit
af6f0c9 (Feb 24 2019).
.RE
.sp
\fB\-\-threads\fP \fIINT\fP
.RS 4
see \fBCommon Options\fP
.RE
.sp
\fB\-x, \-\-remove\fP \fIlist\fP
.RS 4
List of annotations to remove. Use "FILTER" to remove all filters or
"FILTER/SomeFilter" to remove a specific filter. Similarly, "INFO" can
be used to remove all INFO tags and "FORMAT" to remove all FORMAT tags
except GT. To remove all INFO tags except "FOO" and "BAR", use
"^INFO/FOO,INFO/BAR" (and similarly for FORMAT and FILTER).
"INFO" can be abbreviated to "INF" and "FORMAT" to "FMT".
.RE
.sp
\fB\-\-write\-index\fP
.RS 4
Automatically index the output file
.RE
.sp
\fBExamples:\fP
.sp
.if n .RS 4
.nf
.fam C
# Remove three fields
bcftools annotate \-x ID,INFO/DP,FORMAT/DP file.vcf.gz
# Remove all INFO fields and all FORMAT fields except for GT and PL
bcftools annotate \-x INFO,^FORMAT/GT,FORMAT/PL file.vcf
# Add ID, QUAL and INFO/TAG, not replacing TAG if already present
bcftools annotate \-a src.bcf \-c ID,QUAL,+TAG dst.bcf
# Carry over all INFO and FORMAT annotations except FORMAT/GT
bcftools annotate \-a src.bcf \-c INFO,^FORMAT/GT dst.bcf
# Annotate from a tab\-delimited file with six columns (the fifth is ignored),
# first indexing with tabix. The coordinates are 1\-based.
tabix \-s1 \-b2 \-e2 annots.tab.gz
bcftools annotate \-a annots.tab.gz \-h annots.hdr \-c CHROM,POS,REF,ALT,\-,TAG file.vcf
# Annotate from a tab\-delimited file with regions (1\-based coordinates, inclusive)
tabix \-s1 \-b2 \-e3 annots.tab.gz
bcftools annotate \-a annots.tab.gz \-h annots.hdr \-c CHROM,FROM,TO,TAG input.vcf
# Annotate from a BED file (0\-based coordinates, half\-open intervals)
bcftools annotate \-a annots.bed.gz \-h annots.hdr \-c CHROM,FROM,TO,TAG input.vcf
# Transfer the INFO/END tag, matching by POS,REF,ALT and ID. This example assumes
# that INFO/END is already present in the VCF header.
bcftools annotate \-a annots.tab.gz \-c CHROM,POS,~ID,REF,ALT,INFO/END input.vcf
# For more examples see http://samtools.github.io/bcftools/howtos/annotate.html
.fam
.fi
.if n .RE
.SS "bcftools call \fI[OPTIONS]\fP \fIFILE\fP"
.sp
This command replaces the former \fBbcftools view\fP caller. Some of the original
functionality has been temporarily lost in the process of transition under
.URL "http://github.com/samtools/htslib" "htslib" ","
but will be added back on popular
demand. The original calling model can be invoked with the \fB\-c\fP option.
.SS "File format options:"
.sp
\fB\-\-no\-version\fP
.RS 4
see \fBCommon Options\fP
.RE
.sp
\fB\-o, \-\-output\fP \fIFILE\fP
.RS 4
see \fBCommon Options\fP
.RE
.sp
\fB\-O, \-\-output\-type\fP \fIb\fP|\fIu\fP|\fIz\fP|\fIv\fP[0\-9]
.RS 4
see \fBCommon Options\fP
.RE
.sp
\fB\-\-ploidy\fP \fIASSEMBLY\fP[\fI?\fP]
.RS 4
predefined ploidy, use \fIlist\fP (or any other unused word) to print a list
of all predefined assemblies. Append a question mark to print the actual
definition. See also \fB\-\-ploidy\-file\fP.
.RE
.sp
\fB\-\-ploidy\-file\fP \fIFILE\fP
.RS 4
ploidy definition given as a space/tab\-delimited list of
CHROM, FROM, TO, SEX, PLOIDY. The SEX codes are arbitrary and
correspond to the ones used by \fB\-\-samples\-file\fP.
The default ploidy can be given using the starred records (see
below), unlisted regions have ploidy 2. The default ploidy definition is
.RE
.sp
.if n .RS 4
.nf
.fam C
X 1 60000 M 1
X 2699521 154931043 M 1
Y 1 59373566 M 1
Y 1 59373566 F 0
MT 1 16569 M 1
MT 1 16569 F 1
* * * M 2
* * * F 2
.fam
.fi
.if n .RE
.sp
\fB\-r, \-\-regions\fP \fIchr\fP|\fIchr:pos\fP|\fIchr:from\-to\fP|\fIchr:from\-\fP[,...]