forked from GMOD/jbrowse
flatfile-to-json.pl
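#!/usr/bin/env perl
use strict;
use warnings;
# Make the JBrowse Perl modules under src/perl5 loadable no matter
# where this script is invoked from.
use FindBin qw($RealBin);
use lib "$RealBin/../src/perl5";
use JBlibs;
use Bio::JBrowse::Cmd::FlatFileToJson;
# Hand the command-line arguments to the command class and exit with
# whatever status its run method returns.
exit Bio::JBrowse::Cmd::FlatFileToJson->new(@ARGV)->run;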
__END__
=head1 NAME
flatfile-to-json.pl - convert an annotation file (GFF3, BED, or GenBank) into JBrowse JSON format
=head1 USAGE
flatfile-to-json.pl \
( --gff <GFF3 file> | --bed <BED file> | --gbk <GenBank file> ) \
--trackLabel <track identifier> \
[ --trackType <JS Class> ] \
[ --out <output directory> ] \
[ --key <human-readable track name> ] \
[ --className <CSS class name for displaying features> ] \
[ --urltemplate "http://example.com/idlookup?id={id}" ] \
[ --arrowheadClass <CSS class> ] \
[ --noSubfeatures ] \
[ --subfeatureClasses '{ JSON-format subfeature class map }' ] \
[ --clientConfig '{ JSON-format extra configuration for this track }' ] \
[ --thinType <BED -thin_type> ] \
[ --thickType <BED -thick_type> ] \
[ --type <feature types to process> ] \
[ --nclChunk <chunk size for generated NCLs> ] \
[ --compress ] \
[ --sortMem <memory in bytes to use for sorting> ]
=head1 ARGUMENTS
=head2 Required
=over 4
=item --gff <GFF3 file>
=item --bed <BED file>
=item --gbk <GenBank file>
Process a GFF3, BED, or GenBank file containing annotation data.
NOTE: This script does not support GFF version 2 or GTF (GFF 2.5) input.
=item --trackLabel <track identifier>
Unique identifier for this track. Required.
=back
=head2 Optional
=over 4
=item --help | -h | -?
Display an extended help screen.
=item --key '<text>'
Human-readable track name.
=item --out <output directory>
Output directory to write to. Defaults to "data/".
=item --trackType <JS Class>
JavaScript class used to display this track. Defaults to
JBrowse/View/Track/HTMLFeatures.
=item --className <CSS class name for displaying features>
CSS class for features. Defaults to "feature".
=item --urltemplate "http://example.com/idlookup?id={id}"
Template for a URL to be visited when features are clicked on.
=item --noSubfeatures
Do not format subfeature data.
=item --arrowheadClass <CSS class>
CSS class for arrowheads.
=item --subfeatureClasses '{ JSON-format subfeature class map }'
CSS classes for each subfeature type, in JSON syntax. Example:
--subfeatureClasses '{"CDS": "transcript-CDS", "exon": "transcript-exon"}'
=item --clientConfig '{ JSON-format extra configuration for this track }'
Extra configuration for the client, in JSON syntax. Example:
--clientConfig '{"featureCss": "background-color: #668; height: 8px;", "histScale": 2}'
=item --type <feature types to process>
Only process features of the given type. Can take either single type
names, e.g. "mRNA", or type names qualified by "source" name, for
whatever definition of "source" your data file might have. For
example, "mRNA:exonerate" will filter for only mRNA features that have
a source of "exonerate".
Multiple type names can be specified by separating the type names with
commas, e.g. C<--type mRNA:exonerate,ncRNA>.
=item --nclChunk <chunk size for generated NCLs>
NCList chunk size; if you get "json text or perl structure exceeds
maximum nesting level" errors, try setting this lower (default:
50,000).
=item --compress
Compress the output, making .jsonz (gzipped) JSON files. This can
save a lot of disk space, but note that the web server must be
configured to send these files with a C<Content-Encoding: gzip> header.
=item --sortMem <bytes>
Amount of RAM, in bytes, to use for sorting features. Defaults to 512MB.
=back
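As a rough illustration of how several of these options combine, the
shell sketch below formats only exonerate mRNAs and ncRNAs from a GFF3
file, compresses the output, and sets the sort memory to 1 GiB. The file
name C<annotations.gff3>, the track label, the key text, and the C<run>
dry-run helper are hypothetical; only the flags themselves come from this
document. By default the sketch prints the command instead of running it;
set C<DRY_RUN=0> to execute it.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
# (Quoting is not preserved in the printed form.)
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# --sortMem takes a value in bytes, so compute 1 GiB explicitly.
SORT_MEM=$((1024 * 1024 * 1024))

# The file name, track label, and key below are placeholders.
run flatfile-to-json.pl \
    --gff annotations.gff3 \
    --trackLabel exonerate_rna \
    --key 'Exonerate RNAs' \
    --type mRNA:exonerate,ncRNA \
    --compress \
    --sortMem "$SORT_MEM"
```

Passing the byte count through a shell variable keeps the arithmetic
visible and avoids typing a ten-digit literal by hand.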
=head2 BED-specific
=over 4
=item --thinType <type>
=item --thickType <type>
Correspond to C<-thin_type> and C<-thick_type> in
L<Bio::FeatureIO::bed>. Run C<perldoc Bio::FeatureIO::bed> for
details.
=back
=head1 MEMORY USAGE
For efficient memory usage, it is very important that large GFF3 files
have C<###> lines in them periodically. For details of what C<###> is
and how it is used, see the GFF3 specification at
L<http://www.sequenceontology.org/gff3.shtml>.
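If a GFF3 file has no C<###> lines, one way to add them is sketched
below. This is a heuristic, not part of this script: the function name
C<add_gff3_sync_marks> is hypothetical, and the approach assumes that
every child feature appears after its top-level parent and before the
next feature that has no C<Parent> attribute, and that no top-level
feature is split across several lines sharing one ID. Under those
assumptions, a C<###> line is written before each parentless feature.
Existing C<###> lines and any C<##FASTA> section are passed through
unchanged.

```shell
#!/bin/sh
# Hypothetical helper: insert "###" before each top-level GFF3 feature.
# Only safe when children directly follow their top-level parent.
add_gff3_sync_marks() {
    awk -F '\t' '
        /^##FASTA/       { in_fasta = 1 }             # sequence section starts here
        in_fasta         { print; next }              # copy sequence data untouched
        /^###[ \t]*$/    { print; open = 0; next }    # keep existing marks, avoid doubles
        /^#/ || NF < 9   { print; next }              # directives, comments, blank lines
        {
            # A feature with no Parent attribute starts a new group, so
            # everything before it can be closed off with "###".
            if ($9 !~ /(^|;)Parent=/ && open) print "###"
            open = 1
            print
        }
    ' "$@"
}
```

A possible use, with placeholder file names, is
C<add_gff3_sync_marks genes.gff3 E<gt> genes.marked.gff3>, after which
the marked file is passed to C<--gff>.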
=cut